By Derrick Harris. This article originally appeared on Gigaom, October 30, 2014.
I’m probably guilty of spending too much time talking to startups and web companies, to the point where it’s easy to forget that not everyone is regularly re-architecting their applications and inserting new tools into their data-processing pipelines. Not every company is Facebook, where storage is measured in petabytes, new task-specific data-analysis tools pop up all the time, and there are teams of data scientists analyzing every conceivable facet of user behavior.
Case in point: Lockheed Martin. Yes, the large government contractor is toeing the cutting edge with its involvement in areas such as quantum computing, but it’s also dealing with some very slow-moving government agencies. According to Ravi Hubbly, a senior technology manager at Lockheed, his team is in the process of deploying Hadoop within its clients’ infrastructures less as an engine of innovation and more as a bandage to stop the bleeding from big data.
It’s the classic big data problem. Thanks to the myriad contact points agencies now have with their users, there’s just too much data, in too many new formats, coming way too fast. Hadoop in this case isn’t a replacement for existing systems or a support for a massive new enterprise-wide data warehouse underneath some new query tools. Rather, it’s like a data middleman that takes what other systems can’t handle and stores it, maybe processes it into a different format and then sends it back.
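To make that middleman role concrete, here is a minimal sketch of the pattern Hubbly describes, written in PySpark. The file paths, the JSON schema and the CSV hand-off format are all illustrative assumptions, not details from Lockheed’s actual deployment.

```python
# Minimal sketch of the "data middleman" pattern: land data the legacy system
# can't ingest, reshape it, and hand it back in a format that system expects.
# Paths, schema, and output layout below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("intake-middleman").getOrCreate()

# 1. Land raw, semi-structured records the existing warehouse can't handle directly.
raw = spark.read.json("hdfs:///landing/agency_events/")

# 2. Flatten and type the fields the downstream system actually needs.
flattened = (
    raw.select(
        F.col("case.id").alias("case_id"),
        F.to_date("submitted_at").alias("submitted_date"),
        F.col("applicant.state").alias("state"),
        F.col("benefit_amount").cast("decimal(10,2)").alias("benefit_amount"),
    )
    .dropna(subset=["case_id"])
)

# 3. Send the cleaned data back as a flat file the legacy feed can consume.
flattened.write.mode("overwrite").option("header", True).csv(
    "hdfs:///outbound/agency_events_csv/"
)
```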
Programs such as Social Security and food stamps still run on mainframes and COBOL, other agencies implemented enterprise data warehouses in the 1990s that are now reaching their scalability limits, and none of it is going anywhere. In some cases, particularly for programs and applications that can’t go offline, the process is like “changing the engine of the train while the train is still running,” Hubbly said.
How Lockheed might integrate Hadoop into a mainframe environment, from a presentation Hubbly gave in 2013.
If there’s a glimmer of hope for Hubbly’s team at Lockheed, it might be coming via data-preparation software from a startup called Trifacta. That company, like a handful of other startups including Paxata and Tamr, is using machine learning and a relatively streamlined user experience to simplify the process of transforming data from its raw form into something that analytic software or applications can actually use. They’re in the same vein as legacy data-integration and ETL tools, only a lot easier to use and designed with big data stores like Hadoop in mind.
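For a sense of what that wrangling involves, here is a hand-written pandas version of the kind of per-column cleanup such tools help infer semi-automatically. The file names, column names and cleaning rules are hypothetical examples, not Trifacta output.

```python
# Hand-written equivalent of typical data-prep steps: normalize column names,
# parse dates, strip currency formatting, and drop unusable rows.
# All names and rules here are illustrative assumptions.
import pandas as pd

raw = pd.read_csv("claims_export.csv", dtype=str)

clean = (
    raw.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
       .assign(
           claim_date=lambda df: pd.to_datetime(df["claim_date"], errors="coerce"),
           amount=lambda df: pd.to_numeric(
               df["amount"].str.replace("[$,]", "", regex=True), errors="coerce"
           ),
           state=lambda df: df["state"].str.upper().str.strip(),
       )
       .dropna(subset=["claim_date", "amount"])
)

# Write the cleaned data in a format analytic tools can consume directly.
clean.to_parquet("claims_clean.parquet", index=False)
```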
Initially, Lockheed started using Trifacta to help its staff simplify the process of transforming data as it moves from place to place, while also ensuring that applications and end-users are getting the same results they would have received in the first place. But now clients are getting interested in what else they can do, Hubbly said. That’s in part because analysts can transform the data themselves without always having to rely on Lockheed or internal IT departments.
Some of the big data use cases often cited as low-hanging fruit are now coming into play. Agency analysts want to merge health care data and ecology data to see if there are important trends, and then maybe bring in a few other data sources, join them and start building predictive models. They’re now collaborating online and in real time rather than sharing the results of their models via emails or reports, and things that used to take months are happening in weeks, Hubbly said.
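A join-then-model workflow of that sort might look something like the following sketch; the datasets, columns and target variable are hypothetical stand-ins, not the agencies’ actual data.

```python
# Illustrative join of two hypothetical datasets followed by a simple predictive model.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

health = pd.read_parquet("health_by_county.parquet")    # e.g. county, year, asthma_rate
ecology = pd.read_parquet("ecology_by_county.parquet")  # e.g. county, year, air_quality_index

# Join the two sources on their shared keys.
merged = health.merge(ecology, on=["county", "year"], how="inner")

# Fit a simple model predicting a health outcome from an environmental measure.
X = merged[["air_quality_index"]]
y = merged["asthma_rate"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print(f"R^2 on held-out data: {model.score(X_test, y_test):.2f}")
```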
“Data analytics has always been critical to Lockheed’s business, it’s just the traditional versus more modernized techniques,” he explained.
Now that agencies are getting a taste of what’s possible, they’re asking why their applications can’t work like those at Google or Facebook, and why they can’t predict important outcomes the way those companies predict which ads users want to see. It’s technically possible, Hubbly said, but conservative agencies and companies will first have to embrace the same technologies, concepts and, perhaps most importantly, mindsets.
“With growth of data, the amount of data scientists is now going to grow,” he noted, but they’ll be hamstrung without the right technologies and protocols in place. “… The technology is changing [and] the organization structures within big companies … need to be agile enough to adapt.”