Syncsort says the results from a recent survey dovetail with what they are seeing from their own customers -- that customers are increasingly moving to Hadoop, and that while much of this to date has been for operational use cases, they expect to see more transformative uses that drive new business opportunities.
Hadoop deployments for Big Data are increasingly moving from testing and experimentation to production, and 2016 augurs well for further development of use cases. Those are the findings of a new survey conducted by ETL software maker Syncsort.
Syncsort, which began decades ago making ETL [extract, transform and load] software for mainframes to reduce data warehousing costs, has successfully adapted its technology to Hadoop and Big Data. Their customer base is concentrated in the Fortune 100 to Fortune 500 space. They conducted this survey to determine Hadoop trends, and got responses from over 250 data architects, IT managers, developers, business intelligence and data analysts and data scientists, with two-thirds coming from organizations with revenues over $100 million, and from a broad range of industries.
“At the last Hadoop Summit we were talking about when we ‘cross the chasm’,” said Tendü Yoğurtçu, General Manager of Syncsort’s Big Data business, referring to Geoffrey Moore’s thesis about the process of broadening the technology adoption lifecycle. “We believe we have crossed that chasm, with it now being accepted as the next-generation data warehouse platform.”
The survey found that respondents believe offloading from expensive platforms into Hadoop will continue to increase in numbers and scope. 63 per cent said Hadoop will help them increase business or IT agility. 55 per cent think it will increase operational efficiency and reduce costs. 51 per cent want to leverage it to make more data available for business use across their entire organization.
“Hadoop is no longer a novelty, and pretty much all the Fortune 100 companies use it,” Yoğurtçu said. “2015 was the first year that enterprise companies had budget for this. In 2016, typical enterprises are growing this budget because they saw a proven return on their investment for the first projects.” She said that Syncsort’s own customer experiences largely confirm these findings of momentum.
“About 50 per cent of our customers had operational efficiencies in moving from mainframe to Hadoop and those worked very well, and they were able to show efficiencies,” Yoğurtçu said. “In areas like financial services, agility is important because adding a new data source was taking up to six months and they want to do it in weeks. Those efficiencies are achieved in the first couple of projects.”
Yoğurtçu acknowledged that the use cases of Hadoop have been limited to date, but she sees evidence from the survey that this is changing.
“The surprising thing in the survey from the present use cases was that over 60 per cent will use Hadoop for operational use cases, which are more the low-hanging fruit,” she said. “Enterprise organizations are trying to create Hadoop-centric warehouses, and bringing all the enterprise data in is an operational efficiency. Being able to access it on the mainframe and leverage Hadoop is another operational use case. However, we expected more transformative use cases that open up new business opportunities, such as where companies are trying to capture customer behavior in real time with it.” The survey also indicated Hadoop has yet to be leveraged for mobile apps and software, with only 4.9 per cent indicating this was being done.
On the other hand, the survey data indicate respondents are thinking about transformative use of Hadoop. More than half said they see Hadoop as a way to innovate, using data from social media and IoT, and applying predictive analytics and visualization for greater insights about their business.
“Manufacturing, especially the Internet of Things, and healthcare, are likely growth areas this year,” Yoğurtçu said. “Mobile will also become more critical. The telcos and gaming companies are taking advantage of Hadoop in gaming, which is a huge growth area. Modern multiplayer games have a large amount of user data. When a player is making choices, they have to respond very fast, and that grows with the number of players and with mobile players in particular.”
This use of Hadoop for streaming, real-time data sources is also likely to extend to fraud detection, analytics on telemetry and security data, and insurance claim validation.
Security and data governance are also likely to become major areas of focus as organizations move to production deployments. Syncsort believes that more organizations will move towards adopting a “Hadoop first” approach to data management, in which they bypass traditional platforms entirely and apply metadata, lineage, security, and other data management measures on Hadoop from the start.
A related development likely to spur these trends is increasing use of Apache Spark, the open source cluster computing framework, which Syncsort believes is ready to move from a talking point into development.
“To date, MapReduce has been used for production, and Spark much more for development, as users determine if Spark is powerful enough to use as the new compute framework,” Yoğurtçu said. “That’s what we are seeing from our customer base, and the survey validates that.”
However, the survey found nearly 70 per cent of respondents are most interested in Apache Spark, which surpasses interest in all other compute frameworks, including MapReduce, which came in at 55 per cent. While Syncsort expects MapReduce will still be the prevalent compute framework in production, the high level of interest should translate into more Spark deployments, mostly running on Hadoop.
“The obstacles to Spark adoption have been maturity and skillsets,” Yoğurtçu said. “The MapReduce community has matured and worked hard on data governance, security, and data management principles. Spark is early stage. No security or encryption has been developed for it yet, and it suffers from the general lack of confidence around early stage open source projects about which ones will remain, and which will go away. Spark also needs to be developed in a new programming language, Scala, which has fewer developers.”
Despite these handicaps, the suitability of Spark for additional use cases is driving interest.
“Because of the use cases, interest is very high,” Yoğurtçu said. “Spark has the promise to accommodate both batch and streaming workloads, and having a single framework for multiple workloads is attractive. Ultimately, the use cases are driving everything.”