A reference architecture co-designed with vendor partners Cloudera and Syncsort simplifies Big Data and analytics processes for new Hadoop users, and chops the expense of data transformation jobs.
Dell has announced a new reference architecture for Hadoop, co-designed with Cloudera and Syncsort, that simplifies the process of transforming data into a ready state for analysis. The company also sees the offering as extremely well-suited for channel partners.
In June 2014, as part of a blockbuster announcement of four new appliances made in conjunction with software vendor partners, Dell announced a series of In-Memory appliances for Cloudera Enterprise aimed at accelerating Hadoop deployments. This new reference architecture builds on that by providing an option aimed at new Hadoop users who don’t want to learn some of the specialized skills it requires.
“Some customers have told us they don’t want to spend the time learning skills like Java and MapReduce needed for Hadoop,” said Armando Acosta, Dell’s Hadoop product and planning manager. “We tried to design this making all of this simple, so that we don’t cause a lot of churn to the customer environment.”
The solution also helps control data warehousing costs by reducing the cost of data transformation, the part of the ETL [extract, transform and load] process that makes data usable for analysts.
“Data transformation jobs bring together different pieces of data from different silos so analysts just have to query one table,” Acosta said. “They are fundamental and there’s no way of getting around them. You need to do them to analyze the data. But these jobs are getting larger and larger and consume a large amount of resources in the enterprise data center.” He cited a 2014 Gartner finding that these jobs consume 70 per cent of resources in enterprise data warehousing.
The solution, eloquently named the Dell | Cloudera | Syncsort Data Warehouse Optimization – ETL Offload Reference Architecture, directly addresses this issue.
“It enables you to bolt on the Syncsort ETL software for this use case,” Acosta said. “Syncsort’s DMX-h technology lets you take a SQL script and translate it into a MapReduce script that is native to Hadoop. That simplifies things, but it also has the indirect benefit of offloading the performance and capacity hit so your warehousing is just doing reporting, query and analysis, not data transformation. These data transformation jobs are up to 10,000 lines of SQL script.”
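To give a sense of what moving a SQL transformation into MapReduce involves, here is a minimal sketch in plain Python. It is not Syncsort's DMX-h output and does not use Hadoop; it only shows how a simple aggregation such as SELECT region, SUM(amount) FROM sales GROUP BY region breaks down into map, shuffle and reduce steps. The table, column names and sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for a warehouse table called "sales".
SALES = [
    {"region": "east", "amount": 120},
    {"region": "west", "amount": 80},
    {"region": "east", "amount": 30},
]


def map_phase(rows):
    """Emit one (key, value) pair per row: the GROUP BY column and the column to sum."""
    for row in rows:
        yield row["region"], row["amount"]


def shuffle(pairs):
    """Collect all values that share a key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Apply the aggregate (SUM) to each key's list of values."""
    return {key: sum(values) for key, values in grouped.items()}


def total_by_region(rows):
    """Run the three phases in order on an in-memory list of rows."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(total_by_region(SALES))
```

On a real cluster the map and reduce steps would run in parallel across many nodes, which is what takes the processing load off the data warehouse. A production ETL job of the size Acosta describes would chain many such steps together.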
In Dell’s internal testing, an entry-level technician created ETL jobs with the Dell | Cloudera | Syncsort solution 60 per cent faster than an expert-level senior engineer running the same scenario with do-it-yourself, open-source ETL solutions. Dell estimates this saves customers 76 per cent of administrative costs.
While Cloudera is a long-time partner, this is the first time Dell has worked with Syncsort.
“Their value here enables us to do that end to end solution, through their capability on the ETL side,” Acosta said. “They started out in mainframes, and moved to Hadoop seeing the value of their technology there.”
Acosta said that Dell and Cloudera had also contributed new value to this offering.
“Dell helps with the design architecture and test configuration to ease the hard work, and Cloudera has done the work to ensure full integration with Cloudera,” he said. “It’s so hard to build a solution with multiple vendors that works well. This is all tested, validated and optimized in our labs.”
Customer demand for this is broad.
“We are seeing interest from a variety of customers across many verticals, including manufacturing, retail and pharmaceuticals,” Acosta said. “Customers are trying to control costs around enterprise data warehousing and ETL data transformation does this. We are also seeing interest from different sizes of companies across the board, who have data in the TBs, who are also doing large transformation jobs, and who don’t want to write that million dollar check for them year in and year out.”
Acosta indicated that Dell sees this architecture as a strong partner play.
“Before, to build a solution like this from different vendors, partners had to take server, network and software catalogues and connect all those dots,” he said. “We take all that pain away with a template to build it for their customer, a 53-page document that tells the partner how to build it from scratch. They can then deploy it and wrap their own services, including consulting around what services to move to Hadoop, around it.”
The Dell | Cloudera | Syncsort Hadoop ETL solution is available now.