New Pentaho-Hitachi Content Platform integration strengthens, simplifies Pentaho use of unstructured data

The new Pentaho 8.2 release also makes hybrid cloud data management easier, and adds new and updated ecosystem support, including its first integration with the Jupyter Notebook open-source web application.

Geoff Marsh, VP and Analytics Leader, Americas, at Hitachi Vantara

The big theme in Hitachi Vantara’s new Pentaho 8.2 release is multiple new integrations with the Hitachi Content Platform [HCP] object storage. The release significantly upgrades Pentaho’s ability to provide analytics insights from unstructured data. It also makes the integration much simpler than what customers had to do before, and will cut the costs of analytics in Pentaho by allowing customers to offload data to the object store while still using it for analytics. The new release also makes hybrid cloud data management easier, and adds new integrations with the third-party ecosystem.

This kind of integration was one of the purposes behind the creation of Hitachi Vantara from Hitachi Data Systems, where HCP originated, Pentaho, and Hitachi Insight Group.

“Not everyone was always sold on how tightly the Pentaho analytics could be integrated with HCP,” said Geoff Marsh, VP and Analytics Leader, Americas, at Hitachi Vantara. “HCP was seen by some as an archive – but it’s NOT just an archive and I’ve been a massive proponent of that view. Data is a living, breathing thing, and object storage plays a big part in that. We have customers who use it for cold but still active storage.”

Unstructured data is vastly underused in making business decisions. According to Harvard Business Review, while under half of an organization’s structured data is used in making business decisions, under one per cent of unstructured data – like text, video, audio, images, social media, clickstreams and log files – is used in any way at all. This is something the new integration has the potential to change massively. Pentaho had the ability to run analytics on unstructured data before, but the integration with HCP makes that ability more powerful, and makes it much, much simpler for the customer.

“This release strengthens the unstructured data component, but we also wanted to make it as easy as possible to bring HCP and Pentaho together natively, rather than force customers to write scripts or use middleware,” Marsh said. “The capability existed before by using things like Hitachi Content Intelligence, the search capability for the platform. That let the customer use the platform as a data source for Pentaho – but it was too complex. Now with the new integration, Pentaho integrates with HCP directly. HCP is now a data source on a drop-down menu within Pentaho.”

Because bringing HCP and Pentaho together was very complex, most customers didn’t bother.

“We have many customers who have both of the products together, but not a lot of them were USING the products together,” Marsh noted. “This includes a lot of the big banks. Then we went in and showed them what they could do with this. Now instead of just throwing everything into a data lake, they could use Pentaho to cleanse and normalize data within HCP and keep the integrity of the data intact.”

Marsh said they expect that customers will use the integration to open up new use cases and to cut costs.

“Banks, for example, will be able to offload data from Hadoop to HCP, where they can still get at the data, and run analytics where they want, but where it’s much cheaper than keeping it in an active data store like Hadoop,” Marsh said. “In discussions I’ve had, that’s what customers are really excited about.”

New use cases include correlating structured trading transaction data with email in financial services, for better documentation and compliance verification; blending structured patient data and medication history with unstructured MRI scans in healthcare; and combining unstructured in-store video footage with point-of-sale data in retail, to better analyze things like traffic flow around specific brands and individual customers’ shopping practices.

Marsh said that the objective in hybrid cloud data management was to make it all easier for customers.

“One of our reference customers, CARFAX, runs three clouds but orchestrates it all on-prem,” he indicated. “Pentaho can determine which normalized data is most appropriate for each cloud target. We needed to make it all as easy as possible. This strengthens the capability to move that data anywhere in any cloud at any time.”

Pentaho 8.2 also adds new third-party integrations, including the AMQP messaging protocol for IoT use cases that stream data from edge devices to the cloud, and Python Step, to operationalize machine learning and deep learning models built with Python. Pentaho customers can now also switch from Oracle JDK, which now comes with commercial terms, to the free and open source OpenJDK. Google Cloud security has been enhanced with support for customer-managed encryption keys.

“We also now have a new and full integration with Jupyter Notebook [open-source web application] that is very cool,” Marsh said. “Developers will like that.”

Finally, Marsh noted that channel opportunities around Pentaho have changed significantly since it became part of Hitachi Vantara.

“We have really changed our mentality on that, and are now definitely partner-first,” he emphasized. “Partners are critical in analytics because analytics are not sold or consumed as a product. They need to be able to either drive more revenue, or reduce the cost of doing business. When we talk to customers it’s about applications like churn analytics and fraud analytics, and we need partners to be able to show the value to the business. We have placed much more emphasis on partner relationships as a result, whether they are Global SIs, regional SIs, or consultants.”