The VAST DataStore, VAST DataBase and VAST DataEngine combine to deliver on VAST’s long-term plan of using their storage technology for deep learning as well.
VAST Data has had a significant impact on the storage industry with their concept of unified storage. Now they have announced that, in close collaboration with NVIDIA, they are delivering on what was always a core vision: adapting their storage technology into an AI-focused data platform that overcomes initial limitations on deep learning and becomes what the company terms a thinking machine.
“We are executing to a plan that has been in place since Day One – a software-based data computer designed for very sophisticated forms of deep learning,” said Jeff Denworth, co-founder and CMO at VAST Data. “We have been working with NVIDIA from Day One, and are one of four companies being sold for the NVIDIA SuperPODs. We are also the only enterprise storage platform that works with NVIDIA on a SuperPOD. We were asked 10 days after OpenAI was started if we could build a thinking machine.”
The answer at that point was no.
“Two problems existed then,” Denworth stated. “First, the market as a whole was not ready. Second, we as a company were not ready. We had 300 engineers in the company then, seven years ago, so we started at the data layer. But our architecture was ultimately intended to merge file and object structures together – and to be a database.”
To do this, VAST Data rethought how deep learning databases worked, with what they term their VAST Disaggregated Shared-Everything [DASE] architecture.
“You don’t work in a batch concept,” Denworth said. “Every data event triggers a computer event. Yet, forever you have had the idea of having event OR data driven systems, where you put the data in a data lake and process it. With the DASE architecture under the Data Store, we have broken down the limits. Before, no one could transact to these systems very fast. Hadoop, for example, was never meant to be transactional. We saw an opportunity to do something different and new, by building data structures to allow hundreds of cores access without corrupting data. Then you could really scale.”
The first building block in this was the VAST DataStore, the foundation of their platform which, like their core storage, eliminates storage tiering completely. It also significantly reduces price, as resolving the cost of flash storage has been critical to laying the foundation for deep learning for enterprise customers.
“It starts with the VAST DataStore to make all types of data eligible for use,” Denworth stated. “This allows DataStore to meet the needs of today’s extreme AI computing architectures like the SuperPOD systems, and world-leading big data and HPC platforms. Its system efficiency also brings archive economics to flash infrastructure. To date, VAST has shipped more than 10 exabytes of data globally.”
The second component is the VAST DataBase, a semantic database layer that has been added natively into the system to add structure to unstructured natural data.
“The VAST DataStore is just an unstructured database, while the VAST DataBase itself gives it a new structure and style,” Denworth noted. “This resolves the tradeoffs between transactions, for capturing and cataloguing natural data in real time, and analytics, for analyzing and correlating data in real time. That overcomes the limits of transactional databases and now makes it possible to unify data warehouses with database systems.
“The transaction scale is enormous, with the query performance being 100x faster than hard drive-based data lakes because you no longer have to move through separate systems,” Denworth continued. “You have massive amounts of consolidation with just one namespace across all the data centres.”
This brings us to the third component: an engine that can refine and process huge amounts of both unstructured and structured data. The DataBase engine is a scalable and ACID-transactional distributed system designed for rapid data capture while also featuring an exabyte-scale columnar data structure optimized for flash that enables deep and fast queries at any scale. The VAST DataEngine is a global function execution engine that adds application triggers and Python-based functions natively into the VAST DataPlatform.
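The idea of triggers attached to data writes can be illustrated in general terms with a minimal Python sketch. It is not VAST's API: `TriggerRegistry`, `on_write`, `write` and `catalog_image` are hypothetical names invented for illustration, and the toy registry lives entirely in memory. It shows only the pattern Denworth describes, in which each data event immediately runs a function rather than waiting for a batch job, and the function turns an unstructured write into a structured catalog row.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical names throughout: this is NOT the VAST DataEngine API.
Handler = Callable[[dict], dict]


class TriggerRegistry:
    """Toy in-memory registry where each write immediately runs registered functions."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.catalog: List[dict] = []  # structured rows derived from raw writes

    def on_write(self, prefix: str) -> Callable[[Handler], Handler]:
        """Decorator: run the function whenever a key starting with `prefix` is written."""
        def register(func: Handler) -> Handler:
            self._handlers[prefix].append(func)
            return func
        return register

    def write(self, key: str, payload: bytes) -> None:
        """Accept an object write and fire matching triggers right away (no batch step)."""
        event = {"key": key, "size": len(payload)}
        for prefix, handlers in self._handlers.items():
            if key.startswith(prefix):
                for handler in handlers:
                    self.catalog.append(handler(event))


registry = TriggerRegistry()


@registry.on_write("images/")
def catalog_image(event: dict) -> dict:
    # Turn an unstructured write into a structured catalog row.
    return {"key": event["key"], "size": event["size"], "kind": "image"}


registry.write("images/cat.jpg", b"\xff\xd8\xff")  # matches the "images/" trigger
registry.write("logs/app.txt", b"hello")           # no trigger registered for "logs/"
print(registry.catalog)
```

Only the first write matches a registered prefix, so the catalog ends up with a single row for `images/cat.jpg`. A production system would also need durability, concurrency control and distribution across nodes, none of which this sketch attempts.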
“A lot of NVIDIA technology is woven through this system,” Denworth said. “In our cluster today, it is managed by an NVIDIA SmartNIC.”
Denworth said that while most analysts and many customers see this process as being in its opening rounds, with a long way to go, the new VAST system is complete and is designed for customers who are ready to use it today.
“I think there is clearly a maturity curve in the market,” he stated. “We are self-serving in terms of the customers we go after. We spend our time where the action is. As a result, our business has never been more robust.
“People moving into AI typically have one path,” Denworth added. “That is to assemble and build computers at a large scale. Companies like Snowflake and Databricks have consolidated the parts you would otherwise have to put together for a Big Data market. They are on the right path. But they built their systems around business reporting and BI tools, rather than this era of deep learning. There is a big market here for those who can democratize it.”