Duality Technologies enables secure GenAI workflows on NVIDIA GPUs

Dr. Alon Kaufman, CEO and Co-Founder of Duality Technologies

Duality Technologies, which enables organizations to unlock the value of their sensitive data through secure data collaboration, has announced support for Google Cloud’s Confidential Computing portfolio, including NVIDIA GPU-powered confidential virtual machines on Google Cloud, enabling large-scale secure AI workloads such as LLM training and inference. Duality empowers organizations to collaborate securely on sensitive data with their business ecosystem of customers, suppliers, and partners. By operationalizing Privacy Enhancing Technologies (PETs), Duality enables secure analysis and AI on encrypted data – while complying with data privacy regulations and protecting valuable IP.

“Microsoft’s latest expansion of Sovereign Cloud services confirms that digital sovereignty is the new security baseline,” said Dr. Alon Kaufman, CEO and Co-Founder of Duality Technologies. “But true ‘Sovereign AI’ demands more than geographic placement; it requires control over the computation itself. Sovereignty, however, can’t lead to isolation. The real challenge isn’t just where the data resides, but how to enable mission-critical secure collaborative AI across these expanding sovereign and cloud boundaries – without ever moving or exposing the sensitive data. This is why Privacy-Enhancing Technologies are shifting from a niche technology to the essential enabling layer for global data utilization.”

With this launch, the Duality Platform now supports GPU-backed LLM inference and encrypted Retrieval-Augmented Generation (RAG) within trusted execution environments (TEEs) – a significant performance leap from previous CPU-only support.

This means that customers can run a secure generative AI workflow on NVIDIA GPUs with Google Cloud Confidential Computing, now featuring end-to-end protection against data leakage powered by Duality. They can also combine full-stack data confidentiality with NVIDIA H100 GPU performance, unlocking confidential AI use cases that were previously impractical due to latency and throughput constraints.

“This changes the game,” Kaufman said. “Our customers can now run privacy-preserving AI with LLMs at production scale. With GPU acceleration, the performance bottlenecks of secure computing are gone, making secure LLM training and inference practical.”

The new capability is built on Google Cloud’s Confidential Space and NVIDIA H100-powered confidential VMs, with support for Intel TDX and Cloud KMS integration. Duality has successfully validated the platform running a Mistral-7B model using encrypted vector RAG (via Faiss) in a fully confidential pipeline.
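To make the pipeline concrete: in a RAG workflow, document embeddings are held encrypted at rest and decrypted only inside the trusted execution environment, where a vector index answers nearest-neighbor queries before the retrieved passages are fed to the LLM. The sketch below is illustrative only, not Duality’s implementation; it uses a plain NumPy exact L2 search as a stand-in for the Faiss index (e.g. `IndexFlatL2`) that would run inside the TEE.

```python
import numpy as np

def build_index(vectors):
    # Inside the TEE, document embeddings are decrypted and indexed.
    # Faiss would build e.g. an IndexFlatL2 here; this NumPy stand-in
    # reproduces the same exact-search semantics for illustration.
    return np.asarray(vectors, dtype=np.float32)

def retrieve(index, query, k=2):
    # Exact L2 nearest-neighbor search over the in-enclave vectors --
    # the core retrieval step of a RAG pipeline.
    dists = np.linalg.norm(index - query, axis=1)
    return np.argsort(dists)[:k]

# Toy corpus of four "document" embeddings (3-dimensional for brevity).
docs = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.9, 0.1, 0]]
index = build_index(docs)
query = np.array([1.0, 0.0, 0.0], dtype=np.float32)
top = retrieve(index, query, k=2)
print(top.tolist())  # indices of the two closest documents
```

The retrieved document IDs would then be used to fetch the matching text chunks and build the prompt for the model, with everything outside the enclave seeing only ciphertext.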

“With Confidential GPUs, organizations can process sensitive AI workloads entirely within trusted execution environments without giving up performance,” said Nelly Porter, Director of Product Management, Google Cloud. “Pairing NVIDIA H100-powered confidential VMs with Duality’s encrypted workflows allows LLM training and inference to happen at scale, with end-to-end protection from data leakage.”

Confidential computing protects data while it’s processed but it doesn’t automatically enable collaboration between parties or AI across silos. That next layer of value is what Duality delivers: privacy-preserving analytics and AI that runs on top of any confidential infrastructure. Organizations can run encrypted analytics or AI models jointly without moving, decrypting, or trusting counterparties. Where Gartner calls confidential computing an “architect-level foundation,” Duality turns that foundation into a collaboration platform, a way for banks, hospitals, governments, and AI vendors to use their secured data together.
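The trust model that makes such collaboration possible is attestation-gated key release, the pattern behind Confidential Space with Cloud KMS integration: each data owner’s decryption key is released only to a workload whose attested measurement matches a policy that owner approved. The following is a minimal, hypothetical sketch of that check; the measurement digest, key string, and policy are placeholders, not real Google Cloud API calls.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder policy: the workload image digest each data owner approved.
APPROVED_MEASUREMENT = "sha256:workload-v1"

@dataclass
class Attestation:
    measurement: str  # hash of the workload image, signed by the TEE
    tee_type: str     # e.g. "Intel TDX"

def release_key(att: Attestation) -> Optional[str]:
    # A key manager (Cloud KMS, in the real system) verifies the
    # attestation against policy before releasing the data key
    # to the enclave; any other workload gets nothing.
    if att.measurement == APPROVED_MEASUREMENT and att.tee_type == "Intel TDX":
        return "data-encryption-key"  # placeholder for real key material
    return None

granted = release_key(Attestation("sha256:workload-v1", "Intel TDX"))
denied = release_key(Attestation("sha256:tampered", "Intel TDX"))
print(granted, denied)
```

Because each party can independently verify the workload before its key is released, no participant has to trust the others, the cloud operator, or the platform vendor with plaintext data.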

Key highlights of the news include GPU support for confidential AI and the ability to run secure LLMs and encrypted RAG on confidential NVIDIA H100s. Performance scales to orders-of-magnitude faster runtimes than CPU-only workloads. The platform is enterprise-ready, meeting the needs of regulated industries, defense, healthcare, and AI-native companies, and is now available via Dynamic Workload Scheduler in Google Cloud’s Confidential Space.

Until now, confidential AI was limited to CPU-only environments – suitable for basic testing, but insufficient for the demands of large-scale AI. With the arrival of Confidential GPU as part of the confidential computing portfolio, Duality customers can now run both LLM training and inference securely inside Trusted Execution Environments. This breakthrough enables high-throughput, privacy-preserving AI workloads that were previously impossible to execute – unlocking new possibilities across industries and use cases.

This capability is initially available on the Google Cloud Confidential A3 virtual machine type in preview, with broader rollout expected later this year.