Deepgram launches Flux, a conversational speech recognition model for real-time voice agents

Scott Stephenson, CEO and Co-Founder of Deepgram

At VapiCon 2025, voice AI platform Deepgram introduced Flux, the world’s first conversational speech recognition (CSR) model designed specifically for real-time voice agents. Unlike traditional automatic speech recognition (ASR), which was built for transcription use cases like captions or meeting notes, Flux has been trained to understand the nuances of dialogue. It doesn’t just capture what was said; it knows when a speaker has finished, when to respond, and how to keep the flow of conversation natural and engaging. It also deals with the biggest problem for voice AI agents: interruptions.

Deepgram started out in transcription but now builds voice models for every layer of the stack.

“The new model that Deepgram is releasing is called Flux,” said Scott Stephenson, CEO and Co-Founder of Deepgram. “Flux is a new type of streaming. It’s a streaming-first model built totally from the ground up. It’s something that we’ve been working on for over two years. We knew it was coming, that voice agents needed this. But we wanted to do it right. We wanted to do it with super low latency, extremely fast, lots of throughput – but still have really high accuracy. There’s no accuracy trade-off whatsoever in Flux compared to our last models like Nova-3 or Nova-2, but the latency is drastically reduced.

“Flux redefines what speech recognition can do for real-time AI,” Stephenson added. “For decades, ASR was built to listen and record. Flux is different. It listens, understands, and guides conversations with human-like timing. It’s the foundation voice agents have been waiting for and is our latest milestone towards solving the Audio Turing Test.”

The global voice AI agents market is projected to reach nearly $47.5 billion by 2034, growing at a compound annual rate of about 34.8%. This impressive growth is primarily due to the enterprise shift toward automated customer self-service, smarter agent assist tools, and embedded conversational experiences across industries. But traditional systems weren’t designed to participate in live dialogue. To recreate conversational flow, developers have been forced to piece together transcription, voice activity detection, and turn-taking logic – a patchwork that leads to latency, errors, and frustrating user experiences. Flux eliminates these problems by embedding turn-taking directly into recognition. It transforms speech recognition from simply transcribing words to modeling the flow of dialogue itself. This provides developers with the tools to build responsive, human-like voice agents without the complexity of workaround code or endless threshold tuning.
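To make that patchwork concrete, here is a minimal, purely illustrative sketch (all names are hypothetical, not Deepgram code) of the fragile client-side logic the article describes: a voice activity detector feeding a hand-tuned silence threshold to guess when a turn has ended.

```python
import time

# Illustrative only: the kind of bolted-on end-of-turn logic teams have had
# to maintain on top of traditional streaming ASR. SILENCE_THRESHOLD_S is the
# hand-tuned knob the article alludes to.
SILENCE_THRESHOLD_S = 0.7


class NaiveTurnDetector:
    """Declares end-of-turn after a fixed stretch of silence from a VAD."""

    def __init__(self, threshold_s: float = SILENCE_THRESHOLD_S):
        self.threshold_s = threshold_s
        self.last_speech_ts: float | None = None

    def on_vad_frame(self, is_speech: bool, now: float | None = None) -> bool:
        """Feed one VAD decision per audio frame; returns True once per end of turn."""
        now = time.monotonic() if now is None else now
        if is_speech:
            self.last_speech_ts = now
            return False
        if self.last_speech_ts is None:
            return False  # no speech seen yet in this turn
        if now - self.last_speech_ts >= self.threshold_s:
            self.last_speech_ts = None  # fire once, then wait for new speech
            return True
        return False
```

The failure mode is built in: a speaker who pauses mid-sentence for longer than the threshold gets cut off, while raising the threshold adds dead air before every response. That is the endless threshold tuning the article refers to, and the trade-off Flux moves inside the model.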

“Flux is the first step in the way we think of the world, which is our Neuroplex architecture,” Stephenson stated. “Flux is the first piece of it, which is providing more context to the entire conversation.” He noted that Deepgram published its Neuroplex white paper about six months ago; it describes a system that works much like a human brain, with specialized regions.

“The only way you can pass context is through text streams,” Stephenson said. “Flux is the first model we have released that is in that architecture. It’s a really good model for understanding voice AI.”

Flux provides embedded turn-taking intelligence: conversation-aware recognition that handles timing inside the model itself, with context-aware turn detection and native barge-in handling for fluid exchanges. It also promises ultra-low latency where it matters most, with ~260ms end-of-turn detection plus distinct events that support eager response generation before a turn is complete. Development is simpler and faster: turn-complete transcripts and structured conversational cues replace fragile client-side logic, so teams can ship production-ready agents in weeks, not months. Finally, Flux offers enterprise-ready scalability, including Nova-3-level accuracy, GPU-efficient concurrency with 100+ streams per GPU, and predictable costs that avoid the hidden overhead of bolted-on systems.
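Those eager end-of-turn events are the piece that changes agent design most, so here is a minimal sketch of how a client might consume them. The endpoint URL, event names, and payload fields below are assumptions made for illustration rather than Deepgram’s documented API; what the sketch shows is the control flow the article describes: start drafting a reply on an eager signal, discard it if the speaker resumes, and send it once the end of turn is confirmed.

```python
import asyncio
import json
import os

import websockets  # third-party: pip install websockets

# Assumed endpoint and event names for illustration only; check Deepgram's
# Flux documentation for the real values.
FLUX_URL = "wss://api.deepgram.com/v2/listen?model=flux-general-en"


async def run_agent(audio_chunks):
    """Stream audio up and react to structured turn events coming back."""
    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}
    # On websockets releases before 14, this keyword is `extra_headers`.
    async with websockets.connect(FLUX_URL, additional_headers=headers) as ws:

        async def send_audio():
            for chunk in audio_chunks:  # raw audio bytes from a mic or file
                await ws.send(chunk)

        async def handle_events():
            async for message in ws:
                event = json.loads(message)
                kind = event.get("event")  # assumed field name
                if kind == "EagerEndOfTurn":
                    # The turn is probably over: start drafting the LLM reply
                    # now so it is ready the instant the turn is confirmed.
                    print("draft reply for:", event.get("transcript"))
                elif kind == "TurnResumed":
                    print("speaker kept going; discard the draft")
                elif kind == "EndOfTurn":
                    print("respond with:", event.get("transcript"))

        await asyncio.gather(send_audio(), handle_events())
```

Compared with the silence-threshold sketch earlier, the client here no longer measures audio timing at all; it only reacts to structured events the model emits.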

“With Nova-3 and our previous models, what’s under the hood at Deepgram has been formed and jammed to work in a real-time way,” Stephenson indicated. “But when you use them versus Flux, which is a from-the-ground-up rewrite as a bi-directional streaming model, that’s a totally different kind of architecture, and you can feel the drastic difference. It’s because Flux is a real-time paradigm. You don’t have to set up another VAD or some other kind of end-of-thought detector.”

“At Vapi, our mission has always been to give engineering teams a platform to build their conversational front-door,” said Jordan Dearsley, Founder and CEO at Vapi. “Deepgram’s launch of Flux is a perfect example of that vision coming to life. By embedding turn-taking directly into recognition, Flux solves one of the hardest challenges in conversational AI. We’re thrilled Deepgram chose VapiCon to introduce this breakthrough, and we can’t wait to see the incredible voice agents developers create with it.”

Stephenson emphasized that while people associate Deepgram with transcription, it is actually an LLM company.

“We are training our own conversationalist LLMs at Deepgram,” he said. “We are not a transcription company. We are not trying to build a know-it-all LLM. We are trying to build a conversationalist LLM, but it’s going to be bi-directional and multi-track. It’s going to be working in this voice agent way.”

So how will Deepgram work with LLMs?

“The LLMs will have to be rewritten,” Stephenson indicated. “There are no bi-directional streaming LLMs now. Those have to be invented. It’s going to be a mess, but AWS started with 13 options. Then it’s going to congeal back together again. The next step is to build that into the individual components: the perception part, the understanding part, and then the generation part. Flux is the first entrant in that, and Flux will be expanded to have even more multi-track capabilities.”

Stephenson explained that, in the short term, everyone has to get comfortable with the fact that it’s going to be a hybrid system, no matter what.

“I think that most AI will be real-time in the future, but I’m talking 10 years from now,” he said. “There’s going to be a big transformation to make real time work really well, with the low latency and high accuracy of all this. Real time needs to always work. It’s a totally different frame of reference, but the demands on quality are just going to go up and up. There’s so much more that needs to be built from an infrastructure perspective, from a testing perspective. Customers are looking for infrastructure providers that are more like strategic infrastructure providers, because then they don’t have to think about it. At Deepgram, we are always thinking about real-time first, even though you can’t always build everything in real time. It’s not quite ready yet.”

“At Lindy, our mission is to build the world’s most capable AI employees, and voice is a big part of this,” said Flo Crivello, Founder and CEO of Lindy. “Deepgram has been our partner of choice since the earliest days, and Flux brings things to the next level: there is simply nothing on the market that comes close in terms of latency or conversation awareness. It’s enabled us to deliver the smoothest, most natural, interruption-free conversations for our customers.”

Who will use this today? Voice AI builders: developers, engineering leads, and AI teams creating real-time agents. Enterprise innovators: leaders modernizing the customer experience with agent assist and conversational AI platforms. And ecosystem partners: platform providers, consultancies, and cloud architects looking to integrate CSR into larger AI stacks.

“We’re going to do something super special in October for the release,” Stephenson concluded. To celebrate the launch, Deepgram is announcing OktoberFLUX, making Flux free to use for the entire month of October. Developers can use Flux to build and test real-time voice agents at no cost, with support for up to 50 concurrent connections. The goal is to remove every barrier to experimentation so teams can experience how conversational speech recognition changes what’s possible in voice AI.