Cumulus says that the synergies between its core visibility functions and the newly added lifecycle management capabilities will make life much easier for customers, especially ones with more limited IT staffs.
Networking software provider Cumulus Networks has moved beyond providing visibility for telemetry and troubleshooting with its NetQ 3.0 release. The main change is the addition of automation around lifecycle management, particularly to handle early setup tasks that had become problematic for some customers with less sophisticated IT staffs.
“Prior to 3.0, NetQ was a real-time network visibility and telemetry troubleshooting tool,” said Partho Mishra, President and Chief Product Officer at Cumulus Networks. “That’s all it did. Now it can do things like the Day One and Day Two set-up operations, including deploying images onto network switches, and deploying configurations onto a bare metal switch. It will also handle other lifecycle activities, so that if a switch fails, you bring in another switch and clone the image of the first switch on it.”
The key here, and the reason Cumulus believes its lifecycle management stands apart even though every major vendor in its space already offers some version of it, is the combination of the new lifecycle management functionality with the company's traditional operations capabilities, such as visibility, troubleshooting, validation, trace and comparative look-back functionality.
“In terms of lifecycle management, every major network vendor has their own version,” Mishra said. “The depth of our real-time telemetry provides us with superior real-time visibility to what’s going on in the network. So we are basically piggybacking on what we did before.
“When we do lifecycle management, it’s not done the old dumb way,” he stressed. “It’s very sophisticated. The software figures out, for example, if there is a way to upgrade all switches in parallel, without impacting service. It determines what part of the network will be impacted, to let you know how many can be impacted at once. And then it creates a schedule, so now instead of having to upgrade one switch at a time, it’s a push button upgrade with a lot of sophistication under the hood. Under the hood of the UI we continue to innovate to deal with these complexities, and run through all of these checks to make sure upgrades will not cause any problems.”
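To make the scheduling idea concrete, here is a minimal sketch of how switches might be grouped into upgrade batches so that no redundancy group loses more than a set number of members at once. It is an illustration only, not NetQ's actual algorithm: the function name, the group labels and the `max_down_per_group` parameter are all assumptions made for the example.

```python
from collections import defaultdict


def plan_upgrade_batches(switch_groups, max_down_per_group=1):
    """Group switches into batches that could be upgraded in parallel.

    switch_groups maps a switch name to a redundancy group label
    (hypothetical labels such as "rack1" or "spine"). No batch takes more
    than max_down_per_group switches from the same group offline at once.
    """
    if max_down_per_group < 1:
        raise ValueError("max_down_per_group must be at least 1")

    batches = []
    for switch in sorted(switch_groups):
        group = switch_groups[switch]
        # Place the switch in the first batch that still has room for its group.
        for batch in batches:
            if batch["counts"][group] < max_down_per_group:
                batch["switches"].append(switch)
                batch["counts"][group] += 1
                break
        else:
            # Every existing batch is full for this group, so start a new one.
            counts = defaultdict(int)
            counts[group] = 1
            batches.append({"switches": [switch], "counts": counts})
    return [batch["switches"] for batch in batches]


# Hypothetical topology: two leaf pairs and a spine pair.
topology = {
    "leaf01": "rack1", "leaf02": "rack1",
    "leaf03": "rack2", "leaf04": "rack2",
    "spine01": "spine", "spine02": "spine",
}
for number, batch in enumerate(plan_upgrade_batches(topology), start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

With one switch per group allowed down at a time, the six switches fall into two batches instead of six sequential upgrades. A real planner would also need live topology and traffic data, which this sketch leaves out.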
This push-button automation capability had become essential, Mishra said, because as the Cumulus customer base expanded beyond its initial core of customers with large, sophisticated IT staffs, the number of customers whose IT staffs were bogged down in routine tasks grew as well.
“In our first years, our customers had engineers who took care of everything, and then we got a second group of customers who didn’t have that programming savvy,” Mishra noted. “So our consulting teams would come in and write automation for the customer – here’s the code, here’s the configuration templates.
“Now, as our business continues to expand, we found that we are beginning to run into IT teams who really aren’t proficient. So 3.0 is about building a UI-driven way of doing things like the initial network configuration and installation. It allows all these things to be done in a repeatable fashion.”
The most important of these new capabilities is configuration management. It lets customers use a golden configuration from a centralized location to help automate repetitive configuration tasks and easily identify differences between the default configuration and the configuration of specific switches.
“While all switches have the same configuration initially, over time some will have specific configurations for specific reasons,” Mishra said. “This lets you automate a backup or a restore. Previously, they would have had to have done it manually or edit files in scripting.”
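The comparison against a golden configuration can be pictured as a plain text diff. The sketch below uses Python's standard `difflib` module. The function name, switch name and configuration lines are invented for illustration and say nothing about how NetQ stores or compares configurations internally.

```python
import difflib


def config_drift(golden, running, switch_name):
    """Return unified-diff lines showing where a switch departs from the golden config."""
    return list(
        difflib.unified_diff(
            golden.splitlines(),
            running.splitlines(),
            fromfile="golden",
            tofile=switch_name,
            lineterm="",
        )
    )


# Hypothetical configuration text, not real switch syntax.
golden_config = "hostname leaf\nmtu 9216\nntp server 10.0.0.1\n"
leaf07_config = "hostname leaf\nmtu 1500\nntp server 10.0.0.1\n"

for line in config_drift(golden_config, leaf07_config, "leaf07"):
    print(line)
```

An empty result means the switch matches the golden configuration. Any lines prefixed with `-` or `+` mark the switch-specific deviations an operator would want to review before a backup or restore.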
The software upgrade management capability – the one that determines how many switches can be upgraded at once to deal with things like security patches and bug fixes – is another important net-new addition.
Version 3.0 also leverages capabilities like Snapshot, which was first introduced two releases ago. Snapshot permits a comparison of the live state of the network before and after maintenance or a configuration change.
“When you make a change, you can verify if the change had the desired effect,” Mishra said. “Snapshot is a building block, which we use in an automated fashion in 3.0 to run automated checks to compare the speed of the network before and after. It’s another example of piggybacking, in this case on the Snapshot capability.”
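A before-and-after check of this kind boils down to comparing two captures of network state. The following sketch assumes each snapshot is a simple dictionary mapping an item, such as an interface, to its observed state. That data shape and the function name are assumptions for illustration, not NetQ's Snapshot format.

```python
def compare_snapshots(before, after):
    """Report what was added, removed or changed between two state captures."""
    added = {key: after[key] for key in after.keys() - before.keys()}
    removed = {key: before[key] for key in before.keys() - after.keys()}
    changed = {
        key: (before[key], after[key])
        for key in before.keys() & after.keys()
        if before[key] != after[key]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical interface states captured around a maintenance window.
pre_change = {"leaf01:swp1": "up", "leaf01:swp2": "up", "leaf02:swp1": "down"}
post_change = {"leaf01:swp1": "up", "leaf01:swp2": "down", "leaf02:swp2": "up"}

report = compare_snapshots(pre_change, post_change)
print(report["changed"])
```

An operator, or an automated check, could treat any unexpected entry in `changed` or `removed` as a sign that the maintenance did not have the desired effect.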
A ‘What Just Happened’ [WJH] capability, which provides detailed information about any packet drop on any device on the network for any reason, is now fully available on Mellanox switches.
“In its initial form, we first supported this in December,” Mishra indicated. “Mellanox over time has added more capabilities including tracing performance to provide visibility into end-to-end flow.”
NetQ 3.0 also continues to expand the ability to view and validate network state, intent and configuration across the entire network, from switches to hosts to containers. It also enhances the ability to verify connectivity between two devices, either on demand or on a set schedule, in order to trace where a problem is occurring.
NetQ 3.0 is available now.