VAST and NVIDIA Redesign AI Inference for Agentic Era


VAST Data, the AI Operating System company, today announced a new inference architecture that enables NVIDIA Inference Context Memory Storage Platform deployments for the era of long-lived, agentic AI. The platform is a new class of AI-native storage infrastructure for gigascale inference. Built on NVIDIA BlueField-4 DPUs and Spectrum-X Ethernet networking, it accelerates AI-native key-value (KV) cache access, enables high-speed inference context sharing across nodes, and delivers a major leap in power efficiency.

As inference evolves from single prompts into persistent, multi-turn reasoning across agents, the notion that context stays local breaks down. Performance is increasingly governed by how efficiently inference history (KV cache) can be stored, restored, reused, extended, and shared under sustained load – not simply by how fast GPUs can compute.

VAST is rebuilding the inference data path by running VAST AI Operating System (AI OS) software natively on NVIDIA BlueField-4 DPUs, embedding critical data services directly into the GPU server where inference executes, as well as in a dedicated data node architecture. This design removes classic client-server contention and eliminates unnecessary copies and hops that inflate time-to-first-token (TTFT) as concurrency rises. Combined with VAST’s parallel Disaggregated Shared-Everything (DASE) architecture, it lets each host access a shared, globally coherent context namespace without the coordination tax that causes bottlenecks at scale, enabling a streamlined path from GPU memory to persistent NVMe storage over RDMA fabrics.
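The data path described above – restore a session’s KV cache from a shared namespace instead of recomputing the prefill, and publish it so any node can reuse it – can be illustrated with a toy sketch. Everything here (`SharedContextStore`, `prefix_key`, `serve_turn`, the per-token cost model) is hypothetical shorthand, not a VAST or NVIDIA API; real KV tensors and RDMA transfers stand in as plain bytes and a dict.

```python
# Hypothetical illustration of KV-cache reuse across inference nodes.
# These names are invented for this sketch; they do not reflect any
# actual VAST AI OS or NVIDIA BlueField-4 interface.
import hashlib


class SharedContextStore:
    """Stand-in for a shared, globally coherent context namespace."""

    def __init__(self):
        self._blobs = {}  # key -> serialized KV-cache bytes

    def put(self, key, kv_bytes):
        self._blobs[key] = kv_bytes

    def get(self, key):
        return self._blobs.get(key)


def prefix_key(prompt_tokens):
    # Content-address the prompt prefix so any node can locate its cache.
    return hashlib.sha256(bytes(prompt_tokens)).hexdigest()


def serve_turn(store, prompt_tokens, recompute_cost_per_token=1.0):
    """Restore the KV cache if present; otherwise 'prefill' (simulated)
    and publish the result so other nodes can reuse it.
    Returns (kv_bytes, prefill_cost)."""
    key = prefix_key(prompt_tokens)
    cached = store.get(key)
    if cached is not None:
        return cached, 0.0           # restored: no prefill recompute
    kv = bytes(prompt_tokens)        # stand-in for real KV tensors
    store.put(key, kv)
    return kv, recompute_cost_per_token * len(prompt_tokens)


store = SharedContextStore()
_, first_cost = serve_turn(store, [1, 2, 3, 4])   # cold: pays prefill
_, second_cost = serve_turn(store, [1, 2, 3, 4])  # warm: restored free
```

The toy cost model makes the press release’s point concrete: once context is a shared system resource, repeat turns on the same prefix skip the prefill entirely, which is what keeps TTFT flat as concurrency rises.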

“Inference is becoming a memory system, not a compute job. The winners won’t be the clusters with the most raw compute – they’ll be the ones that can move, share, and govern context at line rate,” said John Mao, Vice President, Global Technology Alliances at VAST Data. “Continuity is the new performance frontier. If context isn’t available on demand, GPUs idle and economics collapse. With the VAST AI Operating System on NVIDIA BlueField-4, we’re turning context into shared infrastructure – fast by default, policy-driven when needed, and built to stay predictable as agentic AI scales.”

Beyond raw performance, VAST gives AI-native organizations and enterprises deploying NVIDIA AI factories a path to production-grade inference coordination with high levels of efficiency and security. As inference moves from experimentation into regulated and revenue-driving services, teams need the ability to manage context with policy, isolation, auditability, lifecycle controls, and optional protection – all while keeping KV cache fast and usable as a shared system resource. VAST delivers those AI-native data services as part of the AI OS, helping customers avoid rebuild storms, reduce idle-GPU resource waste, and improve infrastructure efficiency as context sizes and session concurrency explode.

“Context is the fuel of thinking. Just like humans that write things down to remember them, AI agents need to save their work so they can reuse what they’ve learned,” said Kevin Deierling, Senior Vice President of Networking, NVIDIA. “Multi-turn and multi-user inferencing fundamentally transforms how context memory is managed at scale. VAST Data AI OS with NVIDIA BlueField-4 enables the NVIDIA Inference Context Memory Storage Platform and a coherent data plane designed for sustained throughput and predictable performance as agentic workloads scale.”

Experience VAST’s industry-leading approach to AI and data infrastructure at VAST Forward, our inaugural user conference, February 24–26, 2026 in Salt Lake City, Utah. Engage with VAST leadership, customers, and partners through deep technical sessions, hands-on labs, and certification programs. Register here to join.
