Streamlining ARM Data Access With AI-Ready Infrastructure

Published: 28 May 2026

New storage, software, and computing frameworks set the stage for next‑generation data tools and research support

Two people examine a high performance computing cluster. The people are blurry, indicating that they are moving around at a fast pace. — The ARM Data Center has been preparing its infrastructure, including the Cumulus high performance computing cluster, for artificial intelligence (AI). Image is courtesy of Oak Ridge National Laboratory (ORNL).

As artificial intelligence (AI) becomes more integrated into scientific workflows, the U.S. Department of Energy’s (DOE’s) Atmospheric Radiation Measurement (ARM) User Facility continues to improve its computing, storage, and software frameworks and develop new tools that enhance the ARM data user experience.

The improvements aim to help researchers access observations and metadata more quickly and easily while reducing the time spent searching for, understanding, downloading, and managing large data sets. This is particularly important because ARM has collected more than 30 years of atmospheric data, totaling over 8 petabytes.

“AI‑ready infrastructure is no longer optional,” says Giri Prakash, ARM’s chief data and computing officer.

Prakash, who manages the ARM Data Center at Oak Ridge National Laboratory (ORNL) in Tennessee, describes the effort as a phased, incremental approach to enhancing data infrastructure to support ARM’s growing computational demands. “These developments are being added to ARM’s already very capable infrastructure to accommodate the demanding requirements of AI applications.”

Building an AI‑Ready Data Center

ARM began preparing its infrastructure for AI about four years ago, starting with hardware.

The ARM Data Center installed graphics processing units (GPUs) to the Cumulus high performance computing cluster. Multiple projects used the GPUs, including data quality analysis, radar processing, and data product generation.

As AI use intensifies, a more significant upgrade is now underway. ARM is replacing its file server with an AI-ready storage platform that connects directly to the GPU environment. This will enable AI models to access ARM data and metadata at high speed, rather than waiting for slower file transfers.

For more information: Check out ARM’s new Artificial Intelligence web page.

According to Prakash, ARM is acquiring 25 to 30 new GPUs, including processing units designed to accelerate AI workloads, to meet its computational needs over the next two to five years.

Along with adding and upgrading computing and storage infrastructure, ARM’s cybersecurity and network engineering teams are enhancing controls to manage access to ARM computing, data, and AI resources and tools.

The enhanced infrastructure extends beyond hardware. ARM has been developing a software environment that will enable large language models (LLMs) and agent-based systems to communicate with data holdings, metadata, and quality records.

As organizations like ARM build AI-ready infrastructure, they are focusing on AI agents that can reason through multi-step tasks, access external tools, and make limited autonomous decisions. Unlike traditional AI assistants that simply answer questions, agent-based systems can retrieve data, interact with software platforms, and coordinate workflows with minimal human intervention.

Prakash describes an LLM as “the brain that understands and explains; an agent is the system that uses that brain to access data and get work done. While the LLM provides general reasoning and language ability, agents connect it to institutional knowledge, tools, and actions.”

ARM Data Advisor: Putting the User First

Three conversation bubbles, with the first introducing themselves as the ARM Data Center and asking how they can help today, the second asking for ARM data products relevant to recent research on cloud microphysics and atmospheric water properties, and the third responding with a list of liquid water path and precipitable water vapor variables, datastreams, sites where they are measured, the period of record that overlaps the last ~3 years, quality control information, and links to the data in Data Discovery. — In this ARM Data Advisor example, the AI agent responds to a user request for recent research on cloud microphysics and atmospheric water properties. Screen capture is courtesy of Wade Darnell, ORNL.

One of the most noticeable changes for researchers in the near term will be the introduction of the ARM Data Advisor (ADA, which is pronounced “ā-duh”), an AI agent that will streamline ARM data discovery and access.

ADA is currently being tested by a small group of ARM staff and users.

According to Wade Darnell, an ORNL software developer and ADA’s lead developer, this new assistant will answer questions, suggest data sets, display data plots, explain data quality, and even place data orders—all through a natural-language conversational interface.

Basic data ordering will be available in the initial rollout, but more advanced ordering and data extractions will be added in future versions.

Instead of navigating multiple panels or search fields to drill into data sets, users can tell ADA what they need. For example, users can request data sets with latent heat flux or observations from a specific campaign or instrument. ADA will identify relevant data sets and provide users with context explaining why the data are relevant. With one click, users will be able to order data from within ADA’s interface.

This user-centric approach extends beyond search capabilities. Upon its release, ADA will also provide personalized recommendations for returning users, suggest new datastreams, and deliver files in multiple formats suitable for analysis, workflow automation, or downstream modeling.

ADA’s conversational interface allows users to easily pinpoint customized search results and deliver the data in their preferred format. Meanwhile, human support will always be available, providing manual oversight and direct assistance.

ADA is expected to be introduced in July 2026, and it will evolve over time. The traditional search tool will remain in place until developers are confident that ADA is meeting the needs of ARM users.

Those interested in participating in ADA testing can contact the ARM Data Center.

The Next Step: ATLAS

To enable AI-ready infrastructure, ARM developers are building a framework called the Agentic Tooling and LLM Augmentation Stack (ATLAS).

ATLAS provides a shared platform enabling AI-driven tools to work together—from data discovery to workflow automation—while upholding ARM’s standards for transparency, security, and scientific integrity.

The purpose of ATLAS is straightforward: accelerate AI and machine learning adoption by offering shared, standardized infrastructure. This includes:

providing model inference—the process that LLMs use to answer queries—through OpenAI-compatible endpoints (ATLAS exposes model access through standardized interfaces that follow OpenAI’s format and conventions)
converting information, such as data sets, metadata, documentation, and prior knowledge, into a format that captures its semantic meaning, allowing systems to consistently retrieve and use relevant details for more accurate, informed responses across workflows
coordinating workflows guided by domain‑specific agents
delivering secure, governed access to ARM data and services.

A flowchart shows connections between users, clients, a router, agents, tools, resources, an inference gateway, ARM graphics processing units, a vector store and backend database, and the Genesis Mission and American Science Cloud. — This simplified diagram of the Agentic Tooling and LLM Augmentation Stack (ATLAS) framework illustrates how this architecture can scale and extend with multiple users, clients, agents, tools, and resources.

ATLAS provides a common foundation for developing and integrating AI-powered tools used for ARM functions, including metadata generation, data quality analysis, and enhanced website search capabilities with a forthcoming digital assistant called Ask ARM. ATLAS also supports connections to multiple model-serving environments that require GPUs, both internally and on external platforms.

One example of an external platform is the American Science Cloud. This DOE initiative, which is part of the Genesis Mission, unifies national lab supercomputing resources into a secure cloud for AI-driven scientific discovery.

Announced in late 2025, Genesis aims to build an integrated platform that connects AI models, curated scientific data, workflows, and computing resources across DOE laboratories to accelerate discovery, enable autonomous science, and scale impact across the broader research ecosystem.

A Rewired Research Environment

ARM’s AI-ready infrastructure extends to the representation of data, context, and scientific knowledge. Years of consistent ARM metadata and data quality records provided by ARM’s Data Quality Office, instrument mentors, and translators now form the foundation for retrieval systems.

Over the next three years, Prakash says ARM is expected to add 2 to 3 petabytes of storage for vectorized content—searchable metadata embeddings, guidance pages, instrument handbooks, and other supporting materials. To support advanced scientific AI workflows, ARM also plans to transform and chunk scientific variables and observational data into AI-readable contextual representations that can be efficiently indexed, retrieved, and analyzed by agentic AI systems.

By leveraging GPU Direct Storage and tightly integrated high performance storage architectures, ARM reduces data movement bottlenecks and accelerates AI model response times. Facility operations teams are now working together on energy, cooling, and networking as GPU clusters expand.

Through these efforts, ARM is reconfiguring how computing, storage, and software systems interact so AI models can answer questions with grounded, authoritative information from the documentation developed over the years by ARM staff.

What Users Should Expect

“ARM is not simply bolting AI onto existing systems. We are rearchitecting our data ecosystem around AI, and the result will be a dramatically improved and more intuitive user experience.”

Giri Prakash, ARM chief data and computing officer

In the near term, researchers accessing the ARM Data Center will experience faster performance, more intuitive search results, and early conversational interfaces to help with standard tasks.

A broader range of agent-driven tools will become available as ATLAS matures, helping users automate workflows and navigate ARM data more effectively. Prakash emphasizes that user feedback will continue to guide their rollout.

In turn, the ARM development team will incrementally update capabilities to improve their usability and accuracy, as well as their integration with other scientific systems.

“ARM is not simply bolting AI onto existing systems,” says Prakash. “We are rearchitecting our data ecosystem around AI, and the result will be a dramatically improved and more intuitive user experience.”

Building Guardrails, Accelerating Discovery

To learn more about ARM’s AI infrastructure and other AI-focused initiatives, click on the video to watch a recording of the AI in ARM webinar, held April 28, 2026.

Although today’s focus on AI seems new, AI capabilities and algorithms have been developed by groups across ARM and integrated at the ARM Data Center for several years. According to Prakash, the difference today is the rapid development of AI technology and its availability to the research community.

As ARM moves forward in implementing AI infrastructure, the ARM Data Center team continues to develop software and plan the required AI infrastructure.

Meanwhile, the AI in ARM Team is finalizing a governance document for the user facility that sets guardrails for the responsible use of AI, defines ethical standards aligned with DOE principles, and establishes evaluation criteria and best practices for working with AI.

By aligning ARM’s AI-optimized data flow with DOE’s Genesis Mission, the ARM Data Center team is helping to ensure that ARM’s data sets fuel a unified platform designed to shorten the time between observation and discovery. For the scientific community, this means more than just faster processing. It creates a trustworthy framework in which AI-driven insights remain grounded in high-fidelity observations.

Chirag Shah, Wade Darnell, and Giri Prakash of ORNL made technical contributions to this story.

# # #

ARM is a DOE Office of Science user facility operated by nine DOE national laboratories.