New methods for extracting more detail from existing data sets

Submitter

Isaacman-VanWertz, Gabriel — Virginia Polytechnic Institute and State University

Area of Research

Aerosol Properties

Journal Reference

Kim S, L Yee, A Goldstein, and G Isaacman-VanWertz. 2025. "Systematic characterization of unknown compounds via dimensionality reduction of time series." 10.1080/02786826.2024.2445634.

Science

An example of the type of information extracted using this approach. Time series are shown for several analytes cataloged in the data set. The data include both identified chemicals and unknown compounds, which is feasible in large part because all processing is automated. The unknown peaks can be clustered by their properties, providing validation and improved statistics for the concentration variations observed for known compounds. Image from journal.

Example of cataloging all of the analytes found in a chromatogram (black line). More than 1,000 different peaks (red lines) are automatically extracted from the data set. Starred compounds are those identified by manual processing in previous work. Of these analytes, approximately 400 had enough signal and resolution to be processed into full month-long time series showing how concentrations vary at the Manacapuru ARM site. Figure from Kim, S, et al. 2022. "Comprehensive detection of analytes in large chromatographic datasets by coupling factor analysis with a decision tree." Atmospheric Measurement Techniques 15(17): 5061-5075.

Detailed data on what is in the atmosphere are often very complex, covering thousands of chemicals without known identities or properties. By developing new automated tools for analyzing certain types of data, this research will substantially improve the ability to make sense of these data and extract new details about the composition of the atmosphere.

Impact

Many data sets collected at atmospheric field sites are analyzed only for a subset of specific data of interest because fully analyzing all of the information has been too time- or labor-intensive. By developing a new tool for automated analysis of certain types of data sets, this advance will provide researchers with substantially more data to answer a broad range of questions about atmospheric processes.

Summary

One approach to understanding in detail what is in the air is to use a gas chromatograph, which separates the complex mixture of chemicals in the air into its component parts. Knowing which chemicals are present provides information about the sources of aerosols and air pollutants and the processes that transform them. Because these data are so complex, researchers often focus on specific target chemicals that provide known information, and much of the data is never examined due to time and labor limitations. This research uses advanced techniques, including machine learning, to automatically analyze and organize these data, making the process faster and more efficient. The method was tested on samples collected at the ARM site in Manacapuru, Brazil, during the GoAmazon2014/15 field campaign. Using automated methods with limited operator effort, time series of over 400 unique chemicals were generated, characterized by chemical properties, and grouped into categories. These data will improve understanding of atmospheric processes at this field site by expanding the amount of detail known, and the method will significantly improve data processing and the amount of detailed information available at future field sites.
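To make the idea of grouping time series into categories more concrete, the short Python sketch below groups analytes whose concentrations rise and fall together. It is only an illustration and not the method used in the paper, which relies on dimensionality reduction of the time series. The analyte names, the synthetic values, the greedy grouping rule, and the 0.8 correlation threshold are all assumptions made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no co-variation information
    return cov / sqrt(var_x * var_y)


def group_by_correlation(series, threshold=0.8):
    """Greedy grouping (hypothetical rule chosen for this sketch).

    Each analyte joins the first existing group whose seed series it
    correlates with at or above `threshold`; otherwise it seeds a new group.
    """
    groups = []  # list of (seed_name, member_names)
    for name, values in series.items():
        for seed, members in groups:
            if pearson(series[seed], values) >= threshold:
                members.append(name)
                break
        else:
            groups.append((name, [name]))
    return [members for _, members in groups]


# Synthetic, made-up concentrations for two known and two unknown analytes.
example_series = {
    "known_A": [1, 2, 3, 4, 5, 4, 3, 2],
    "unknown_1": [2, 4, 6, 8, 10, 8, 6, 4],
    "known_B": [5, 4, 3, 2, 1, 2, 3, 4],
    "unknown_2": [10, 8, 6, 4, 2, 4, 6, 8],
}

if __name__ == "__main__":
    for members in group_by_correlation(example_series):
        print(members)
```

In this toy data set each unknown compound lands in the same group as the known compound it tracks, which mirrors how unknown peaks can add statistical support to patterns first seen in identified chemicals.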


Atmospheric Radiation Measurement (ARM) | Reviewed October 2024