New methods for extracting more detail from existing data sets
Submitter
Isaacman-VanWertz, Gabriel, Virginia Polytechnic Institute and State University
Area of Research
Aerosol Properties
Journal Reference
Kim S, L Yee, A Goldstein, and G Isaacman-VanWertz. 2025. "Systematic characterization of unknown compounds via dimensionality reduction of time series." Aerosol Science and Technology, 10.1080/02786826.2024.2445634.
Science
 
An example of the type of information extracted using this approach. Time series are shown for several analytes cataloged in the data set. The data include both identified chemicals and unknown compounds, which is feasible largely because all processing is automated. The unknown peaks can be clustered by their properties, providing validation and improved statistics for the concentration variations that also affect known compounds. Image from the journal article.
 
Example of cataloging all of the analytes found in a chromatogram (black line). Over 1,000 different peaks (red lines) are automatically extracted from the data set. Starred compounds are those identified by manual processing in previous work. Of these analytes, approximately 400 had enough signal and resolution to be processed into full month-long time series of concentrations at the Manacapuru ARM site. Figure from Kim, S, et al. 2022. "Comprehensive detection of analytes in large chromatographic datasets by coupling factor analysis with a decision tree." Atmospheric Measurement Techniques 15(17): 5061-5075.
Detailed measurements of atmospheric composition are often very complex, containing thousands of chemicals whose identities or properties are unknown. By developing new automated tools for analyzing certain types of these data, this research will substantially improve the ability to make sense of them and extract new details about the composition of the atmosphere.
Impact
Many data sets collected at atmospheric field sites are analyzed for only a small subset of the data of interest, because fully analyzing all of the information has been too time- or labor-intensive. By providing a new tool for automated analysis of certain types of data sets, this advance will give researchers substantially more data to answer a broad range of questions about atmospheric processes.
Summary
One approach to characterizing the detailed composition of the air is gas chromatography, which separates the complex mixture of chemicals in an air sample into its component parts. Knowing which chemicals are present provides information about the sources of aerosols and air pollutants and the processes that transform them. Because these data are so complex, researchers often focus on specific target chemicals that provide known information, and much of the data is never examined because of time and labor limitations. This research uses advanced techniques, including machine learning, to automatically analyze and organize these data, making the process faster and more efficient. The method was tested on samples collected at the ARM site in Manacapuru, Brazil, during the GoAmazon2014/15 field campaign. With limited operator effort, the automated methods generated time series for over 400 unique chemicals, characterized them by chemical properties, and grouped them into categories. These data will improve understanding of atmospheric processes at this field site by expanding the amount of known detail, and the method will substantially improve data processing and increase the detailed information available at future field sites.
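To illustrate the general idea of grouping unknown compounds by how their concentrations vary, here is a minimal sketch. It is not the published method: Kim et al. (2025) use dimensionality reduction of the time series, whereas this sketch substitutes a much simpler technique. It groups analytes whose time series are strongly correlated, using Pearson correlation and single-linkage grouping. The function names, the 0.8 correlation threshold, and the toy analyte data are all hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def zscore(series):
    """Standardize a time series to zero mean and unit (population) variance."""
    mu = mean(series)
    sd = pstdev(series)
    if sd == 0:
        # A flat series carries no co-variation information.
        return [0.0 for _ in series]
    return [(x - mu) / sd for x in series]


def correlation(a, b):
    """Pearson correlation of two equal-length time series."""
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


def group_by_covariation(series_by_name, threshold=0.8):
    """Group analytes whose time series correlate at or above `threshold`.

    Hypothetical stand-in for the paper's dimensionality-reduction approach.
    Uses single linkage via union-find: two analytes share a group if a
    chain of high pairwise correlations connects them.
    """
    names = list(series_by_name)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if correlation(series_by_name[a], series_by_name[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: concentrations of four analytes at five sample times.
series = {
    "known_A": [1.0, 2.0, 3.0, 2.0, 1.0],
    "unknown_1": [2.1, 4.0, 6.2, 3.9, 2.0],
    "unknown_2": [5.0, 4.0, 3.0, 4.0, 5.0],
    "known_B": [10.0, 8.1, 6.0, 7.9, 10.2],
}
print(group_by_covariation(series))
# → [['known_A', 'unknown_1'], ['known_B', 'unknown_2']]
```

In this toy case, each unknown analyte lands in the same group as the known compound it tracks. That is the kind of pairing that lets unknown peaks corroborate variations seen in known compounds. Single-linkage grouping can chain together analytes that are only indirectly related, which is one reason a more structured method may be preferable for real data sets.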