Machine Learning Reveals Key Drivers of Atmospheric New Particle Formation
Submitter
Wang, Yang — University of Miami
Mei, Fan — Pacific Northwest National Laboratory
Area of Research
Aerosol Properties
Journal Reference
Hao W, M Mehra, G Budhwani, T Chakraborty, F Mei, and Y Wang. 2026. "Employing Machine Learning for New Particle Formation Identification and Mechanistic Analysis: Insights from a Six‐Year Observational Study in the Southern Great Plains." Journal of Geophysical Research: Atmospheres, 131(1), e2024JD043116, 10.1029/2024JD043116.
Science
Evolution of the aerosol particle size distribution during a new particle formation event, together with the six most influential features identified by the data-driven machine learning analysis. Image courtesy of the authors, from the publication "Employing machine learning for new particle formation identification and mechanistic analysis: Insights from a six-year observational study in the Southern Great Plains."
New particle formation (NPF) is a major source of atmospheric nanoparticles, which shape aerosol populations and affect air quality, human health, and climate. Complex, nonlinear interactions among radiation, gases, and meteorology make it difficult to pinpoint which conditions trigger NPF events. In this study, researchers applied a machine learning technique (random forest) to six years of atmospheric measurements from a rural continental site in the Southern Great Plains to classify NPF and non-NPF days and to identify which environmental factors matter most. The approach captures nonlinear relationships that traditional methods often miss and provides a quantitative ranking of the controlling variables.
Impact
Understanding when and why NPF occurs is critical for improving air-quality and Earth system models. The machine learning framework developed here provides a new, systematic way to disentangle the roles of solar radiation, pollutants, and meteorology in particle formation. The results show that strong solar radiation markedly increases the probability of NPF, consistent with its role in driving the photochemical production of low-volatility vapors such as sulfuric acid and extremely low-volatility organic compounds. This underscores the need to represent photochemistry and related environmental drivers accurately in models used for air-quality management and prediction, and it demonstrates how data-driven tools can advance understanding of atmospheric processes.
Summary
Researchers applied a random forest classification model to long-term atmospheric observations from a rural continental site to analyze NPF events. The model distinguished NPF days from non-NPF days and produced a feature-importance ranking of key atmospheric variables. Partial dependence plots were then used to visualize how each variable influences the predicted probability of NPF, with the effects of the other variables averaged over their observed values.
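The classify-then-rank workflow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the three predictors (`solar_radiation`, `so2`, `rh`), their ranges, and the labeling rule are invented; the forest is deliberately tiny (bootstrapped depth-1 trees, each restricted to one randomly chosen feature); and importance is measured by permutation (the drop in held-out accuracy when one feature column is shuffled). The publication's actual predictors, hyperparameters, and importance metric may differ.

```python
import random

# Hypothetical predictor names; the study's actual feature set is not listed here.
FEATURES = ["solar_radiation", "so2", "rh"]


def make_synthetic_days(n, rng):
    """Toy daily records. By construction only radiation decides the label."""
    X, y = [], []
    for _ in range(n):
        row = [rng.uniform(0.0, 900.0),   # radiation, W m^-2 (assumed range)
               rng.uniform(0.0, 5.0),     # SO2, ppb (assumed range)
               rng.uniform(20.0, 100.0)]  # relative humidity, % (assumed range)
        X.append(row)
        y.append(1 if row[0] > 450.0 else 0)  # 1 = NPF day, 0 = non-NPF day
    return X, y


def fit_stump(X, y, feature):
    """Depth-1 tree: the split on one feature that minimizes weighted Gini
    impurity (up to a constant factor). Each leaf stores its NPF fraction."""
    pairs = sorted((row[feature], label) for row, label in zip(X, y))
    n, total_pos = len(pairs), sum(label for _, label in pairs)
    best = (float("inf"), pairs[0][0], total_pos / n, total_pos / n)
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]  # NPF days among the i smallest values
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates equal values
        p_left, p_right = left_pos / i, (total_pos - left_pos) / (n - i)
        gini = i * p_left * (1 - p_left) + (n - i) * p_right * (1 - p_right)
        if gini < best[0]:
            best = (gini, (pairs[i - 1][0] + pairs[i][0]) / 2, p_left, p_right)
    _, threshold, p_left, p_right = best
    return {"feature": feature, "threshold": threshold,
            "p_left": p_left, "p_right": p_right}


def fit_forest(X, y, n_trees, rng):
    """Bagging plus random feature choice: each stump sees a bootstrap sample
    and one randomly chosen feature (equivalent to max_features = 1)."""
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feature = rng.randrange(len(FEATURES))
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return trees


def predict_proba(trees, row):
    """Forest probability of NPF = mean of the leaf fractions the row reaches."""
    leaves = [t["p_left"] if row[t["feature"]] <= t["threshold"] else t["p_right"]
              for t in trees]
    return sum(leaves) / len(leaves)


def accuracy(trees, X, y):
    """Fraction of days classified correctly at a 0.5 probability cutoff."""
    return sum((predict_proba(trees, row) >= 0.5) == bool(label)
               for row, label in zip(X, y)) / len(y)


def permutation_importance(trees, X, y, rng):
    """Rank features by how much held-out accuracy drops when one column is shuffled."""
    base = accuracy(trees, X, y)
    drops = {}
    for j, name in enumerate(FEATURES):
        column = [row[j] for row in X]
        rng.shuffle(column)
        shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
        drops[name] = base - accuracy(trees, shuffled, y)
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


rng = random.Random(0)
X_train, y_train = make_synthetic_days(300, rng)
X_test, y_test = make_synthetic_days(200, rng)
forest = fit_forest(X_train, y_train, n_trees=30, rng=rng)
print("held-out accuracy:", round(accuracy(forest, X_test, y_test), 2))
for name, drop in permutation_importance(forest, X_test, y_test, rng):
    print(f"{name:16s} accuracy drop when shuffled: {drop:+.3f}")
```

Because the toy labels depend only on radiation, shuffling `solar_radiation` should cost far more accuracy than shuffling the other two columns. In practice a library implementation, such as scikit-learn's `RandomForestClassifier` together with `sklearn.inspection.permutation_importance`, would replace the hand-rolled forest.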
Solar radiation intensity emerged as one of the most important predictors. The partial dependence plots show a strong positive relationship between solar radiation intensity and NPF likelihood, with a steep increase in probability once solar radiation intensity exceeds about 200 W m⁻². This behavior is consistent with the physical role of sunlight in driving photochemical reactions that generate low-volatility vapors (such as sulfuric acid and extremely low-volatility organic compounds) that are essential for NPF. The analysis focuses on daytime periods, during which about 95 percent of the observed NPF events occur. Together, the random forest and partial dependence plot results show that machine learning can align closely with established physical understanding while offering new quantitative insight into the environmental controls on particle formation.
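A partial dependence curve is built by setting one feature to each value on a grid for every record, averaging the model's predicted probability, and repeating across the grid while the other features keep their observed values. The sketch below illustrates that calculation on a daytime subset and then locates where the curve rises fastest. Everything in it is hypothetical: `toy_npf_probability` is a hand-written logistic stand-in for a trained random forest, its 350 W m⁻² midpoint was chosen arbitrarily, and the synthetic hourly records and the 08:00 to 17:59 daytime window are assumptions. Its output says nothing about the Southern Great Plains data.

```python
import math
import random


def toy_npf_probability(row):
    """Hypothetical stand-in for a trained classifier: P(NPF) rises with
    radiation and falls with relative humidity. Not the published model."""
    z = (row["solar_radiation"] - 350.0) / 60.0 - (row["rh"] - 60.0) / 20.0
    return 1.0 / (1.0 + math.exp(-z))


def partial_dependence(model, rows, feature, grid):
    """Average model output when `feature` is set to each grid value for every
    row; the other features keep their observed joint distribution."""
    curve = []
    for value in grid:
        probs = [model({**row, feature: value}) for row in rows]
        curve.append(sum(probs) / len(probs))
    return curve


def steepest_rise(grid, curve):
    """Midpoint of the grid interval where the curve increases fastest."""
    slopes = [(curve[i + 1] - curve[i]) / (grid[i + 1] - grid[i])
              for i in range(len(grid) - 1)]
    i = max(range(len(slopes)), key=slopes.__getitem__)
    return (grid[i] + grid[i + 1]) / 2


rng = random.Random(42)
records = []
for day in range(60):
    for hour in range(24):
        # Crude synthetic diurnal cycle: zero at night, peak near local noon.
        sun = max(0.0, math.sin(math.pi * (hour - 6) / 12))
        records.append({"hour": hour,
                        "solar_radiation": 900.0 * sun * rng.uniform(0.5, 1.0),
                        "rh": rng.uniform(30.0, 95.0)})

# Assumed daytime window (08:00-17:59 local); the study's definition may differ.
daytime = [r for r in records if 8 <= r["hour"] < 18]
grid = list(range(0, 901, 50))  # radiation grid, W m^-2
curve = partial_dependence(toy_npf_probability, daytime, "solar_radiation", grid)
for value, p in zip(grid, curve):
    print(f"{value:4d} W m^-2 -> mean P(NPF) = {p:.2f}")
print("steepest rise near", steepest_rise(grid, curve), "W m^-2")
```

With real data the curve would come from the trained forest evaluated on the observed daytime records, so an onset such as the roughly 200 W m⁻² threshold would be read off the curve rather than built into the model.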