Classifying Thermodynamic Cloud Phase Using Machine Learning Models
Submitter
Zhang, Damao — Pacific Northwest National Laboratory
Area of Research
Cloud Distributions/Characterizations
Journal Reference
Goldberger L, M Levin, C Harris, A Geiss, M Shupe, and D Zhang. 2025. "Classifying Thermodynamic Cloud Phase Using Machine Learning Models." 10.5194/egusphere-2025-1501.
Science
Thermodynamic cloud phase classifications from the three ML models and their comparisons against the THERMOCLDPHASE VAP are shown for August 15, 2021, at the NSA atmospheric observatory: a-d) time-height plots of thermodynamic cloud phase classifications from the VAP, as well as from CNN, MLP, and RF model predictions; e-g) confidence scores of thermodynamic cloud phase classification predictions from the three ML models; h-k) histograms of thermodynamic cloud phase distributions; and l-n) normalized confusion matrices for each model. Figure 3a is identical to Figure 1g. Image is courtesy of the journal.
The ARM Thermodynamic Cloud Phase (THERMOCLDPHASE) value-added product (VAP) applies a multi-sensor approach to classify thermodynamic cloud phase by integrating lidar backscatter and depolarization, radar reflectivity, Doppler velocity, spectral width, microwave radiometer-derived liquid water path, and radiosonde temperature measurements. Cloud hydrometeors are classified into seven phase categories: liquid, drizzle, liquid + drizzle (liq_driz), rain, ice, snow, and mixed-phase. In this study, we evaluated a machine learning (ML) approach to thermodynamic cloud phase classification, trained on three years of THERMOCLDPHASE VAP observations.
Impact
Threshold-based algorithms used in the THERMOCLDPHASE VAP are limited because they are static and require fine-tuning for different regions. Conventional algorithms also cannot adapt when key datastreams are missing. ML offers a more flexible approach to thermodynamic cloud phase classification: performance improves as more training data become available, and the models can adapt to low-quality or missing inputs.
Summary
In this work, we developed three ML models of increasing complexity for classifying thermodynamic cloud phase: a random forest (RF), a multilayer perceptron (MLP) neural network, and a convolutional neural network (CNN) with a U-Net architecture. We used three years of ARM THERMOCLDPHASE VAP data from the North Slope of Alaska (NSA) atmospheric observatory as ground truth for training. In addition to evaluating the ML models’ performance using NSA data, we evaluated the models’ generalizability to another Arctic ARM site, using data from the Cold-Air Outbreaks in the Marine Boundary Layer Experiment (COMBLE) in northern Norway, and tested each model’s robustness against simulated instrument data loss.
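As a rough illustration of how model output can be scored against the VAP labels, the sketch below builds a normalized confusion matrix and an overall accuracy from two label sequences. It is not the authors' evaluation code. Normalizing each row by the number of reference samples in that class is an assumption, since the highlight only says the matrices are normalized, and the function names and the short label "mixed" are hypothetical.

```python
# Seven phase categories named in the highlight; "mixed" is a hypothetical
# short label for the mixed-phase category.
CLASSES = ["liquid", "drizzle", "liq_driz", "rain", "ice", "snow", "mixed"]


def normalized_confusion(reference, predicted, classes=CLASSES):
    """Return a row-normalized confusion matrix as a list of lists.

    Rows follow the reference (VAP) class, columns the predicted class.
    Each row is divided by its total (assumed normalization); a class that
    never occurs in the reference yields a row of zeros.
    """
    index = {name: i for i, name in enumerate(classes)}
    n = len(classes)
    counts = [[0] * n for _ in range(n)]
    for ref, pred in zip(reference, predicted):
        counts[index[ref]][index[pred]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def accuracy(reference, predicted):
    """Fraction of samples where the prediction matches the reference."""
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)
```

With row normalization, the diagonal entry for a class is the fraction of VAP samples of that class that the model reproduces, which keeps rare categories visible even when liquid or ice dominates the counts.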
Evaluated against one year of THERMOCLDPHASE VAP output, the CNN outperformed the other two models, achieving the highest test accuracy. The CNN U-Net model trained with input channel dropout also performed better when input fields were missing.
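The idea behind training with input channel dropout can be sketched as follows: during training, whole input fields are blanked at random so the model learns not to depend on any single instrument. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name, the channel names, the drop probability, and the use of a constant fill value are all hypothetical.

```python
import random


def drop_input_channels(sample, drop_prob, rng, fill_value=0.0):
    """Blank whole input channels at random, keeping at least one intact.

    sample: dict mapping channel name -> 2-D list (time x height) of floats.
    Returns (masked copy of the sample, list of dropped channel names).
    The fill value standing in for "missing" is an assumption.
    """
    names = sorted(sample)
    dropped = [name for name in names if rng.random() < drop_prob]
    if names and len(dropped) == len(names):
        # Never blank every channel: restore one chosen at random.
        dropped.remove(rng.choice(dropped))
    masked = {}
    for name in names:
        if name in dropped:
            masked[name] = [[fill_value] * len(row) for row in sample[name]]
        else:
            masked[name] = [list(row) for row in sample[name]]
    return masked, dropped


# Hypothetical two-by-two training sample with three input fields.
rng = random.Random(0)
sample = {
    "radar_reflectivity": [[-20.0, -15.0], [-10.0, -5.0]],
    "lidar_backscatter": [[0.1, 0.2], [0.3, 0.4]],
    "temperature": [[-5.0, -6.0], [-7.0, -8.0]],
}
masked, dropped = drop_input_channels(sample, drop_prob=0.5, rng=rng)
```

Applying the same masking at test time, with a chosen set of channels blanked, is one way to simulate instrument data loss when comparing models.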