

From Observations to Interpretable AI for Explaining and Predicting CBLH Variability

Submitter

Wang, Zhien — Stony Brook University
Chu, Yufei — Stony Brook University

Area of Research

Atmospheric Thermodynamics and Vertical Structures

Journal Reference

Chu Y, G Lin, M Deng, L Xue, W Li, H Shin, J Zhang, H Guo, and Z Wang. 2026. "Thermodynamics-guided machine learning model for predicting convective boundary layer height and its multi-site applicability." Atmospheric Chemistry and Physics, 26(2), 10.5194/acp-26-1415-2026.

Chu Y, Z Wang, M Deng, G Lin, L Xue, W Li, H Shin, and C Brabec. 2026. "The Spatial and Temporal Variability of the Clear Convective Boundary Layer at the ARM SGP Supersite." Advances in Atmospheric Sciences, 10.1007/s00376-025-5575-2.

Science

Fig. 2. Seasonal performance of the C1 ECOR model. The four columns represent the four seasons: spring (MAM), summer (JJA), autumn (SON), and winter (DJF); (a) Comparison of observed and predicted CBLH; (b) Diurnal evolution of mean observed and predicted CBLH; (c) Diurnal evolution of absolute difference and relative difference between observed and predicted mean CBLHs; (d) Beeswarm plot of SHAP values for four-season analysis; (e) SHAP-derived feature importance for four-season analysis.

Fig. 1. (a) The location of the ARM SGP site; (b1)–(b5) Doppler lidar (DL) vertical wind velocities with derived MLHs at the five sites (C1, E32, E37, E39, and E41) on September 1, 2018; (c)–(f) Composite diurnal–seasonal MLHs at the SGP C1, E32, E37, E39, and E41 sites, based on 4-year weekly averages.
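The caption describes the diurnal–seasonal composites as 4-year weekly averages. One plausible way to build such a composite is to pool all years and average MLH within each (week-of-year, hour-of-day) bin. The sketch below assumes ISO week numbering, hourly timestamps in local time, and `None` or NaN for missing retrievals; none of these choices are specified in the caption, and `weekly_diurnal_composite` is a hypothetical helper.

```python
import math
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


def weekly_diurnal_composite(
    records: Iterable[Tuple[datetime, Optional[float]]],
) -> Dict[Tuple[int, int], float]:
    """Average MLH (m) into (ISO week-of-year, hour-of-day) bins, pooling all years."""
    bins: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for timestamp, mlh in records:
        if mlh is None or math.isnan(mlh):
            continue  # skip missing lidar retrievals
        week = timestamp.isocalendar()[1]  # ISO week number, 1..53
        bins[(week, timestamp.hour)].append(mlh)
    # Each composite value is the multi-year mean for that week and hour.
    return {key: mean(values) for key, values in bins.items()}


# Usage with illustrative values (not study data): the same ISO week and hour
# in two different years fall into one bin; the NaN sample is dropped.
records = [
    (datetime(2017, 6, 5, 14), 1500.0),
    (datetime(2018, 6, 4, 14), 1700.0),
    (datetime(2018, 6, 4, 15), float("nan")),
]
composite = weekly_diurnal_composite(records)
```

Pooling by ISO week keeps the seasonal axis aligned across years; note that some years contain a week 53, which will be averaged over fewer samples.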

This pair of studies investigates convective boundary layer (CBL) dynamics by combining four years of high-resolution Doppler lidar observations from five Southern Great Plains (SGP) sites of the Atmospheric Radiation Measurement (ARM) User Facility with a thermodynamics-guided machine learning framework.

The observational analysis (Fig. 1) first quantified significant sub-grid-scale heterogeneity: despite the relatively flat terrain, the daily maximum mixing layer height (MLH) varied by up to 1 km (∼30% of the mean) within a 100-km domain. The 4-year weekly composites of diurnal–seasonal MLH (Fig. 1c–f) revealed a pronounced east–west contrast that reverses with season, driven by land-surface gradients. Statistical analysis further showed that MLH is positively correlated with surface sensible heat flux (SHF) and negatively correlated with lower tropospheric stability (LTS). However, these correlations alone could not fully explain the observed site-to-site differences.
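The correlation analysis relates MLH to SHF and LTS. The text does not define LTS; a widely used definition, assumed here, is the potential-temperature difference between 700 hPa and the surface. The sketch below computes that quantity with Poisson's equation and a plain Pearson correlation coefficient; the function names and the sample temperatures and pressures are hypothetical.

```python
import math
from typing import Sequence

P0_HPA = 1000.0  # reference pressure for potential temperature (hPa)
KAPPA = 0.286    # R_d / c_p for dry air (about 2/7)


def potential_temperature(t_kelvin: float, p_hpa: float) -> float:
    """Poisson's equation: theta = T * (p0 / p) ** (R_d / c_p)."""
    return t_kelvin * (P0_HPA / p_hpa) ** KAPPA


def lower_tropospheric_stability(t_sfc_k: float, p_sfc_hpa: float, t_700_k: float) -> float:
    """LTS in K, ASSUMED here to be theta(700 hPa) - theta(surface)."""
    return potential_temperature(t_700_k, 700.0) - potential_temperature(t_sfc_k, p_sfc_hpa)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative inputs only: 300 K at 970 hPa near the surface, 283 K at 700 hPa.
lts = lower_tropospheric_stability(300.0, 970.0, 283.0)  # roughly 10.8 K
```

With matched daily series for one site, `pearson_r(shf_series, mlh_series)` and `pearson_r(lts_series, mlh_series)` would give the signed correlations; the study reports a positive sign for SHF and a negative sign for LTS.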

Building on these findings, the team developed a thermodynamics-guided machine learning framework to overcome the limitations of conventional statistics. The framework incorporates physics-informed energy-balance constraints and uses the full diurnal cycle as input features, and automated machine learning (AutoML, using TPOT and AutoKeras) identified the optimal model architecture and hyperparameters. The resulting models achieved high predictive accuracy (R² = 0.84 at the Central Facility, C1; R² = 0.79–0.81 when transferred to nearby sites). SHAP (SHapley Additive exPlanations) analysis (Fig. 2) showed that LTS remains the dominant predictor year-round, with modest seasonal shifts in feature importance (<10%) and notably higher model uncertainty in summer (JJA), reflecting stronger surface–atmosphere interactions.
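SHAP attributes a prediction to its input features using Shapley values from cooperative game theory. For a model with only a few inputs, these values can be computed exactly by enumerating feature subsets, as in the brute-force sketch below. This is not the SHAP library's algorithm (which relies on approximations such as KernelSHAP or TreeSHAP for realistic models), and the toy model with features `lts`, `shf`, and `rh` is hypothetical; features outside a coalition are replaced by baseline values, which is one common convention.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition take the instance value; others the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_cblh_model(features: Sequence[float]) -> float:
    """HYPOTHETICAL CBLH model (m): stability suppresses growth and damps the flux effect."""
    lts, shf, rh = features
    return 2500.0 - 80.0 * lts + 4.0 * shf - 5.0 * rh - 0.1 * lts * shf


x = [15.0, 300.0, 50.0]         # hypothetical instance: LTS (K), SHF (W m^-2), RH (%)
baseline = [12.0, 150.0, 60.0]  # hypothetical reference input
phi = shapley_values(toy_cblh_model, x, baseline)
```

The values satisfy the efficiency property: they sum to `toy_cblh_model(x) - toy_cblh_model(baseline)`, and the interaction term is split evenly between `lts` and `shf`.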

Impact

Global Earth system models with ∼100-km grid spacing treat conditions within each grid cell as uniform, yet this work shows that CBL depth can vary by up to ∼30% within a single cell. The physically constrained, fully interpretable machine learning model provides both observational benchmarks and a transferable predictive tool, supporting improved turbulence parameterizations and simulations of cloud formation and energy exchange in weather and Earth system models.
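A single grid-cell value hides this spread. The sketch below expresses site-to-site variability of daily maximum MLH as an absolute range and as a fraction of the multi-site mean. Defining the spread as maximum minus minimum divided by the mean is an assumption for illustration, not necessarily the metric used in the papers, and the site labels and heights are hypothetical.

```python
from typing import Dict, Tuple


def subgrid_spread(daily_max_mlh_m: Dict[str, float]) -> Tuple[float, float]:
    """Return (max - min) across sites in m, and that range as a fraction of the site mean."""
    values = list(daily_max_mlh_m.values())
    if not values:
        raise ValueError("need at least one site")
    site_mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return spread, spread / site_mean


# Hypothetical daily maxima for three sites inside one model grid cell.
spread_m, spread_frac = subgrid_spread({"site_a": 1000.0, "site_b": 1500.0, "site_c": 2000.0})
```

A model that carries only the cell mean (1500 m here) would miss a 1000-m difference between the shallowest and deepest sites.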

Summary

The convective boundary layer regulates the vertical transport of heat, moisture, and aerosols. Although the SGP region is often viewed as flat, complex gradients in soil moisture and vegetation drive significant sub-grid atmospheric heterogeneity. Using four years of Doppler lidar data from five ARM SGP sites, the observational study revealed striking spatial variability in MLH and seasonal east–west reversals driven by surface fluxes and stability. Recognizing the limits of traditional approaches, the team introduced a thermodynamics-guided machine learning framework that predicts the full diurnal cycle of convective boundary layer height (CBLH) with high accuracy and demonstrates multi-site applicability. SHAP interpretability analysis further clarifies the relative roles of the thermodynamic drivers, offering a new pathway to bridge observations and model parameterization at the sub-grid scale.
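As a point of reference for how surface fluxes and stability jointly set CBL depth, the textbook encroachment (zero-order, no-entrainment) model grows the boundary layer as d(h²)/dt = 2(w'θ')ₛ/γ, where (w'θ')ₛ = SHF/(ρ c_p) is the kinematic surface heat flux and γ is the potential-temperature lapse rate above the CBL. This model is assumed here purely for illustration; the summary does not state that the authors used it, and the constant air density and the idealized inputs are hypothetical.

```python
import math
from typing import List, Sequence

RHO = 1.2    # air density, kg m^-3 (ASSUMED constant)
CP = 1005.0  # specific heat of dry air at constant pressure, J kg^-1 K^-1


def encroachment_cblh(
    shf_wm2: Sequence[float], dt_s: float, gamma_k_per_m: float, h0_m: float = 0.0
) -> List[float]:
    """CBL height (m) at each flux sample using the encroachment model.

    shf_wm2: surface sensible heat flux (W m^-2), evenly spaced by dt_s seconds.
    gamma_k_per_m: potential-temperature lapse rate above the CBL (K m^-1).
    """
    if gamma_k_per_m <= 0.0:
        raise ValueError("encroachment requires a stable layer aloft (gamma > 0)")
    heights = [h0_m]
    h_squared = h0_m ** 2
    for f_prev, f_next in zip(shf_wm2[:-1], shf_wm2[1:]):
        # Trapezoidal mean kinematic heat flux over the step (K m s^-1).
        kinematic_flux = 0.5 * (f_prev + f_next) / (RHO * CP)
        # Negative (evening) fluxes are ignored: the model only describes daytime growth.
        h_squared += 2.0 * max(kinematic_flux, 0.0) * dt_s / gamma_k_per_m
        heights.append(math.sqrt(h_squared))
    return heights


# Idealized case: a constant 200 W m^-2 for 6 h into a layer with gamma = 5 K km^-1.
heights = encroachment_cblh([200.0] * 7, dt_s=3600.0, gamma_k_per_m=0.005)
```

The final height here is roughly 1.2 km; doubling γ reduces it by a factor of √2, which mirrors the negative MLH–LTS and positive MLH–SHF relationships reported above.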


Atmospheric Radiation Measurement (ARM) | Reviewed March 2025