A Direct Path from Traffic Activity to Air Quality: Developing a Machine Learning Model to Predict High Resolution Particulate Matter Concentration

Abstract

Transportation systems play an important role in people’s daily lives, but their externalities, including traffic-related air pollution, threaten public health. Estimating air pollution concentrations and the contributions of different sources has therefore become a major research focus. Our study contributes to the literature by developing machine learning (ML) models that estimate traffic-related air pollution concentration directly from traffic, topographic, and meteorological data, circumventing intermediate modeling steps such as vehicle emission and air quality modeling. Our best model, a Convolutional Long Short-Term Memory (ConvLSTM) model, achieves a Mean Relative Error (MRE) of 38.9%, which is lower than the 47.5% MRE of the single hidden layer model, the 63.2% MRE of the Convolutional Neural Network (CNN) model, and the 41.5% MRE of the ConvLSTM with time-series data. The ConvLSTM model benefits from memory cells when predicting a very large number of spatially correlated observations. The overall accuracy of the ML model in predicting air pollution concentration at receptors exceeds that of previous methods. The model also performs well in predicting pollution concentrations measured at air quality monitoring stations. The novel ML modeling approach has low data requirements and is computationally efficient, which makes it promising for future transportation planning, epidemiology, and environmental justice assessments.
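For readers unfamiliar with the metric, the MRE values quoted above can be read as an average of per-receptor relative errors. The sketch below is a minimal illustration under an assumed definition (mean of |predicted − target| / |target|, expressed in percent); the abstract does not give the exact formula, and the function name, the `eps` threshold, and the sample values are hypothetical.

```python
def mean_relative_error(predicted, target, eps=1e-12):
    """Return the mean relative error in percent.

    Assumed definition (not confirmed by the abstract):
        MRE = 100 * mean(|predicted_i - target_i| / |target_i|)
    Receptors whose target value is (near) zero are skipped to avoid
    division by zero; this handling is also an assumption.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    # Keep only pairs with a usable (non-zero) target value.
    pairs = [(p, t) for p, t in zip(predicted, target) if abs(t) > eps]
    if not pairs:
        raise ValueError("no receptor has a non-zero target value")
    return 100.0 * sum(abs(p - t) / abs(t) for p, t in pairs) / len(pairs)


# Hypothetical PM2.5 concentrations at three receptors (illustrative only).
target = [10.0, 20.0, 40.0]
predicted = [12.0, 15.0, 40.0]
mre = mean_relative_error(predicted, target)
print(f"MRE = {mre:.1f}%")
```

Under this definition, a lower MRE means that predictions are, on average, closer to the target concentrations relative to their magnitude, which is the sense in which the model results are ranked above.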

Spatial performance of the ML models. Difference between the predicted and target average daily PM2.5 concentrations: a) Single Hidden Layer Neural Network; b) CNN; c) ConvLSTM; d) ConvLSTM with Time-Series Data