Prediction of air pollution in Changchun based on OSR method


ISSN , England, UK
World Journal of Modelling and Simulation
Vol. 13 (2017) No. 1, pp

Shuai Fu 1, Yong Jiang 2, Shiqi Xu 3, Kai Zhao 1,2, Yi Jiang 1,2
1 Institute of Space Weather, Nanjing University of Information Science and Technology, Nanjing, 244, China
2 School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, 244, China
3 Jilin Climate Center, Changchun, 162, China
(Received May , Accepted October )

Abstract. Applying the Optimal Subset Regression (OSR) method, forecasting equations for the air quality index (AQI) and pollutant (PM2.5, PM10, O3, NO2) concentrations are preliminarily established for Changchun, China. In addition to the simultaneous meteorological elements, adding the previous day's pollutant concentrations makes the regression equations more stable and accurate. However, deviations still exist between the forecasts and the observations, especially in extreme cases.

Keywords: optimal subset regression, prediction, error analysis

1 Introduction

With rapid socioeconomic development and sharp growth of the urban population, cities have been expanding quickly, and both energy consumption and pollutant emissions are steadily growing. Serious air pollution is threatening our health [11]. How to prevent and control air pollution has become a focus of public concern. Air pollution prediction is a popular and difficult topic in environmental science [6]. The first step in preventing or controlling air pollution is to understand it scientifically. Previous research has confirmed that meteorological elements (such as surface pressure, precipitation, wind speed and direction, and temperature) and atmospheric circulation usually affect air quality [2-4, 9, 10, 12, 13, 15].
For example, a temperature inversion layer inhibits the diffusion of pollutants; rainfall washes pollutants out of the air; wind speed affects the diffusion rate of pollutants, and wind direction controls the affected area.

Changchun is a well-known national forest city of China, with a forest coverage rate as high as 30.66% [7]. However, its air quality has worsened in recent years, and it has appeared several times in rankings of the cities with the worst air quality [1, 5]. It is now urgent to control air pollution and to provide effective air quality forecasts. To this end, we adopt the OSR method and build forecasting equations for AQI and pollutant (PM2.5, PM10, O3, NO2) concentrations.

National Natural Science Foundation of China (Grant No , ), Special Project for Meteo-scientific Research in the Public Interest (Grant No. GYHY ), Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China (Grant No. 14KJB170012)
Corresponding author. E-mail address: jiang@nuist.edu.cn
Published by World Academic Press, World Academic Union

2 Data and OSR

2.1 Data

(1) Daily observations from 10 automatic ambient air quality monitoring stations, including AQI and pollutant concentrations (PM2.5, PM10, O3, NO2). The daily average over the stations is taken as representative of Changchun. AQI has 6 levels, from grade 1 to grade 6, corresponding respectively to excellent (0 AQI ), good ( P ), slightly polluted ( AQI < ), moderately polluted ( AQI < ), heavily polluted ( AQI < ) and severely polluted (AQI ). The greater the AQI, the worse the air quality.

(2) Surface meteorological elements (pressure, precipitation, relative humidity, wind speed, mean, maximum and minimum temperature, and temperature difference) obtained from Jilin Climate Center. Temperature difference is defined as the maximum minus the minimum temperature. Daily precipitation is coded as follows: no rain is marked as 0, light rain as 1, moderate rain as 2, heavy rain as 3, and rainstorm and above as 4. Studies have indicated that such processing is helpful in operational forecasting [14].

2.2 Description of OSR

Air quality tends to be affected by many factors and by the interactions among them, such as the meteorological elements mentioned above. To weaken these interaction effects and establish the forecast equations reasonably, this paper applies the Optimal Subset Regression (OSR) method to screen and combine factors. The principle of OSR is as follows: given m independent variables, the number of possible combinations of these variables (excluding the empty set) is

$$\sum_{k=1}^{m} C_m^k = 2^m - 1. \qquad (1)$$

The goal of OSR is to select the best regression equation from all possible subsets. We first use the Furnival-Wilson algorithm to obtain all possible subsets, and then determine the best one with the Couple Score Criterion (CSC).
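To make the count in Eq. (1) concrete, the following minimal sketch enumerates every non-empty predictor subset by brute force and checks the total against 2^m - 1. It is an illustration only: the labels X1 to X8 are placeholder names for the eight meteorological elements, and plain enumeration is used here instead of the Furnival-Wilson algorithm, which is designed to avoid evaluating every subset.

```python
from itertools import combinations
from math import comb


def nonempty_subsets(names):
    """Yield every non-empty combination of the candidate predictors."""
    for k in range(1, len(names) + 1):
        yield from combinations(names, k)


# Placeholder labels for the eight meteorological elements (assumed names).
predictors = [f"X{i}" for i in range(1, 9)]
subsets = list(nonempty_subsets(predictors))

m = len(predictors)
# Both sides of Eq. (1): the sum of binomial coefficients and 2^m - 1.
assert len(subsets) == sum(comb(m, k) for k in range(1, m + 1)) == 2**m - 1
print(len(subsets))  # 255
```

With only eight candidate predictors, exhaustive enumeration is cheap; the count doubles with each added predictor, which is why a pruning algorithm matters for larger m.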
After the optimal equation is obtained, the fitted and predicted values are calculated. In addition, error analysis is needed to assess the accuracy of the equation.

CSC takes both the predicted trend and the predicted quantity into account. It is composed of two parts, a trend score and a quantity score. For a subset containing k predictors, $CSC_k$ is calculated as follows:

$$CSC_k = S_1 + S_2, \qquad (2)$$

$$S_1 = nR^2 = n\left(1 - \frac{Q_k}{Q_y}\right), \qquad (3)$$

$$S_2 = 2I = 2\left[\sum_{i=1}^{I}\sum_{j=1}^{I} n_{ij}\ln n_{ij} + n\ln n - \left(\sum_{i=1}^{I} n_{i\cdot}\ln n_{i\cdot} + \sum_{j=1}^{I} n_{\cdot j}\ln n_{\cdot j}\right)\right], \qquad (4)$$

where $S_1$ and $S_2$ are the quantity score (fine score) and the trend score (raw score), respectively; n is the sample length; I in the summation limits is the number of forecast trend types; and

$$Q_k = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2, \qquad (5)$$

$$Q_y = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2, \qquad (6)$$

$$n_{\cdot j} = \sum_{i=1}^{I} n_{ij}, \qquad (7)$$

$$n_{i\cdot} = \sum_{j=1}^{I} n_{ij}. \qquad (8)$$

Here $Q_k$ is the mean residual sum of squares of the k-predictor equation; $Q_y$ is the corresponding quantity for the climatological forecast (the sample mean $\bar{y}$); and $n_{ij}$ are the entries of the contingency table of observed and fitted trend types. CSC is designed to achieve both a good fit and an accurate trend forecast. When $CSC_k$ reaches its maximum, the corresponding subset is the optimal one [8].

3 Characteristics of air pollution in Changchun

3.1 Air quality situation

The observed data comprise 91 days of excellent air quality (12.5%), 403 good days (55.2%), 141 slightly polluted days (19.3%), 53 moderately polluted days (7.2%), 32 heavily polluted days (4.4%) and 10 severely polluted days (1.4%). In general, the air quality is good, and the probability of heavy and severe pollution is rather low (only 5.8%).

A significant feature of the AQI and pollutant concentration variations is their semi-annual cycle (see Fig. 1): values are usually higher in the winter half year (from October to March of the following year) than in the summer half year (from April to September), whereas O3 shows the opposite pattern. As seen from Tab. 1, all the heavily polluted days occur in the winter half year; the 10 severely polluted days are mainly concentrated in October, November and December (only 1 day in May); and the number of days with air quality of grade 2 or better (i.e. grades 1 and 2) is 176 and 318 for the winter and summer half years, respectively. Clearly, the air quality of Changchun is better in the summer half year. We therefore establish separate forecast models for the winter and summer half years.

Table 1: Statistics of monthly air quality (unit: day)
Columns: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec
Rows: Excellent, Good, Slightly polluted, Moderately polluted, Heavily polluted, Severely polluted

Fig. 1: Daily variation of air pollution in Changchun. Panels (a)-(e): AQI, PM2.5, PM10, NO2 and O3 (pollutant concentrations in μg/m3).

WJMS for subscription: info@wjms.org.uk
WJMS for contribution: submit@wjms.org.uk
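The grade statistics of Section 3.1 can be tallied with a few lines of code. The sketch below is a hedged illustration: the numeric AQI ranges did not survive in the text above, so the breakpoints are assumed to be those of China's HJ 633-2012 standard (50, 100, 150, 200, 300), and the function names and sample values are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Assumed upper bounds of grades 1-5 (HJ 633-2012); AQI above 300 is grade 6.
UPPER_BOUNDS = [50, 100, 150, 200, 300]
GRADES = [
    "excellent",
    "good",
    "slightly polluted",
    "moderately polluted",
    "heavily polluted",
    "severely polluted",
]


def aqi_grade(aqi):
    """Return the grade name for one daily AQI value."""
    return GRADES[bisect_left(UPPER_BOUNDS, aqi)]


def grade_shares(daily_aqi):
    """Percentage of days in each grade, rounded to one decimal place."""
    counts = Counter(aqi_grade(v) for v in daily_aqi)
    total = len(daily_aqi)
    return {g: round(100 * counts[g] / total, 1) for g in GRADES}


# Hypothetical daily AQI values, for illustration only.
sample = [35, 78, 120, 160, 240, 320, 50, 101]
print(grade_shares(sample))
```

Applied to the day counts reported above (91 + 403 + 141 + 53 + 32 + 10 = 730 days), the heavily and severely polluted grades together cover 42 days, or about 5.8%, which matches the share quoted in the text.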

3.2 Primary pollutant analysis

The daily primary pollutant is also investigated. Results show that, of all the days with a primary pollutant, PM2.5, PM10, O3 and NO2 account for 44.5%, 31.3%, 17.0% and 5.2%, respectively. PM2.5 is the dominant pollutant in Changchun. In addition, the concentrations of PM2.5, PM10 and NO2 are significantly positively correlated with AQI (passing the significance test at level) (see Tab. 2), and the correlation between the PM2.5 concentration and AQI is the largest (0.952). From the viewpoint of the primary pollutant, the influence of PM2.5 should be fully considered in air pollution prevention and control.

Table 2: The correlation between the pollutants
Rows and columns: AQI, PM2.5, PM10, O3, NO2
Note: * passes the significance test at 0.01 level, ** passes the significance test at level.

4 Prediction models for air pollution

Pollutants released into the atmosphere are inevitably affected by atmospheric turbulence and turbulent diffusion, and they can be transported, mixed and diluted. As a result, meteorological conditions play an important role in determining air quality. Here, 8 conventional surface meteorological elements (i.e., pressure, precipitation, relative humidity, wind speed, mean, maximum and minimum temperature, and temperature difference) are chosen to examine their correlations with AQI and air pollutants and then to establish prediction models.

Table 3: The correlation coefficients between AQI, pollutant concentrations and the surface meteorological elements
Columns: Period; Pressure (X1); Precipitation (X2); Relative humidity (X3); Wind speed (X4); Mean temperature (X5); Maximum temperature (X6); Minimum temperature (X7); Temperature difference (X8)
Rows: Winter and Summer, each for AQI, PM2.5, PM10, O3, NO2

As shown in Tab. 3, in the winter half year, AQI and air pollutants are positively correlated with relative humidity and temperature (including mean, maximum and minimum temperature and temperature difference), but negatively correlated with pressure, precipitation and wind speed. In terms of physical processes, when the air temperature and humidity are low, the surface pressure field is mainly controlled by a strong cold high, and the divergent airflow favors the dispersion and degradation of pollutants. In addition, washout by precipitation further dilutes pollutants. Conversely, with higher temperature and humidity in the winter half year, the surface pressure field is mainly controlled by a warm low, and the convergent airflow inhibits the dispersion of pollutants. For the summer half year, AQI and air pollutants show stable negative correlations with precipitation, relative humidity and wind speed, but positive correlations with pressure and temperature in most cases.
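As a reminder of how correlation coefficients such as those discussed above are obtained and tested, the following minimal sketch computes the Pearson coefficient between one meteorological element and AQI, together with the usual t statistic for its significance (n - 2 degrees of freedom). The data values and variable names are hypothetical; the text does not state which test the authors used, so the t test is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); compare with a t table, n - 2 d.o.f."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical daily wind speed (m/s) and AQI, for illustration only.
wind_speed = [1.2, 3.4, 2.1, 4.8, 0.9, 2.7, 3.9]
aqi = [160, 85, 120, 60, 175, 105, 70]

r = pearson_r(wind_speed, aqi)
print(round(r, 2), round(t_statistic(r, len(aqi)), 2))
```

In this made-up sample AQI falls as wind speed rises, so r is negative, in line with the sign of the wind-speed correlations described in the text.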

The analysis above shows that atmospheric conditions indeed have a significant influence on air quality. Using the OSR method with the 8 meteorological elements as forecast factors, forecasting equations for AQI and for PM2.5, PM10, O3 and NO2 concentrations are preliminarily built for the winter half year (from October to March of the following year) and the summer half year (from April to September), respectively. Of all the data, observations from 1 October 2014 to 31 March 2015 (1 April to 30 September 2014) are selected as the fitting samples for the winter (summer) half year, and the others are used for prediction. To assess the accuracy of the established equations, the error between the forecasts and the observations is also calculated and investigated.

4.1 Meteorological factors only

In this part, only meteorological elements are used as predictors. After screening and combining the 8 elements by OSR, we establish the fitting equations. As shown in Tab. 4, all the multiple correlation coefficients lie between 0. and . By comparison, the multiple correlation coefficients in the winter half year tend to be greater than those in the summer half year. Table 6 assesses the fitting and prediction ability, where Type 1 refers to the equations considering meteorological factors only. Both the root mean square errors and the mean absolute errors are larger in the winter half year than in the summer half year, which means that the summer equations predict slightly better and more stably and may be more suitable for operational use.
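The screening step described above can be sketched as follows: fit an ordinary least-squares equation for every non-empty predictor subset and keep the subset with the largest CSC (Eqs. 2-8). This is a simplified illustration rather than the authors' implementation. It enumerates subsets by brute force instead of using the Furnival-Wilson algorithm, it assumes three trend types (fall, little change, rise) defined by a hypothetical tolerance `tol` on day-to-day differences, and all data and names are made up.

```python
from itertools import combinations
from math import log


def ols_fit(X, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(p):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]


def trend_classes(series, tol):
    """0 = fall, 1 = little change, 2 = rise (assumed three trend types)."""
    diffs = [cur - prev for prev, cur in zip(series, series[1:])]
    return [0 if d < -tol else 2 if d > tol else 1 for d in diffs]


def trend_score(obs, fit, tol, n_types=3):
    """S2 = 2I of Eq. (4), from the contingency table of trend types."""
    table = [[0] * n_types for _ in range(n_types)]
    for i, j in zip(trend_classes(obs, tol), trend_classes(fit, tol)):
        table[i][j] += 1
    n = sum(map(sum, table))  # table total: number of day-to-day transitions

    def xlogx(v):
        return v * log(v) if v > 0 else 0.0

    cells = sum(xlogx(v) for row in table for v in row)
    rows = sum(xlogx(sum(row)) for row in table)
    cols = sum(xlogx(sum(col)) for col in zip(*table))
    return 2.0 * (cells + xlogx(n) - rows - cols)


def csc(y, fit, tol):
    """CSC = S1 + S2 with S1 = n * (1 - Qk / Qy), Eqs. (2)-(3), (5)-(6)."""
    n = len(y)
    ybar = sum(y) / n
    qk = sum((a - b) ** 2 for a, b in zip(y, fit)) / n
    qy = sum((a - ybar) ** 2 for a in y) / n
    return n * (1 - qk / qy) + trend_score(y, fit, tol)


def best_subset(X, y, names, tol):
    """Return (CSC, predictor names) of the subset with the largest CSC."""
    best = None
    for k in range(1, len(names) + 1):
        for idx in combinations(range(len(names)), k):
            Xs = [[row[i] for i in idx] for row in X]
            score = csc(y, predict(ols_fit(Xs, y), Xs), tol)
            if best is None or score > best[0]:
                best = (score, [names[i] for i in idx])
    return best


# Hypothetical data: y depends on x1 only (y = 2 * x1 + 1).
x1 = [1, 3, 2, 5, 4, 6, 8, 7]
x2 = [2, 1, 4, 3, 6, 5, 7, 9]
x3 = [5, 1, 4, 2, 3, 1, 2, 4]
X = list(zip(x1, x2, x3))
y = [2 * v + 1 for v in x1]
print(best_subset(X, y, ["X1", "X2", "X3"], tol=0.5))
```

Because the quantity score never decreases when predictors are added, ties or near-ties between nested subsets are possible; an operational implementation would need a rule for breaking them and a guard against collinear predictors, both of which this sketch omits.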
Table 4: The equations between AQI, pollutant concentrations and the surface meteorological elements established by OSR
Columns: Period; AQI and pollutants (Y); Regression equations; Multiple correlation coefficients
Rows: Winter and Summer, each for AQI, PM2.5, PM10, O3, NO2

4.2 Adding previous pollutant concentration

Since the development of anything is connected with its past, past states affect not only the present but also the future. This also applies to changes in air quality. As a result, the above scheme, which considers only meteorological elements, is clearly insufficient. In this section, we add the previous day's pollutant concentration as a predictor and establish new equations. The correlation coefficients between AQI, PM2.5, PM10, O3 and NO2 and their own previous-day values are 0.67, 0.69, 0.62, 0.74 and 0.63, respectively, showing strong autocorrelation. Table 5 presents the new equations with the added concentration term (X9). All the multiple correlation coefficients are significantly improved compared with Tab. 4. For example, the multiple correlation coefficient of AQI in the winter (summer) half year increases from 0.66 (0.65) to 0.75 (0.74). Notably, X9 is the only factor selected in all the equations. In Table 6, Type 2 represents the errors of the new equations. All the root mean square errors and mean absolute errors are clearly reduced. Therefore, adding the previous day's pollutant concentration makes the forecast more stable, and it is also reasonable in terms of the physical process.

Here, we present the prediction of AQI and PM2.5 as examples. In Figs. 2 and 3, the winter period refers to January to March in 2014 and October to December in 2015, and the summer period covers April to September in . Type 1 and Type 2 have the same meaning as before.
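The two ingredients of this comparison, the previous-day predictor X9 and the error measures used to compare Type 1 with Type 2, can be sketched as below. The helper names and sample values are hypothetical, and the regression fit itself is left out; the block only shows how the lagged column is built and how the root mean square error and mean absolute error are computed.

```python
from math import sqrt


def add_previous_day(met_rows, conc):
    """Append the previous day's concentration (X9) to each day's predictors.

    The first day is dropped because it has no previous-day value.
    """
    X = [list(row) + [conc[t - 1]] for t, row in enumerate(met_rows) if t >= 1]
    y = conc[1:]
    return X, y


def rmse(obs, pred):
    """Root mean square error between observations and forecasts."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observations and forecasts."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


# Hypothetical example: two meteorological predictors over three days.
met = [[1012.0, 2.5], [1008.0, 3.1], [1015.0, 1.4]]
pm25 = [60.0, 45.0, 80.0]
X_lagged, y_lagged = add_previous_day(met, pm25)
print(X_lagged, y_lagged)
print(rmse([1, 2, 3], [1, 2, 5]), mae([1, 2, 3], [1, 2, 5]))
```

Dropping the first day keeps predictors and targets aligned; for a half-year sample this costs a single observation.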
Table 5: Same as Table 4, but with the pollutant concentration of the previous day (X9) added
Columns: Period; AQI and pollutants (Y); Regression equations; Multiple correlation coefficients
Rows: Winter and Summer, each for AQI, PM2.5, PM10, O3, NO2

Table 6: Error analysis of fitting and predicted samples
Columns: Period; AQI and pollutants; Fitting samples (root mean square error, Type 1 and Type 2; mean absolute error, Type 1 and Type 2); Predicted samples (root mean square error, Type 1 and Type 2; mean absolute error, Type 1 and Type 2)
Rows: Winter and Summer, each for AQI, PM2.5 (µg/m3), PM10 (µg/m3), O3 (µg/m3), NO2 (µg/m3)

Fig. 2: Predicted and observed AQI for the winter half year (a) and the summer half year (b). Curves: Observation, Type 1, Type 2; horizontal axis: time series.

Fig. 3: Same as Fig. 2, but for PM2.5 concentration (μg/m3).

For AQI, in the winter half year, the correlation coefficients between the observed curve and the prediction curves are 0.25 and 0.61 for Type 1 and Type 2, respectively; in the summer half year, the corresponding correlation coefficients are 0.29 and . For PM2.5, the corresponding correlation coefficients are 0.28 and 0.64 in the winter, and 0.19 and 0.59 in the summer. Type 2 correlates better with the observations. In addition, the trends of the three curves in each panel are quite consistent, but the prediction of extreme cases still shows a great

difference. Nevertheless, it cannot be denied that adding the previous concentration makes the forecast more stable and more accurate.

5 Conclusions

Based on daily AQI and pollutant concentration data, together with the simultaneous surface meteorological elements from January 2014 to December 2015, we have studied the current state of air pollution in Changchun, China. Using 8 meteorological elements and previous pollutant concentrations as predictors, we build prediction models for AQI and pollutant concentrations with the OSR method. The winter and summer half years are considered separately. Results show that, besides the meteorological elements, the previous concentration is an important predictor for the subsequent forecast. Adding the previous concentration makes the forecast more stable and more accurate. The trends of AQI and pollutant concentrations are well predicted, but large differences remain in the prediction of extreme cases.

References

[1] J. Ben. The correlation analysis between the air pollution status and meteorological conditions in Changchun. Ph.D. Thesis, Jilin University, Changchun,
[2] M. A. Cohen, S. D. Adar, R. W. Allen, E. Avol, C. L. Curl, T. Gould, D. Hardie, A. Ho, P. Kinney, T. V. Larson. Approach to estimating participant pollutant exposures in the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air). Environmental Science & Technology, 9, 43(13):
[3] X. Meng, Y. U. Yu, et al. Preliminary study of the dense fog and haze events formation over Beijing-Tianjin-and-Hebei region in January of . Environmental Science & Technology,
[4] Y. Meng. An analysis of air pollution and weather conditions during heavy-fog days in Beijing area. Meteorological Monthly, 0.
[5] Q. Zhou, S. Zhang, W. Chen. Pollution characteristics and sources of SO2, O3 and NOx in Changchun. Research of Environmental Sciences, 2014, 27(7):
[6] F. Shu. Forecasting air pollution based on the key meteorological elements and typical weather patterns in Guangzhou. Environmental Chemistry, 2012, 31(8):
[7] W. Song. Research on the development of tourism market in Changchun. Ph.D. Thesis, Northeast Normal University, Haerbin, 8.
[8] F. Wei. Statistical diagnosis and prediction technology of modern climate (Second Edition). Beijing: China Meteorological Press, 7.
[9] Y. Yang, G. Tang, et al. Effects of local circulation on atmospheric pollutants in Beijing-Tianjin-Hebei region during summer. Chinese Journal of Environmental Engineering, 2015, 9(5):
[10] Z. Q. yun, W. Zhang, W. S. gong. A study on air pollution, visibility and general circulation feature. Plateau Meteorology, 3.
[11] J. Zhang, K. Li, et al. Meteorological element analysis of four severe pollution processes in Beijing-Tianjin-Hebei region. Meteorological & Environmental Sciences,
[12] W. Zhang, F. You, et al. Meteorological characteristics analysis of severe haze weather processes in Beijing in January . Meteorological and Environmental Sciences, 2016, 39(2):
[13] Y. Zhang, X. Li, et al. Analysis of PM2.5 pollution process and weather situation in Beijing in February . Meteorological and Environmental Sciences, 2016, 39(2):
[14] J. I. Zhong-Ping, S. B. Luo, et al. Variation characteristics and prediction of air pollution in Guangzhou. Journal of Tropical Meteorology, 6, 22(6):
[15] L. Zhou, X. Xu. The correlation factors and pollution forecast model for PM2.5 concentration in Beijing area. Acta Meteorologica Sinica, 3.