Air Pollution Data. by Joanna Rodriguez


When local authorities release warnings or alerts on pollution levels, the media highlight inefficient local policies in this regard, or how many times it has happened recently. One will surely remember that one's car was diverted and could not enter a certain area, but after a few days everyone will have forgotten the pollution levels reached, the pollutants responsible and where the concentrations were highest. Take Beijing, for example: on an average day the city's air quality index (AQI) is about 160/500, which is rated as unhealthy. In 2015, authorities banned 2.5 million cars for two weeks prior to the 70th anniversary of Japan's defeat in WWII. By the day of the celebrations, the AQI in Beijing had improved dramatically, dropping to just 17/500. The day after the celebrations, the AQI shot back up to 160: it took only one day for the city to return to its day-to-day unhealthy environment. This is an excellent example of how somber the situation is, yet how easily it can be turned into healthy conditions.

Achieving air quality levels that do not harm human health and the environment is one of the European Union's long-term objectives. This objective is pursued through legislation; cooperation with the sectors responsible for air pollution, as well as with international, national and regional authorities and non-governmental organisations; and research. At KEEDIO, we thought that displaying the data available in this regard would be informative enough to be worth a blog post, and it happens that the data was available in the Air Pollution Database of the European Environment Agency. Unsurprisingly, the data had to go through an intense cleansing cycle, since it came from different national agencies around Europe that had kept to different measuring protocols for 25 years.

In this post, we display air pollution maps of a number of chemical components in Europe, with a particular focus on Spain. The overall process to get the data ready for plotting is also described, since some decisions were made to ease visualization.


Data description

Let us summarize the most relevant characteristics of the data set:

- 197 components that measure/monitor the air quality in 38 European countries.
- Sample periods: hour, hour8, week, 2week, 4week, day, dymax, month, 2month, 3month, year.
- Time period:
- Important: the name, latitude and longitude of each station are available; they are unique and distinguishable.
- Directory hierarchy in each country dataset: <country>/<stations-statistics-measurement configuration>/rawdata.
- rawdata: text files, each one representing 1 time series of [...]

A results framework is designed for achieving results in a reproducible way, that is, the steps we will follow to obtain a usable data set. The framework is as follows:

1. Information-crossing process to select:
   - Countries.
   - Component(s).
   - Sample period(s).
2. Extract the rawdata for the previous selection, per country.
3. For each component and sample period, find the best time interval overlap, that is, select the data with the most observations.
4. Chop the data based on the time intervals previously selected and give the option to fill the missing time series data.
5. Save the obtained data in the following format: <CT>_ts_<SP>_C<X>.csv
   <CT> = country initials.
   <SP> = sample period.
   <X> = component ID number.
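The naming pattern in step 5 can be illustrated with a small helper. This is a Python sketch for illustration only (the post's own scripts are written in R), and the function names `output_filename` and `parse_output_filename` are hypothetical, not taken from the original code. It assumes that country initials are two letters, as in the ES example that appears later in the post.

```python
import re

# Pattern for <CT>_ts_<SP>_C<X>.csv; two-letter country initials are an assumption.
_NAME_RE = re.compile(r"^([A-Z]{2})_ts_([A-Za-z0-9]+)_C(\d+)\.csv$")


def output_filename(country: str, sample_period: str, component_id: int) -> str:
    """Build a file name following the <CT>_ts_<SP>_C<X>.csv pattern."""
    if component_id < 1:
        raise ValueError("component_id must be a positive integer")
    return f"{country.upper()}_ts_{sample_period}_C{component_id}.csv"


def parse_output_filename(name: str):
    """Split a file name back into (country, sample_period, component_id)."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a <CT>_ts_<SP>_C<X>.csv name: {name!r}")
    country, sample_period, component = match.groups()
    return country, sample_period, int(component)


print(output_filename("ES", "hour", 1))            # ES_ts_hour_C1.csv
print(parse_output_filename("ES_ts_hour_C1.csv"))  # ('ES', 'hour', 1)
```

Keeping the country, sample period and component ID in the file name means each file can be identified without opening it.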

Getting the data ready for visualization

Let's see the selected information after crossing the available information. In order to represent a relevant region that accounts for around 60% of the air pollution data, we selected the following countries: Germany, Italy, France, Spain, Poland, Belgium, Czech Republic, Netherlands, Portugal and Switzerland (see Table 1).

country_name        %data    #components
GERMANY
SPAIN
ITALY
FRANCE
AUSTRIA
UNITED KINGDOM
BELGIUM
CZECH REPUBLIC
NETHERLANDS
POLAND
SWITZERLAND
PORTUGAL
ROMANIA
GREECE
FINLAND
Table 1

The selected components account for around 72% of the air pollution data. These are: Sulphur dioxide, Particulate matter < 10 µm, Nitrogen dioxide, Benzene, Carbon monoxide and Ozone (see Table 2). Let's provide a brief description of their health consequences:

- Ozone (O3). At ground level it is one of the major constituents of photochemical smog. It irritates the airways of the lungs, increasing the symptoms of those suffering from asthma and lung diseases.
- Sulphur dioxide (SO2). Its emissions cause acid rain and generate fine dust. This dust is dangerous for human health, causing respiratory and cardiovascular diseases and reducing life expectancy in the EU by up to two years.
- Nitrogen dioxide (NO2). It can irritate the lungs and lower resistance to respiratory infections.
- Particulate matter < 10 µm (PM10). It affects more people than any other pollutant. The fine particles can be carried deep into the lungs, where they can cause inflammation and a worsening of the condition of people with heart and lung diseases.
- Carbon monoxide (CO). High levels of carbon monoxide are poisonous to humans. This gas prevents the normal transport of oxygen by the blood. This can lead to a significant reduction in the supply of oxygen to the heart, particularly in people suffering from heart disease.
- Benzene (C6H6). Possible chronic health effects include cancer, central nervous system disorders, liver and kidney damage, reproductive disorders, and birth defects.

component_name                            %data
Ozone (air)                               19.9
Sulphur dioxide (air)                     16.9
Nitrogen dioxide (air)                    16.5
Particulate matter < 10 µm (aerosol)      9.3
Nitrogen oxides (air)                     8.4
Carbon monoxide (air)                     8.2
Benzene (air)                             1.4
Particulate matter < 2.5 µm (aerosol)     1.0
Table 2

The selected sample periods are hour and day, since they are the sample periods common to all of the previously selected information.

In order to proceed with the following steps (extracting the rawdata based on the previous selection, finding the best interval overlap and chopping the data based on the intervals), we direct the reader to the AirBase_CT_data.R script. But first, we refer to the CTT_FILES.R file, which appears at the beginning of that script. In the CTT_FILES.R file we define the global constants to be used in the AirBase_CT_data.R script. These are:

1. Directory paths: the root path and the path where the files are saved.
2. Selected information variables: countries and common components.
3. Time interval to chop the datasets.

Let's continue with the next step from the results framework.
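To make the role of such a constants file concrete, here is a minimal Python sketch of the kind of values it is described as holding, together with a cross-check of the "around 72%" figure against the shares listed in Table 2. It is not the content of CTT_FILES.R: the directory names are hypothetical, the names `TABLE_2_SHARES` and `selected_share` are introduced only for this illustration, and the time-interval constant is left out because its values are not given here.

```python
from pathlib import Path

# Hypothetical directory layout; the real paths are not given in the post.
ROOT_DIR = Path("AirBase")
SAVE_DIR = ROOT_DIR / "output"

# Selection described in the post.
COUNTRIES = [
    "Germany", "Italy", "France", "Spain", "Poland",
    "Belgium", "Czech Republic", "Netherlands", "Portugal", "Switzerland",
]
SAMPLE_PERIODS = ["hour", "day"]

# Share of the data per component, in percent, copied from Table 2.
TABLE_2_SHARES = {
    "Ozone": 19.9,
    "Sulphur dioxide": 16.9,
    "Nitrogen dioxide": 16.5,
    "Particulate matter < 10 µm": 9.3,
    "Nitrogen oxides": 8.4,
    "Carbon monoxide": 8.2,
    "Benzene": 1.4,
    "Particulate matter < 2.5 µm": 1.0,
}
SELECTED_COMPONENTS = [
    "Sulphur dioxide", "Particulate matter < 10 µm", "Nitrogen dioxide",
    "Benzene", "Carbon monoxide", "Ozone",
]


def selected_share(shares, selected):
    """Sum the percentage shares of the selected components."""
    return round(sum(shares[name] for name in selected), 1)


print(selected_share(TABLE_2_SHARES, SELECTED_COMPONENTS))  # 72.2
```

The six selected components add up to 72.2% of the data, which matches the figure quoted in the text.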

To extract the rawdata for the previous selection per country, a function called function_extract_ct_data.r extracts a country's rawdata based on the selected components and sample periods. This function is incorporated in the AirBase_CT_data.R script.

Next, for each component and sample period, we find the best time interval overlap, that is, we select the data with the most observations, so that a representative sample is obtained. To obtain the best overlap common to all the components per sample period for all the countries, we carry out a visual analysis with the dygraphs package (see app_BE.R, app_fr.r, ...) and select the timestamps that accommodate our needs. See Figure 1 for an example from the Spain data set, where we selected the Sulphur dioxide component measured hourly (shaded area = selected data). To get information from the figure, you can zoom in by selecting a region and zoom out by double-clicking on the figure. In the upper right corner, you can see the date, time and number of observations at that time point.

Figure 1
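The interval in the post was chosen by eye from the interactive plots. As a rough automated counterpart, the Python sketch below counts how many stations report a value at each time point and then looks for the longest contiguous stretch in which that count stays at or above a threshold. This is a swapped-in illustration, not the authors' procedure: the function names, the threshold `min_obs` and the sample values are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def observations_per_timestamp(series: Dict[str, List[Optional[float]]]) -> List[int]:
    """Count, for every time index, how many stations reported a value."""
    lengths = {len(values) for values in series.values()}
    if len(lengths) != 1:
        raise ValueError("all station series must share the same time axis")
    n = lengths.pop()
    return [
        sum(1 for values in series.values() if values[i] is not None)
        for i in range(n)
    ]


def longest_window(counts: List[int], min_obs: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) indices of the longest run with counts >= min_obs."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    for i, count in enumerate(counts):
        if count >= min_obs:
            if start is None:
                start = i  # a new run begins here
            if best is None or (i - start) > (best[1] - best[0]):
                best = (start, i)  # current run is the longest so far
        else:
            start = None  # the run is broken by a sparse time point
    return best


# Hypothetical hourly values for three stations (None = missing observation).
stations = {
    "ES0007R": [None, 1.0, 2.0, 3.0, None, 1.0],
    "ES0008R": [None, None, 2.0, 2.5, 1.0, None],
    "ES0009R": [0.5, 1.5, 1.0, 2.0, 2.0, None],
}
counts = observations_per_timestamp(stations)
print(counts)                     # [1, 2, 3, 3, 2, 1]
print(longest_window(counts, 2))  # (1, 4)
```

A stricter threshold gives a shorter but denser window, which is the same trade-off that is judged visually from the shaded area in Figure 1.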

After performing the analysis for all the countries, selected components and sample periods, the selected time intervals can be observed in Table 3.

Component                     Hour    Day
Sulphur dioxide               0:00, 01/01/ /01/2004
Particulate matter < 10 µm    01:00, 01/01/ /01/2005
Nitrogen dioxide              0:00, 01/01/ /01/2003
Benzene                       0:00, 01/01/ /01/2005
Carbon monoxide               0:00, 01/01/ /01/2002
Ozone                         0:00, 01/01/ /01/2003
Table 3

Then, with the previous information, we chop (slice/trim) the extracted datasets. A function called function_chop_fill_data.r does this and optionally fills NA values in the data. As before, this function is incorporated in the AirBase_CT_data.R script. The structure of the data is shown in Table 4.
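The chop-and-fill step can be sketched as follows. This is an illustrative Python version, not the R function named above. The post does not say how missing values are filled, so carrying the last observation forward is an assumption made here; the dates and values are hypothetical as well.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Row = Tuple[datetime, Optional[float]]


def chop(rows: List[Row], start: datetime, end: datetime) -> List[Row]:
    """Keep only the rows whose timestamp lies in [start, end]."""
    return [(stamp, value) for stamp, value in rows if start <= stamp <= end]


def fill_forward(rows: List[Row]) -> List[Row]:
    """Replace missing values with the last observed value (assumed fill method)."""
    filled: List[Row] = []
    last: Optional[float] = None
    for stamp, value in rows:
        if value is None:
            value = last  # stays None if nothing has been observed yet
        else:
            last = value
        filled.append((stamp, value))
    return filled


def chop_fill(rows: List[Row], start: datetime, end: datetime, fill: bool = True) -> List[Row]:
    """Chop to the selected interval, then optionally fill missing values."""
    window = chop(rows, start, end)
    return fill_forward(window) if fill else window


# Hypothetical hourly series for one station.
rows = [
    (datetime(2003, 12, 31, 23), 4.0),
    (datetime(2004, 1, 1, 0), 5.0),
    (datetime(2004, 1, 1, 1), None),
    (datetime(2004, 1, 1, 2), 7.0),
    (datetime(2004, 1, 1, 3), None),
    (datetime(2004, 1, 1, 4), None),
]
result = chop_fill(rows, datetime(2004, 1, 1, 0), datetime(2004, 1, 1, 3))
print([value for _, value in result])  # [5.0, 5.0, 7.0, 7.0]
```

Filling after chopping means values from outside the selected interval never leak into it, and passing `fill=False` returns the chopped data with its gaps left in place.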

Date    ES0007R    ES0008R    ES0009R    ES0010R    ES0011R
Table 4

Recall that, in this post, we aim to display the air pollution map of Spain for the selected components: Sulphur dioxide, Particulate matter < 10 µm, Nitrogen dioxide, Benzene, Carbon monoxide and Ozone. In order to visualize the data, we need to finish prepping it, so we move on to the last step, which is saving the data in a usable and readable format. We have two output formats for the extracted rawdata per country:

1. R data format (.RData), with nomenclature: ES_data.Rdata
2. Table-structured format (.csv), with nomenclature: ES_ts_hour_C1.csv, where
   C1 = Sulphur dioxide.
   C2 = Particulate matter < 10 µm.
   C3 = Nitrogen dioxide.
   C4 = Benzene.
   C5 = Carbon monoxide.
   C6 = Ozone.
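A table with the layout of Table 4 (one Date column followed by one column per station) can be written to CSV along these lines. This is a Python sketch for illustration, not the authors' export code. The helper name `write_wide_csv`, the sample values and the use of the string NA for missing cells are assumptions.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO


def write_wide_csv(handle: TextIO, timestamps: List[str],
                   stations: Dict[str, List[Optional[float]]]) -> None:
    """Write one row per timestamp and one column per station."""
    station_ids = sorted(stations)
    for station_id in station_ids:
        if len(stations[station_id]) != len(timestamps):
            raise ValueError(f"{station_id} does not match the time axis")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Date"] + station_ids)
    for i, stamp in enumerate(timestamps):
        cells = [stations[s][i] for s in station_ids]
        # Missing observations are written as NA (assumed convention).
        writer.writerow([stamp] + ["NA" if c is None else c for c in cells])


# Hypothetical values for two of the stations named in Table 4.
timestamps = ["2004-01-01 00:00:00", "2004-01-01 01:00:00"]
stations = {"ES0008R": [1.5, None], "ES0007R": [2.0, 3.0]}

buffer = io.StringIO()
write_wide_csv(buffer, timestamps, stations)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example self-contained. In practice the handle would be a file opened under a name such as ES_ts_hour_C1.csv.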

The data has been fully prepped and saved, and we can now visualize it!

VISUALIZE SPAIN AIR POLLUTION DATA

In the following interactive map, we can observe how and where the stations are spread throughout Spain for an hourly sample period. A snapshot of a specific time period, which is :00:00, is given. In addition, each layer is a different component, which you can select and overlay with others or view on its own. For each component, automatic clustering is provided, so we can see how the stations are grouped: this can be helpful to identify regions or a specific information group. To obtain a little more information on the data, you can zoom in and click on a station dot to display its measurement for the time period provided. We encourage the reader to interact with the map and draw conclusions!

All the .R files can be found at com/keedio/air-pollution-data

keedio, Calle Virgilio 25, Edificio Ayessa I, Bajo D, Pozuelo de Alarcón, Madrid