Big Data and Official Statistics

Takao Itou
Executive Statistician for International Statistical Affairs
Ex-Director-General for Policy Planning (Statistical Standards)
Ministry of Internal Affairs and Communications
Government of Japan

Session: Big Data: Official statistics and the use of non-traditional data sources
Joint UNECE/Eurostat/OECD/ESCAP/ADB Meeting on the Management of Statistical Information Systems (MSIS 2014)
(Dublin, Ireland, and Manila, Philippines, April 2014)

Contents
1. Seminars, Symposiums, etc. Concerning Big Data and Official Statistics in Recent Years
2. Background
3. Definition of Big Data: No Common Definition, "Big Data Age"?
4. A Variety of Contents and Approaches
5. Big Data and Official Statistics: 8 Approaches
6. Limits of Big Data: Social and Economic Data Owned by the Private Sector
7. Possibility of Big Data for Official Statistics

1. Seminars, Symposiums, etc. Concerning Big Data and Official Statistics in Recent Years (Examples)
(1) Symposium: Big Data Initiative, Actual Situation of Advancement and Usage of Data Science, February 1, 2013, Japanese Society of Applied Statistics (Chair: Ph. Shigeru Kawasaki), Tokyo, Japan
(2) Seminar: Big Data for Policy, Development and Official Statistics, a side event at the UN Statistical Commission, February 22, 2013, United Nations, New York
(3) Seminar: Quality Management of Statistics and Data: Quality of Data Collection and Its Management in the Big Data Age, April 9, 2013, Japanese Society for Quality Control, Tokyo, Japan
(4) Three papers, including a Big Data overview by ESCAP, at the Meeting on the Management of Statistical Information Systems, Paris and Bangkok, April 2013
(5) Work of the High-level Group on the Modernization of Statistical Production and Services (June 10-12, 2013, 61st plenary session of the ECE Conference of European Statisticians): "What does 'Big Data' mean for official statistics?" (UNECE), June 12, 2013, Geneva, Switzerland
(6) Session on Big Data, and session on the potential of the Internet, big data and organic data for official statistics, at the 59th ISI World Statistics Congress, August 27, 2013, Hong Kong, China
(7) 11th Management Seminar for Heads of NSOs in Asia and the Pacific, United Nations University, Tokyo, Japan, November 21 and 22, 2013
(8) Big data and modernization of statistical systems, Report of the Secretary-General, Statistical Commission, Forty-fifth session, 4-7 March 2014
(9) In-depth review of big data, prepared by the temporary Task Team on Big Data and the Secretariat, Sixty-second plenary session of the ECE Conference of European Statisticians, Paris, 9-11 April 2014

2. Background
(1) Development of ICT: development of various observation devices and sensors; enhancement of computer capabilities for recording, storage, retrieval, analysis, graphics, etc.; development of communication tools (mobile phones, social networks, etc.), GIS, navigation systems, etc.
(2) Reduction in the cost of using ICT.
(3) Increased recognition that large volumes of data can be collected quickly and easily.
(4) Increased recognition of the usefulness of data analysis in various fields: academia, observation of nature/natural phenomena, etc.; decision-making in business for improving services, reducing costs, etc.; policy-making in the government sector, etc.
(5) Increased possibility of searching individual records in social/economic data and using them.
The use of Big Data started more than ten years ago, and it has become more familiar to the public in recent years as much information about useful examples of Big Data analysis has been published.

3. Definition of Big Data: No Common Definition, "Big Data Age"?
1. Some people have tried to define the concept of Big Data. Most of them point out that volume, velocity and variety (diversity) are the key factors defining Big Data. However, most Japanese data scientists say there is no common definition of Big Data, because it is very difficult to set concrete criteria for volume and velocity other than "larger and faster than ever":
  No criteria for volume (it is very difficult to define criteria for volume).
  No criteria for velocity (it is very difficult to define criteria for velocity).
  The contents/types of data are diverse.
This understanding is becoming common.
2. Although it is difficult, and sometimes unnecessary, to define Big Data, many examples of Big Data use have been published in various fields, using various data sources.
3. Examples of Big Data use have been published in the following areas, some of them combining Big Data and GIS:
  Academia: medical science, life science, environmental science, geology (disaster prevention), meteorology, high-energy physics, etc.
  Business: POS systems, banking systems, credit card systems, navigation systems, mobile phone systems, traffic record systems for telecommunications, road traffic, etc., and fare calculation/payment systems for railway services, bus services, electricity, gas, water, broadcasting services, etc.
  Administration: various administrative records kept by using ICT (civil registration, taxation, health care services, pension services, data collected by statistical surveys, etc.)
4. I feel that "Big Data" is a misleading term, and that "Big Data Age" is more accurate for understanding the current situation surrounding us.

4. A Variety of Contents and Approaches
1. Based on the information I have, Big Data contents can be divided into the following categories:
  Scientific data collected through the observation of nature/natural phenomena, etc.
  Social/economic data collected by social/business ICT systems
  Data stored in administration (including statistics sections) by using ICT
2. The contents of each category vary according to the recording system and the recorded items, so the use of these data should be considered by data type, system and contents. The following points are also very important for traditional NSOs:
  Who has the Big Data?
  Can NSOs collect Big Data from other bodies easily and without problems?
  Does the Big Data requested by NSOs for compiling official statistics raise privacy problems?
  Can NSOs really analyze Big Data and compile from it official statistics that are useful to society?
3. The table below shows the usual pattern of possession and data type of Big Data. Social/economic data concerning individuals/companies are closely connected with privacy problems. These data look attractive to NSOs, but considering the privacy problem and the limits of Big Data described below, I think it is not easy for NSOs to obtain them and use them for official statistics.
4. At the same time, even when different countries have the same system in the social/economic area, the volume and contents of the data differ. For example, in New York many people use credit cards to purchase day-to-day food and goods in shops, but in Japan only a few people do, so results obtained by analyzing credit card data differ between the two.
5. Therefore, each NSO should consider the analysis of social/economic data collected by ICT systems based on the situation of its own country.

4. A Variety of Contents and Approaches (continued)
Possession and Data Type of Big Data (Usual Case)
Possession: Administration (NSO; Government Administrative Organization; Government Research Institute); Private Sector (Business Company; Research Institute)
Data types: Scientific data collected through the observation of nature/natural phenomena, etc.; Social/economic data concerning individuals/companies (marked P); Statistical data (NSO as a provider)
(Note) P indicates a strong connection with the privacy problem.

4. A Variety of Contents and Approaches (continued)
1. Looking at the discussions and reported use cases of Big Data, not only in statistical communities but also in other areas, there are many approaches to Big Data from various viewpoints:
(1) development of new observation equipment/sensors and the possibility of new data;
(2) new techniques/software for Big Data analysis;
(3) possibility of business use;
(4) possibility of, and problems with, using business data for official purposes;
(5) possibility of, and problems with, using administrative data for official statistics;
(6) examples of analysis and Big Data use (for sales strategy, public use, academic use, etc.);
(7) possibility of using ICT for official work (how to use it, its effects, etc.).
2. However, these approaches are sometimes discussed on the basis of specific cases or specific Big Data contents, which makes the discussion more complicated.
3. Although it is important for the future work of NSOs to gain deep knowledge of all the above approaches, hereinafter I will focus mainly on approaches (4) to (7), and especially on the use of social/economic Big Data for official statistical work.

5. Big Data and Official Statistics: 8 Approaches
1. Further introducing ICT into official statistical work, e.g. collecting respondents' answers via the Internet or other telecommunication tools, data processing, and dissemination of statistical results.
2. Making official statistics more accessible and available by using ICT, e.g. GIS and other functions.
3. Compiling and releasing official statistics based on administrative records by using ICT.
4. Compiling new official statistics based on administrative records by using ICT.
5. Using Big Data collected and held by the private sector as supplementary data for compiling existing official statistics.
6. Compiling existing official statistics by using Big Data collected and held by the private sector.
7. Compiling new official statistics by using Big Data collected and held by the private sector.
8. Encouraging the private sector to produce and release new statistics useful for the public.

6. Limits of Big Data: Social and Economic Data Owned by the Private Sector
1. There are two directions for analyzing Big Data: analyzing it to compile statistics, and searching individual records. The latter is also important, but it may not be the work of NSOs. For example, at the time of the Great East Japan Earthquake, four car navigation companies in Japan collaborated and, immediately and voluntarily, started identifying roads usable for rescue operations by using their car navigation data. Of course, this is not statistical information, but it is useful information for the public. This case raises two problems.
One is the problem of privacy and company secrets. In this case, the private car navigation companies pointed out several problems in providing their Big Data to others, including the government, and said it is difficult to provide their Big Data to others on a regular basis without solving these problems. Like the government, the private sector must keep its customers' secrets and respect their privacy. Hardware security, software security and customers' trust are also important. At the same time, business secrets must be kept, and providing data to the administration entails costs. Therefore, some people insist on the need for legal frameworks for the official use of Big Data.
The other is the problem of statistical analysis and other data mining methods: do NSOs think that data mining, other than producing official statistics, is really the work of NSOs?
2. As the possession and use of Big Data vary across the private sector, and Big Data is usually held by more than one company, it is not easy to collect it all without solving the above problems. And when a global company holds Big Data spanning several countries, it seems difficult to ask that company to provide data specific to one country. So deep knowledge of ICT and case-by-case study are necessary.
3. Big Data has a large volume, but its items are limited and are sometimes fewer than the items in statistical surveys. So it is doubtful whether Big Data analysis is useful for diversified analysis.

6. Limits of Big Data: Social and Economic Data Owned by the Private Sector (continued)
4. Some people point out that Big Data is unstructured, unlike statistical survey data. Usually, in official statistics, people try to find information, knowledge, new findings, etc. by using census or structured survey data. In the analysis of Big Data held by the private sector, however, people try to extract information, knowledge, new findings, etc. useful for their business even though the data are unstructured.
5. In some seminars/symposiums, participants commented that the quality of Big Data is in some cases not high and not appropriate for statistical/analytical purposes. The quality of administrative data is sometimes also regarded as questionable. However, administrative data are gathered under a legal framework and usually seem more structured than Big Data collected by the ICT systems of the private sector.
6. Even though Big Data has a large volume, it does not include data on people who do not participate in the ICT society. So Big Data analysis risks misrepresenting the overall features of people/society; there is a considerable risk that Big Data analysis alone produces a distorted picture.
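The coverage problem in point 6 can be made concrete with a small worked example. If a share c of the population is covered by a Big Data source with mean m_c, and the uncovered rest has mean m_u, the Big Data estimate is m_c while the true mean is c·m_c + (1 - c)·m_u, so the bias is (1 - c)(m_c - m_u). It does not shrink as the data volume grows. The Python sketch below is illustrative only: the `Segment` type, the function names and all figures are hypothetical assumptions, not data from any NSO.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A hypothetical population segment; all figures are illustrative."""
    name: str
    share: float       # fraction of the whole population (shares sum to 1)
    mean_value: float  # mean of the quantity of interest in this segment
    in_big_data: bool  # True if the segment leaves records in ICT systems


def population_mean(segments):
    # Target value: every segment weighted by its population share.
    return sum(s.share * s.mean_value for s in segments)


def big_data_mean(segments):
    # What a Big Data source "sees": covered segments only,
    # re-weighted so that their shares sum to one.
    covered = [s for s in segments if s.in_big_data]
    covered_share = sum(s.share for s in covered)
    if covered_share == 0:
        raise ValueError("no segment is covered by the Big Data source")
    return sum(s.share * s.mean_value for s in covered) / covered_share


def coverage_bias(segments):
    # Positive result: the Big Data estimate overstates the population mean.
    return big_data_mean(segments) - population_mean(segments)


# Hypothetical example: 70% of people use the ICT system, 30% do not.
EXAMPLE = [
    Segment("ICT participants", share=0.7, mean_value=100.0, in_big_data=True),
    Segment("non-participants", share=0.3, mean_value=60.0, in_big_data=False),
]

if __name__ == "__main__":
    print(f"population mean: {population_mean(EXAMPLE):.1f}")
    print(f"Big Data mean:   {big_data_mean(EXAMPLE):.1f}")
    print(f"coverage bias:   {coverage_bias(EXAMPLE):.1f}")
```

With these made-up figures the script prints a population mean of 88.0, a Big Data mean of 100.0 and a bias of 12.0, matching (1 - 0.7)(100 - 60). Adding more records from the same covered segment would leave the bias unchanged.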

7. Possibility of Big Data for Official Statistics
1. Many NSOs have been trying approaches (1) to (4) in Section 5 above. In fact, in Japan these four approaches are recognized not as a Big Data issue but as an issue of introducing ICT into our official statistical work. These efforts to modernize the statistical system by using ICT are very important for all NSOs in enhancing the value of official statistics and the authority and potential of NSOs.
2. Some NSOs have begun studying approaches (5) and (6). However, many statisticians in Japan feel that approach (6) is unrealistic at this stage, because the limits explained above are fundamental and difficult to solve.
3. Concerning approach (7), I feel it is possible to produce new statistics from Big Data that NSOs have not used for traditional statistics, such as traffic data, mobile phone data, etc. But we should consider whether such new statistics are really worth compiling and feasible as official statistics. At the same time, we NSOs should consider that data mining other than producing official statistics can provide useful information for the public, and that it could really be the work of NSOs. Some have also pointed to the possibility of producing leading indices from economic Big Data. In this case, in addition to data availability, the most important point will be whether a causal relation or interrelation between the new index and the phenomenon it covers can be demonstrated.
4. Concerning approach (8), voluntary efforts and cooperation by the private sector are essential and should be encouraged by the government. In this case, very important problems remain to be solved: who covers the cost, whether stable and continuous publication of the statistics is maintained, and whether the metadata necessary for assessing them are published.
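For the leading-index case in point 3, a common first screening step is to check whether a candidate Big Data index moves ahead of the official series it is meant to anticipate. The Python sketch below is a hypothetical illustration, not a method from this presentation: it computes the Pearson correlation between the candidate at period t and the target at period t + k, and reports the lead k with the highest correlation. The function names are assumptions. A high correlation at some lead only supports an interrelation; it does not prove the causal relation the text calls for, and trending series can correlate spuriously.

```python
from math import sqrt


def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length series.
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)


def lead_correlation(candidate, target, lead):
    # Correlate candidate[t] with target[t + lead] over the overlapping period.
    if lead < 0:
        raise ValueError("lead must be non-negative")
    n = min(len(candidate), len(target) - lead)
    return pearson(candidate[:n], target[lead:lead + n])


def best_lead(candidate, target, max_lead):
    # Lead (in periods) at which the candidate index tracks the target best.
    scores = {k: lead_correlation(candidate, target, k)
              for k in range(max_lead + 1)}
    k = max(scores, key=scores.get)
    return k, scores[k]
```

For instance, with a made-up target series `[1, 3, 2, 5, 4, 7, 6, 9, 8, 10]` and a candidate that reproduces it two periods early, `best_lead(candidate, target, 3)` returns a lead of 2 with a correlation of about 1.0. In practice one would also test the relation out of sample and examine the mechanism behind it before treating the index as official.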

7. Possibility of Big Data for Official Statistics (continued)
5. "Big data and modernization of statistical systems", the report to the Forty-fifth session of the UN Statistical Commission, is very useful for understanding the overview of the Big Data issue, the challenges for NSOs and the problems they need to tackle. In the report, Big Data sources are classified into six categories: sources arising from the administration of a programme, be it governmental or not; commercial or transactional sources arising from a transaction between two entities; sensor network sources; tracking device sources; behavioural data sources; and opinion data sources. The ECE/CES paper also introduced another type of classification. Of course, these are useful, but I think that the two points discussed here, namely the possession of Big Data and its availability, together with the type of approach, the limits of Big Data and its potential for official statistics, would provide a necessary basis for more fruitful discussion on Big Data and official statistics.
6. At the same time, the legislative, privacy, financial, management, methodological and technological problems to be solved for official statistics are also presented in the report. These are very important points and useful for our thinking. However, financial, management and technological challenges are common to all matters, and when considering the use of Big Data in NSOs, legislative and privacy matters are more essential from the official statistics point of view. As for methodological matters, it is very important to gather information on as many case studies as we can and to exchange experiences with each other, including data collection processes and the problems solved. At the 45th UNSC session held last March, the proposal to create a global working group on the use of Big Data for official statistics was broadly supported, and the development of guidelines to classify the various types of Big Data sources and approaches was requested. I think the above analysis should be taken into consideration in future studies and discussions on Big Data and its possible use by NSOs.
7. Finally, some people are anxious about competition with the private sector that holds Big Data. However, I do not share this view. I feel Big Data is not so threatening, and, as the use of Big Data for official statistics has just started, much more cooperation between NSOs and the private sector or other entities is important.

Thank you very much for your attention!
The opinions in this presentation are not the official opinions of the Japanese Government. The responsibility for them belongs entirely to the author.