OUTLINE 8/30/2017 WHAT IS BIG DATA WHAT IS BIG DATA

Jabatan Perangkaan Malaysia (Department of Statistics Malaysia, DOSM)
28-31 August 2017, Bangkok, Thailand

OUTLINE
- What is big data
- What is big data analytics
- NSOs: big data analytics
- Architecture
- Initiatives
- Expected output
- Predictive analytics
- Deployment
- Challenges & way forward

WHAT IS BIG DATA?

Characteristics of big data:
- Huge volume of data
- Complexity of data types and structures
- Speed of new data creation and growth

The 7 Vs of Big Data: Volume, Velocity, Variety, Variability, Veracity, Visualization, Value

Sources: EMC Education Services; MDeC

"Big data refers to the rising flood of digital data from many sources, including the web, biological and industrial sensors, video, and social network communication." Steve Lohr, New York Times technology journalist

"Big Data is data whose scale, distribution, diversity, and/or timeliness require the use of new technical architectures and analytics to enable insights that unlock new sources of business value." McKinsey & Co., Big Data: The Next Frontier for Innovation, Competition, and Productivity

WHAT IS BIG DATA ANALYTICS?

Big Data Analytics is the process of examining large amounts of data to uncover hidden patterns, correlations and other insights.
- Cost reduction
- Faster, better decision making
- New products and services

NSOs: BIG DATA ANALYTICS

Traffic and Transport Statistics
Data source: traffic loop detection records
Making full use of this information would result in speedier and more robust statistics on traffic, and in more detailed information on the traffic of large vehicles, which is indicative of changes in economic development.

Social Media Statistics
Data source: public social media messages
Determining the sentiment in social media messages revealed a very interesting potential use of this data source for statistics. The sentiment in social media messages may be highly correlated with consumer confidence, in particular towards the economic situation.

Price Statistics
Data source: prices collected on the Internet
Assists Consumer Price Index (CPI) compilers in the automated Internet collection of prices.

Tourism Statistics
Data source: data files held by mobile operators
Using mobile positioning data for tourism statistics.

Country examples:
- Netherlands: road sensor data for traffic intensity statistics; social media data for consumer confidence; scanner and web scraping data for price statistics
- Australia: scanner data for the Consumer Price Index (CPI)
- Switzerland: scanner data for price statistics
- Canada: web scraping data for the Computer and Peripherals Price Indexes and the Commercial Software Price Index
- Belgium: scanner data for consumer prices

NSOs: BIG DATA ANALYTICS

ICT Usage Statistics
Data source: Internet traffic flows
Assessing the feasibility of 'user-centric' and 'web-centric' measurement approaches from a multi-dimensional perspective that includes technical, methodological, cost, legal and socio-political issues.

Country examples:
- Singapore: administrative data for population estimates
- United States: physical activity data for health statistics; medical claims data for health care spending statistics

StatsBDA JOURNEY
- 8 Aug 2016: Advertise Tender
- 1 Aug 2016: Tender Briefing
- Sept 2016: Tender Closed
- 20-2 Sept 2016: Tender Evaluation Meeting
- 1 Oct 2016: Benchmarking Assessment Session
- Nov 2016: Tender Awarded
- 2 Dec 2016: Kick-off
Project duration: Nov 2016 - May

ARCHITECTURE: DOSM BIG DATA ANALYTICS

Integrated Big Data Analytics Environment:
- Big Data Platform
- Advanced Analytics
- Data Management
- Visualization
- Unstructured Data Analytics

StatsBDA ARCHITECTURE

INITIATIVES (main and support projects)
- Trade by Enterprise Characteristics (TEC)
- Real Time Business Status Stats
- Price Intelligence (PI)
- Opinion Mining on Official Statistics
- Real Time News on Official Statistics

INITIATIVE 1: INTERNAL PORTAL FOR TRADE BY ENTERPRISE CHARACTERISTICS (TEC)

The initiative aims to produce insightful trade statistics. The integration makes it possible to identify the enterprises that are engaged in international markets and to describe their characteristics. To initiate the TEC database, DOSM has to attribute trade flows to enterprises with different characteristics by merging data on international trade from the Royal Malaysian Customs Department with statistical business register information at the individual enterprise level.

TRADE DATABASE
- Business Registration Number
- Business Name
- Business Address
- Product Exported/Imported
- Partner Country
- Exports/Imports Value
- Exports/Imports Volume
- etc.

MSBR
- Business Registration Number
- Business Name
- Business Address
- Industry
- Incorporated Date
- Operating Status
- Number of Employees
- SME Status
- etc.

TRADE BY ENTERPRISE CHARACTERISTICS DATABASE
- Business Registration No.
- Business Name
- Business Address
- Product Exported/Imported
- Partner Country
- Exports/Imports Value
- Exports/Imports Volume
- Industry
- Incorporated Date
- Operating Status
- Number of Employees
- SME Status
- etc.
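The merge described above can be sketched as a join on the business registration number. The slides do not show DOSM's actual implementation, so the following is only a minimal illustration: the record layout, the short field names (`brn`, `flow`, `sme`) and all sample values are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample records; field names loosely follow the slide, values are invented.
trade_records = [
    {"brn": "A001", "flow": "export", "product": "palm oil", "partner": "IN", "value": 1200.0},
    {"brn": "A001", "flow": "import", "product": "machinery", "partner": "DE", "value": 800.0},
    {"brn": "B002", "flow": "export", "product": "rubber gloves", "partner": "US", "value": 500.0},
    {"brn": "Z999", "flow": "export", "product": "timber", "partner": "JP", "value": 300.0},
]

# Hypothetical business register, keyed by business registration number.
business_register = {
    "A001": {"name": "Alpha Sdn Bhd", "industry": "Manufacturing", "employees": 250, "sme": False},
    "B002": {"name": "Beta Enterprise", "industry": "Manufacturing", "employees": 40, "sme": True},
}


def build_tec(trade, register):
    """Join trade records to register entries on the registration number.

    Returns the enriched rows plus the trade records that found no match,
    so unmatched traders can be reviewed instead of silently dropped.
    """
    matched, unmatched = [], []
    for rec in trade:
        enterprise = register.get(rec["brn"])
        if enterprise is None:
            unmatched.append(rec)
        else:
            matched.append({**rec, **enterprise})
    return matched, unmatched


def value_by(rows, key):
    """Total trade value per (characteristic, flow) pair."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row[key], row["flow"])] += row["value"]
    return dict(totals)


tec_rows, unmatched_rows = build_tec(trade_records, business_register)
value_by_sme_status = value_by(tec_rows, "sme")
```

Keeping the unmatched records separate is a design choice for this sketch: in a linkage exercise of this kind, the share of trade value that cannot be attributed to a registered enterprise is itself a quality indicator.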

EXPECTED OUTPUT: PREDICTIVE ANALYTICS
- Forecasting trade transaction volume: forecasting of monthly trade based on trade activity and value
- Company survival model: prediction of how likely a company is to survive, based on various predictors such as company business size, company address (region), etc.
- Product survival: to predict when the next product end-of-life will occur
- Forecasting number of traders by sector: what will be the total number of traders in the next unit of time (e.g. day)?
- Concentration risk: the correlation of economic activities with the number of traders
- Market basket analysis: which item groups tend to be traded together?
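The market basket question ("which item groups tend to be traded together?") can be illustrated with simple pair counting. This is a sketch under stated assumptions, not the method used in StatsBDA: each "basket" is assumed to be the set of item groups one trader dealt in, and the sample baskets are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: the set of item groups each trader dealt in.
baskets = [
    {"electronics", "machinery", "plastics"},
    {"electronics", "machinery"},
    {"palm oil", "rubber"},
    {"electronics", "plastics"},
    {"palm oil", "rubber", "machinery"},
]


def pair_support(baskets):
    """Share of baskets that contain each unordered pair of item groups."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes ("a", "b") and ("b", "a") count as the same pair.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


support = pair_support(baskets)
# Most frequent pairs first; ties are broken alphabetically for stable output.
top_pairs = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
```

Support says how common a pair is overall, while confidence says how often the second group appears given the first. Exhaustive pair counting only suits small inputs; larger item sets would normally need a pruning algorithm such as Apriori.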

INITIATIVE 2: INTERNAL PORTAL FOR PRICE INTELLIGENCE (PI)

Modernization of data collection tools to improve the quality of the Consumer Price Index (CPI). The modernization mainly consists of adopting web scraping techniques to collect price data from relevant websites for CPI compilation.
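As a rough illustration of the web scraping step, the sketch below extracts item names and prices from a product listing using only Python's standard `html.parser` module. The HTML snippet, its `name` and `price` class names, and the "RM" price format are all assumptions for this example; real retailer pages use different markup, and the slides do not say which tools DOSM uses.

```python
from html.parser import HTMLParser

# Hypothetical product listing; real retailer pages will use different markup.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Rice 5kg</span><span class="price">RM 25.90</span></li>
  <li class="product"><span class="name">Cooking oil 1kg</span><span class="price">RM 6.70</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collect (name, price) pairs from <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.field = None   # which span we are currently inside, if any
        self.current = {}   # fields gathered for the item being read
        self.items = []     # finished (name, price) tuples

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self.field = css_class

    def handle_data(self, data):
        if self.field:
            self.current[self.field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span" and self.field:
            self.field = None
            # Emit the item once both its name and its price have been seen.
            if "name" in self.current and "price" in self.current:
                price = float(self.current["price"].replace("RM", "").strip())
                self.items.append((self.current["name"], price))
                self.current = {}


parser = PriceParser()
parser.feed(SAMPLE_HTML)
scraped_prices = parser.items
```

The example parses an inline string so that it runs offline. Against a live site, the page would first have to be fetched (for instance with `urllib.request.urlopen`), subject to the site's terms of use.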

PRICE INTELLIGENCE (PI)

Analysis:
- Data Mining
- Data Visualization
- Report & Dashboards
- Big Data Alert

EXPECTED OUTPUT
- Comparison of prices and price changes between online and existing data
- Most expensive/cheapest price by item, by location
- Comparison of Malaysia's CPI with other countries
- Average price by Malaysia, state, strata, PCC*; by year, month (split between scraped data and existing data)
- Items by type of outlet
- Ranking of price changes by month, by the importance of the items
- Index and price changes by Malaysia, state, strata, PCC*; by year, month; by various group (COICOP) levels
- Top retailers/items purchased online
- Highlight specific items/specific price changes by location, by month
- Index comparison by state, group level
- Relationship of price changes with weather and exchange rate

*PCC: Price Collection Centre
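One of the outputs listed above, the average price by state and month split between scraped and existing data, amounts to a grouped average. The following is a minimal sketch, assuming each price observation is a record with `item`, `state`, `month`, `source` and `price` fields. These field names and all values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical price observations; "source" separates scraped from existing data.
observations = [
    {"item": "Rice 5kg", "state": "Selangor", "month": "2017-07", "source": "scraped", "price": 26.5},
    {"item": "Rice 5kg", "state": "Selangor", "month": "2017-07", "source": "existing", "price": 25.0},
    {"item": "Rice 5kg", "state": "Selangor", "month": "2017-07", "source": "existing", "price": 27.0},
    {"item": "Rice 5kg", "state": "Johor", "month": "2017-07", "source": "scraped", "price": 24.0},
]


def average_price(rows, keys):
    """Average price for every combination of the given grouping keys."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["price"])
    return {group: mean(prices) for group, prices in groups.items()}


avg = average_price(observations, ("item", "state", "month", "source"))

# Gap between the online (scraped) and existing average for one item, state and month.
online_gap = (
    avg[("Rice 5kg", "Selangor", "2017-07", "scraped")]
    - avg[("Rice 5kg", "Selangor", "2017-07", "existing")]
)
```

Changing the `keys` tuple (for example to `("item", "month")`) gives the same average at a coarser level, which is how the national versus state versus PCC breakdowns could be derived from one set of observations.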

PREDICTIVE ANALYTICS: PRICE INTELLIGENCE
- Weather & exchange rate effect on CPI
- Duration of price & specification changes
- Simulation & forecasting of CPI by group/state/location
- Market basket analysis
- Forecasting the effect of price changes on the indices

INITIATIVE 3: PUBLIC MATURITY ASSESSMENT ON OFFICIAL STATISTICS

OPINION MINING

The analysis and assessment of the degree of happiness of the Malaysian community with regard to official statistics published by DOSM. The data is obtained from online social media.
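The slides do not state which opinion mining technique is used. As one simple possibility, a lexicon-based scorer counts positive and negative words in each post. The word lists, the threshold and the sample posts below are assumptions for illustration only, and a production system would need to handle Malay as well as English text.

```python
import re

# Tiny illustrative lexicon; a real system needs a much larger, language-aware one.
POSITIVE = {"good", "useful", "accurate", "timely", "helpful"}
NEGATIVE = {"bad", "late", "confusing", "wrong", "outdated"}


def sentiment_score(text):
    """Score in [-1, 1]: (positive hits - negative hits) / all lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def label(score, threshold=0.2):
    """Map a score to a coarse label; the threshold is an arbitrary choice."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Hypothetical social media posts about official statistics.
posts = [
    "The new CPI release is timely and useful",
    "Trade figures were late and confusing again",
    "Where can I download the census tables?",
]
labels = [label(sentiment_score(p)) for p in posts]
```

Word counting of this kind ignores negation and sarcasm, so it is best read as a baseline against which a trained classifier could be compared.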

DASHBOARD
- Price Intelligence
- Opinion Mining
- Trade by Enterprise Characteristics
- Real Time News on Official Statistics

MOBILE APPS

EXPECTED OUTPUT

SOURCES
- Print Media (Newspapers/Magazines/Books)
- Internet (Blogs/Facebook/Twitter)
- Broadcast Media (Television/Radio)
- Television News (BBC/CNN/NBC)
- Radio News
- Talk Radio
- e-services DOSM (publication)

DEPLOYMENT

Initiatives:
1) Opinion Mining: Sentiment Analysis of Official Statistics (SAOS)
2) Real Time News on Official Statistics (RTOS)
3) BizCode@Stats Mobile Apps
4) Internal portal Price Intelligence (PI)
5) Real Time Business Status System (RTBS)
6) Trade by Enterprise Characteristics (TEC)

**Using agile approaches, which help teams respond to unpredictability through incremental, iterative work cadences known as sprints.

CHALLENGES
- Suitable statistical and IT methods
- New tools and skills are needed to handle alternative data
- Quality issues in each dataset and application
- The costs of sourcing the data must be in balance with the benefits
- Legislative requirements for getting access to and using the data
- Legal issues (e.g. personal data protection)

WAY FORWARD: MALAYSIA INTEGRATED POPULATION CENSUS SYSTEM (MyIPCS)

System components: StatsBDA, Data Management & Analytic, MSPR, Communication, Statistics Operation + MSAR, Online Self-Listing, e-census, GIS HH Module, NEWSS, Existing GIS, Interactive database

Census phases (Principles and Recommendations for Population and Housing Censuses, Revision):
- Preparatory Work
- Enumeration
- Data Processing and Databases
- Dissemination
- Evaluation

Generic Statistical Business Process Model (GSBPM):
1. Specify Needs
2. Design
3. Build
4. Collect
5. Process
6. Analyse
7. Disseminate
8. Evaluate

THANK YOU

StatsMalaysia