Smart Cities and Big Data

1 SUSTAINABLE SYSTEMS
Smart Cities and Big Data
Based in part on material from Professor Pang-Ning Tan, Dept. of Computer Science & Engineering, Michigan State University
Website:
Smart Cities and Big Data-1

2 Big Data trend: Jan. 2004 to Aug. 2018 (source: Google Trends)

3 Big Data: How Much Data is Out There?
Source:

4 Smart Cities trend: Jan. 2004 to Apr. 2018 (source: Google Trends)

5 Big Data for Smart Cities
"Smart cities, founded on the use of information and communication technologies, aim at tackling many local problems, from local economy and transportation to quality of life and e-governance."
Martínez-Ballesté et al., IEEE Communications

6 Should cities bother to collect big data?
Pros
- Can lead to better management of resources, services, etc.
- Can lead to better predictions of patterns of use, of trouble, of resource demand.
Cons
- Hard to identify key data to collect.
- Expensive to measure in the field.
- Requires expertise to build such systems.

7 Quality of the data
Stamp's Statistical Probability
"The government [is] extremely fond of amassing great quantities of statistics. These are raised to the nth degree, the cube roots are extracted, and the results are arranged into elaborate and impressive displays. What must be kept ever in mind, however, is that in every case, the figures are first put down by a village watchman, and he puts down anything he damn well pleases."
(Attributed to Sir Josiah Stamp, H.M. collector of inland revenue.)

8 How can big data lead to better performance?
Command and Control (C&C) approach
- Measure current system behavior
- Compare to target (desired) behavior
- Issue commands to meet targets
C&C diagram
- Feedback link is critical

9 C&C diagram: general
targets -> [Compare and act] -> inputs -> [Actual system] -> outputs (actual data)
actual data -> [Sensors] -> measured data -> [Compare and act]   (feedback link)

10 Example: vehicle speed control
target speed -> [Compare and act] -> gas pedal -> [Actual vehicle] -> actual speed
actual speed -> [Sensors (tach)] -> measured speed -> [Compare and act]   (feedback link)
Whether a driver or an automated speed control system is in charge, the same structure works.
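
The compare-and-act loop for vehicle speed can be sketched as a simple proportional controller. This is only an illustration under stated assumptions: the toy vehicle model, the gain, the drag coefficient, and all function names are invented for the example and do not come from the slides.

```python
def sense(actual_speed):
    """Sensor: return the measured speed (assumed noise-free here)."""
    return actual_speed


def compare_and_act(target_speed, measured_speed, gain=0.5):
    """Compare target to measurement; return a gas-pedal command in [0, 1]."""
    error = target_speed - measured_speed
    return min(1.0, max(0.0, gain * error))


def vehicle(actual_speed, pedal, dt=0.1, max_accel=3.0, drag=0.05):
    """Toy vehicle model (hypothetical): the pedal accelerates, drag slows."""
    accel = max_accel * pedal - drag * actual_speed
    return actual_speed + accel * dt


def run_loop(target_speed, steps=2000):
    """Run the feedback loop: sense, compare and act, update the vehicle."""
    speed = 0.0
    for _ in range(steps):
        measured = sense(speed)                          # feedback link
        pedal = compare_and_act(target_speed, measured)  # command
        speed = vehicle(speed, pedal)                    # actual system
    return speed


final_speed = run_loop(target_speed=25.0)
print(f"speed after loop: {final_speed:.2f} m/s")
```

With these assumed parameters the speed settles slightly below the target, which is the usual steady-state offset of proportional-only control. Removing the call to `sense` (the feedback link) would leave the controller with nothing to compare against.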

11 Example: traffic management in an urban environment
target flows -> [Compare and act] -> set traffic lights -> [Street grid] -> actual flows
actual flows -> [Sensors] -> measured flows -> [Compare and act]   (feedback link)
Consider the amount of data involved.

12 NASA EOSDIS: growing source of Big Data
NASA's Earth Observing System Data and Information System (EOSDIS) is in the middle of a critical project prototyping, testing, and evaluating a significant change in the way data users access and use NASA Earth Observation (EO) data. Ironically, data users likely will not even notice if this change is implemented. What they will notice is more efficient access to more data and the ability to do more with these data. The change being considered is moving EOSDIS data to the cloud. This move would not only be a logical technical evolution for EOSDIS, but also a proactive effort to provide broader access to a data archive that is expected to grow significantly over the next several years. (09/11/2018)

13 NASA EOSDIS: growing source of Big Data

14 Some social media sources of Big Data
- YouTube: 400 hours of video uploaded every minute
- Facebook: 600 TB of data per day (2014)
- Instagram: 60 million photos posted per day (2014) ( 4Q3.pdf)
- Twitter: 250 million tweets posted per day (2011)

15 Examples of Big Data for Smart Cities
- Sensor time series
- Surveillance video streams
- GPS trajectories from mobile devices
- Smart card
- Social media
- Structured data

16 Example: Sustainable Transportation
Use big data approach for:
- Bike station placement prediction
- Demand forecasting

17 Bike sharing station placement
Previous studies have shown placement should be based on:
- Area function (high demand near residential areas, transit hubs, and tourist attractions)
- Human activity (people rent a bike for commuting, shopping, entertainment, and personal errands)
- Demographics (users tend to be younger, highly educated, less affluent)
How to determine the placement locations using big data?
- Google Places API: provides information about businesses and points of interest
- Foursquare API: provides user check-ins to restaurants and other places
- Data.gov: US government open data portal (to obtain demographic data about an area)
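
One way to combine the three kinds of evidence on this slide is a weighted score per candidate site. The sketch below is an assumption, not a method taken from the slides: the feature names, the weights, and every data value are hypothetical. In practice the features would be derived from sources such as the Google Places API, Foursquare check-ins, and Data.gov; no API calls are made here.

```python
# Candidate sites with made-up feature values (illustrative only).
candidates = {
    "site_A": {"poi_count": 40, "checkins": 1200, "young_share": 0.35},
    "site_B": {"poi_count": 10, "checkins": 300, "young_share": 0.50},
    "site_C": {"poi_count": 25, "checkins": 900, "young_share": 0.20},
}

# Assumed relative importance of each feature; weights sum to 1.
weights = {"poi_count": 0.4, "checkins": 0.4, "young_share": 0.2}


def normalize(values):
    """Min-max scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def rank_sites(candidates, weights):
    """Return (site, score) pairs sorted from best to worst."""
    names = list(candidates)
    scores = dict.fromkeys(names, 0.0)
    for feature, weight in weights.items():
        column = normalize([candidates[n][feature] for n in names])
        for name, value in zip(names, column):
            scores[name] += weight * value
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_sites(candidates, weights)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Min-max scaling keeps features with large raw values, such as check-in counts, from dominating the score.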

18 Bike sharing demand forecasting
Spatio-temporal prediction: where and when?
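
A very simple baseline for the "where and when" question is the historical mean demand per station and hour of day. The sketch below assumes a toy rental log with made-up values. It is a baseline for illustration, not the forecasting method used in the course material.

```python
from collections import defaultdict

# Toy rental log: (station, hour_of_day, rentals). Values are made up.
history = [
    ("S1", 8, 12), ("S1", 8, 14), ("S1", 17, 20),
    ("S2", 8, 3), ("S2", 17, 9), ("S2", 17, 11),
]


def fit_hourly_means(records):
    """Average the observed rentals for each (station, hour) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum, count]
    for station, hour, rentals in records:
        totals[(station, hour)][0] += rentals
        totals[(station, hour)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def predict(model, station, hour, default=0.0):
    """Predict demand for a station (where) at an hour (when)."""
    return model.get((station, hour), default)


model = fit_hourly_means(history)
print(predict(model, "S1", 8))    # mean of 12 and 14
print(predict(model, "S2", 17))   # mean of 9 and 11
```

Real forecasts would typically add weekday, weather, and neighboring-station effects; those are left out to keep the example short.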

19 Data flow diagram - 1
COLLECTION PHASE: instruments -> Raw data -> Initial processing -> Data storage
ANALYSIS PHASE: Data storage -> Analysis for insight -> Display for insight

20 The 4 V's of Big Data
- Volume
- Variety
- Veracity
- Velocity
The "Value added" question: Should we add the capability to handle big data of type X, in view of the cost?

21 Data flow diagram - 1
VELOCITY: instruments -> Raw data
VERACITY, VARIETY: Initial processing
VOLUME: Data storage
VALUE added: Data storage -> Analysis for insight
VALUE added: Display for insight

22 Initial processing of data
- Noise
- Outliers
- Missing values
- Overlapping data
- Varying formats, scales, etc.
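
Several of these issues can be shown on a toy sensor series. The sketch below is one possible cleaning pass under stated assumptions: the readings are made up, duplicates are resolved by keeping the first record per timestamp, missing values are dropped rather than imputed, and outliers are flagged with a median absolute deviation (MAD) rule whose threshold is arbitrary.

```python
from statistics import median

# Toy sensor readings: (timestamp, value, unit). Values are made up.
raw = [
    (1, 20.5, "C"), (2, None, "C"), (2, None, "C"), (3, 69.8, "F"),
    (4, 21.0, "C"), (4, 21.0, "C"), (5, 95.0, "C"), (6, 20.8, "C"),
]


def to_celsius(value, unit):
    """Varying scales: convert Fahrenheit readings to Celsius."""
    return (value - 32.0) * 5.0 / 9.0 if unit == "F" else value


def clean(records, mad_threshold=5.0):
    """Deduplicate, drop missing values, unify units, remove outliers."""
    seen, values = set(), []
    for ts, value, unit in records:
        if ts in seen:        # overlapping data: keep first record per timestamp
            continue
        seen.add(ts)
        if value is None:     # missing value: dropped in this sketch
            continue
        values.append((ts, to_celsius(value, unit)))
    med = median(v for _, v in values)
    mad = median(abs(v - med) for _, v in values)
    if mad == 0:              # no spread: nothing can be called an outlier
        return values
    return [(ts, v) for ts, v in values if abs(v - med) <= mad_threshold * mad]


cleaned = clean(raw)
print([ts for ts, _ in cleaned])
```

The 95.0 reading at timestamp 5 is removed as an outlier, and timestamp 2 disappears because its value is missing.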

23 Types of processing tasks
- Anomaly detection
- Descriptive statistics
- Clustering
- Association
- Prediction

24 Complex Data Mining Tasks
Data:
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
11   No      Married         60K             No
12   Yes     Divorced        220K            No
13   No      Single          85K             Yes
14   No      Married         75K             No
15   No      Single          90K             Yes
Ranking/Recommendation

25 Predictive Modeling example
Object detection for autonomous vehicle driving

26 Cluster Analysis example
Crime hotspot detection
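
As a rough illustration of hotspot detection, the sketch below counts incidents per grid cell and reports the dense cells. This grid-count approach is a simple stand-in for a full clustering algorithm, not the method behind the slide's figure. The coordinates, cell size, and minimum count are all assumptions.

```python
from collections import Counter

# Toy incident coordinates (x, y) in km on a local grid. Values are made up.
incidents = [
    (0.2, 0.3), (0.4, 0.1), (0.3, 0.4), (0.1, 0.2),
    (5.1, 5.2), (5.3, 5.4), (5.2, 5.1),
    (9.0, 1.0),
]


def hotspots(points, cell_size=1.0, min_count=3):
    """Return grid cells that contain at least min_count incidents."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return sorted(cell for cell, n in counts.items() if n >= min_count)


print(hotspots(incidents))
```

The isolated incident at (9.0, 1.0) does not form a hotspot because its cell holds a single point.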

27 Anomaly Detection
Detect significant deviations from normal observations
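
A minimal way to make "significant deviation" concrete is a z-score test: flag observations that lie more than a chosen number of standard deviations from the mean. The flow values and the threshold below are assumptions for illustration only.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=2.0):
    """Return indices of values more than threshold std devs from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:            # all values identical: nothing deviates
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Toy hourly flow readings (made up); one reading is far from the rest.
flows = [50, 52, 49, 51, 50, 48, 53, 120, 51, 50]
print(zscore_anomalies(flows))
```

Because the mean and standard deviation are themselves pulled by extreme values, robust statistics such as the median are often preferred when anomalies are large or frequent.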

28 Anomaly Detection examples
Smart Transportation
- Congestion detection
- Sensor fault detection
Smart Home/Building
- Water theft detection
- Pipe burst detection

29 Big Data Challenge: Privacy

30 Challenge: Privacy and Security
Example: Hacking into database for private purposes

31 Big Data lab exercises
The objective of the lab exercises is to help you get familiar with the main issues in handling big data accumulations. Our tool is Excel. Although Excel is not used in practice for big data problems (too slow), you can see what needs to be done and try to do it. The exercises do not include streaming data (velocity) issues.

32 Project 1: Daily travel data
Case 1: Regular daily pattern of travel: Home -> Work -> Gym -> Eat_out -> Home, each week, Mon-Fri.
Case 2: No gym on Thursdays
Case 3: Sometimes eat at home on Mondays
Case 4: Some outliers (anomalies) in the data
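
The four cases can also be expressed as a small rule check, shown here in Python rather than the Excel tool the lab uses. The sketch rests on an interpretation that is an assumption: Case 2 means Thursday trips skip the gym, Case 3 means Monday trips may skip Eat_out, and anything else counts as a Case 4 anomaly. The sample log is made up.

```python
REGULAR = ["Home", "Work", "Gym", "Eat_out", "Home"]


def allowed_patterns(weekday):
    """Return the expected stop sequences for a weekday (Mon=0 .. Fri=4)."""
    if weekday == 3:      # Case 2: no gym on Thursdays
        return {("Home", "Work", "Eat_out", "Home")}
    patterns = {tuple(REGULAR)}                        # Case 1
    if weekday == 0:      # Case 3: sometimes eat at home on Mondays
        patterns.add(("Home", "Work", "Gym", "Home"))
    return patterns


def find_anomalies(log):
    """Case 4: return (weekday, trip) entries matching no expected pattern."""
    return [
        (day, trip) for day, trip in log
        if tuple(trip) not in allowed_patterns(day)
    ]


# Toy travel log: (weekday, sequence of stops). Values are made up.
log = [
    (0, REGULAR),
    (0, ["Home", "Work", "Gym", "Home"]),
    (3, ["Home", "Work", "Eat_out", "Home"]),
    (3, REGULAR),
    (2, ["Home", "Work", "Airport"]),
]
anomalies = find_anomalies(log)
print(anomalies)
```

Under this interpretation a gym visit on a Thursday is flagged along with the airport trip; a looser reading of Case 2 would allow it.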

33 Thanks For Listening!