Real Applications of Big Data. Challenges and Opportunities


1 Real Applications of Big Data. Challenges and Opportunities
David Gil
University of Alicante, Polytechnic School (EPSA), Department of Computer Technology

2 Introduction to Big Data; Projects; Benchmarking (Weka, R, MLlib); Open data; Conclusions & challenges

3 Introduction to Big Data
- Data mining
- Artificial Intelligence / Machine Learning
- Smart Cities / Sensors / Internet of Things

4 Introduction to Big Data: 3 Vs of Big Data
- Volume: terabytes; records, tables, files
- Velocity: batch, real time, streams
- Variety: sources (structured, unstructured, semi-structured)

5 Introduction to Big Data: 5 Vs of Big Data (the 3 Vs + Veracity and Value)
- Volume: terabytes; records, tables, files
- Velocity: batch, real time, streams
- Variety: sources (structured, unstructured, semi-structured)
- Veracity
- Value

6 Introduction to Big Data

7 Introduction to Big Data

8 Introduction to Big Data

12 Big data Methodology (Projects)
- Information collected: various sources (public, private), different types, web, etc.
- Analysis - Preprocessing: getting to know the data, cleaning, and data preparation
- Processing: applying AI techniques to group, classify, predict, etc.
- Reports: analytics, broadcasting, dashboards, visualization
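
The four methodology stages can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory "sources", the `hours` field, and the mean-threshold grouping, which only stands in for whatever AI technique a real project would apply.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def collect() -> List[Record]:
    """Stage 1: gather records from several (here, hypothetical in-memory) sources."""
    public_source = [{"hours": 2.0}, {"hours": None}, {"hours": 9.5}]
    private_source = [{"hours": 1.5}, {"hours": 8.0}]
    return public_source + private_source


def preprocess(records: List[Record]) -> List[float]:
    """Stage 2: clean the data by dropping records with missing values."""
    return [r["hours"] for r in records if r.get("hours") is not None]


def process(values: List[float]) -> List[str]:
    """Stage 3: group values around their mean (a stand-in for an AI technique)."""
    threshold = mean(values)
    return ["high" if v > threshold else "low" for v in values]


def report(labels: List[str]) -> Dict[str, int]:
    """Stage 4: summarise the groups, e.g. as input for a dashboard."""
    return dict(Counter(labels))


summary = report(process(preprocess(collect())))
print(summary)
```

Keeping each stage as a separate function means a stage can be replaced (for instance, a real classifier in `process`) without touching the others.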

13 Big Data Architecture. New Approach based on Data Mining and Machine Learning
Project: Analysis of online courses (MOOCs)
Motivation:
- Many courses: why do students choose one over another?
- Globalization of education
Goal:
- Find the most suitable course for each student
- Feedback for instructors / organizers

14 Big Data Architecture. New Approach based on Data Mining and Machine Learning
UNIMOOC: MOOC of UA students

15 Big Data Architecture. New Approach based on Data Mining and Machine Learning

16 Big Data Architecture. New Approach based on Data Mining and Machine Learning

17 Big Data Architecture. New Approach based on Data Mining and Machine Learning
Future lines:
- More analysis during the course (so far, only enrolment has been analysed)
- Predict course dropouts
- Transfer the knowledge to traditional universities

18 Management requirements and methodology for Big Data analytics
Project: Time prediction in software development
Motivation:
- Avoid delays in software development
- Make better use of the data collected during software development
Goal: predict delays in a software product
- Based on all data collected + logs (developer teams) / lessons learned for future developers

19 Management requirements and methodology for Big Data analytics
First step - collecting data & preprocessing:
- 2 million records
- Initial visualization of the loaded data

20 Management requirements and methodology for Big Data analytics

21 Management requirements and methodology for Big Data analytics
Next step: Processing / ML algorithms
- Traditional algorithms fail
- New approaches: Spark / Hadoop
- Dataset split into x partitions to manage Volume
- Updateable algorithms
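
The idea of splitting a dataset into x partitions and feeding them to an updateable algorithm can be sketched in plain Python. This is not the project's implementation: the nearest-centroid model, the `partitions` helper, the feature values, and the `on_time` / `delayed` labels are all hypothetical, chosen only to show a model whose state is updated one partition at a time so the full dataset never has to sit in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[Sequence[float], str]  # (features, label)


def partitions(rows: List[Row], x: int) -> Iterator[List[Row]]:
    """Yield the dataset as at most x contiguous partitions."""
    if x <= 0:
        raise ValueError("x must be positive")
    size = max(1, -(-len(rows) // x))  # ceiling division, at least 1
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


class UpdateableCentroidClassifier:
    """Nearest-centroid classifier that keeps only running sums per class."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)
        self.sums: Dict[str, List[float]] = {}

    def update(self, batch: Iterable[Row]) -> None:
        """Fold one partition into the running per-class sums and counts."""
        for features, label in batch:
            if label not in self.sums:
                self.sums[label] = [0.0] * len(features)
            self.counts[label] += 1
            for i, value in enumerate(features):
                self.sums[label][i] += value

    def predict(self, features: Sequence[float]) -> str:
        """Return the label whose centroid is closest (squared Euclidean)."""
        def sq_dist(label: str) -> float:
            n = self.counts[label]
            return sum((f - s / n) ** 2
                       for f, s in zip(features, self.sums[label]))
        return min(self.sums, key=sq_dist)


# Hypothetical toy data: two features per record, invented labels.
data: List[Row] = [
    ([1.0, 1.0], "on_time"), ([1.2, 0.8], "on_time"),
    ([5.0, 5.0], "delayed"), ([4.8, 5.2], "delayed"),
    ([0.9, 1.1], "on_time"), ([5.1, 4.9], "delayed"),
]

model = UpdateableCentroidClassifier()
for part in partitions(data, 3):
    model.update(part)  # only one partition is processed at a time

print(model.predict([1.1, 0.9]))
```

Because the model stores only sums and counts, updating it partition by partition gives the same centroids as a single pass over the whole dataset.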

22 Management requirements and methodology for Big Data analytics
Conclusions / future lines:
- Application of Data Mining in Big Data
- Improve data integration, including Open Data (Variety)
- Improve the dashboards / visualization (probably throughout the software development process)
- Most of these future lines are for possible new projects

23 Benchmarking (Weka, R, MLlib)
Benchmark. Ongoing project - different tools / advantages:
- Weka
- R
- MLlib

24 Benchmarking (Weka, R, MLlib): Weka

25 Benchmarking (Weka, R, MLlib): Weka
Most of the basic ML algorithms are implemented

26 Benchmarking (Weka, R, MLlib): Weka

27 Benchmarking (Weka, R, MLlib): R

28 Benchmarking (Weka, R, MLlib): R

29 Benchmarking (Weka, R, MLlib): Spark MLlib

30 Open data

31 Open data

32 Open data
Examples of Big Data use for social good

33 Open data

34 Open data

35 Open data

36 Conclusions & challenges
- Big Data involves many other areas
- It is not only Volume
- Variety is the most challenging V
- We need data / Open data / plans for data

37 Conclusions & challenges
Big data: Architecture, Visualization, Ontology, AI & ML, Tools, Text mining
Goal: get Value from data, make better predictions

38 Real Applications of Big Data. Challenges and Opportunities
David Gil
Thank you!!! Questions?
Contact: