Big Data in Emergency Informatics Social media data perspective. Rajendra Akerkar

1 Big Data in Emergency Informatics Social media data perspective Rajendra Akerkar

2 Outline
- What is Emergency Informatics?
- Big data and applications in emergency response
- Social media data
- Issues and challenges
- Emergency management in social media generation

3 Emergency Informatics
The study of the use of ICT in the preparation, mitigation, response and recovery phases of disasters and other emergencies.
Crisis informatics has links to a number of activity areas around crisis management:
- Preparedness (training, baseline information gathering, simulations, and conflict prevention)
- Response (coordination, information gathering, and provision of humanitarian relief or aid)
- Recovery (resource allocation, population monitoring, development)

4 Big Data
Big data is a process that facilitates decision making through swift analysis of large amounts of data, of different types and from a variety of sources, to produce a stream of usable knowledge.
- Volume: the quantity of data generated continuously per unit of time.
- Variety: the uncontrollable diversity of data types (videos, pictures, numbers, ...) and data formats (structured, unstructured, ...) from both known and unknown sources.
- Velocity: the speed at which the data is produced and the speed needed to process it in time.
- Veracity: the trustworthiness, objectivity, authenticity, and security of the data.
- Deriving value: evolving background knowledge, actionable intelligence and decision making.

5 Semantics for big data
- Volume, by enabling abstraction to achieve semantic scalability (for decision making)
- Variety, by overcoming syntactic and semantic heterogeneity to achieve semantic integration and interoperability
- Velocity, by enabling ranking to achieve semantic filtering and focus
- Veracity, by cross-checking multimodal data with semantic constraints
- Value, by enriching semantic models to make them more expressive and comprehensive

6 Big data features
- Digitally generated: the data are created digitally (as opposed to being digitised manually), can be stored using a series of ones and zeros, and thus can be manipulated by computers.
- Passively produced: a by-product of our daily lives or interaction with digital services.
- Automatically collected: there is a system in place that extracts and stores the relevant data as it is generated.
- Geographically or temporally trackable: e.g. mobile phone location data or call duration time.
- Continuously analysed: the information is relevant to human well-being and development and can be analysed in real time.

7 Big data application areas in emergency informatics
Each of these application areas demonstrates two key issues.
- First, ordinary citizens are key stakeholders in this process, as seeded volunteers who are organized before crisis events or as crowdsourced volunteers who emerge spontaneously during events.
- Second, realizing the potential applications of big data in crisis management requires participation from crisis informatics stakeholders such as first responders, emergency managers, authorities, humanitarian organizations, members of the public and industry.

8 Types of data associated with emergency cycle Akerkar, R. Processing Big Data for Emergency Management. (Ed.) Liu, Zhi, and Kaoru Ota. "Smart Technologies for Emergency Response and Disaster Management." (2018), accessed April 17, doi: /

9 Social media and emergency informatics Emergency informatics is in the early stages of integrating big data. The key improvement is that the analysis of this data improves situational awareness more quickly after an event has occurred. This can save lives, reduce resource expenditure and aid decision making. Stakeholders in this area are making progress in addressing privacy and data protection issues. There is evidence of a reliance on US cloud and computing services.

10 Dilemma
What do emergency responders want?
- Any available prior knowledge about the impact of similar past disasters in the region?
- Are existing response strategies sufficient?
- Which factors will worsen conditions?
- How many fatalities? Extent of damage?
What can computer scientists provide?
- Algorithms to detect and predict abnormal trends
- Semantic abstraction and summarization of data
- Human- and machine-readable knowledge organization via ontologies
- Technology to map geolocated information
- Visual data interface for quicker comprehension
What is supported by social media data?
- Real-time updates on the situation
- Textual summaries, images, videos
- Messages about needs and offers
- Geolocation metadata

11 Emergency response analytics
Three major methods of information extraction and mapping:
- Manual feed (processed information) based, e.g., most of the formal and hybrid response organizations (Red Cross, UNOCHA), Recovers.org, AIDMatrix, SparkRelief, etc.
- Crowdsourcing with limited automation, e.g., Crowdmap/Ushahidi, etc.
- Automated processing based, e.g., EmerGent, Twitris, CrisisTracker, etc.
Information management for resource coordination: e.g., Sahana

12 Key issues and challenges

13 Heterogeneity
- Multiple channels: phone, fax, TV, radio, newspapers, internet, sensor networks, etc.
- Social media is heterogeneous: verified accounts, retweets from well-known sources, eyewitness reports
- Different types (unstructured text, structured, multimedia) may require different tools

14 Velocity
- Social media information is more valuable in the early hours after a disaster
- Affected people are there before anybody else
- When emergency responders arrive, their priority may not be to keep information flowing
- After hours or days, social media is still valuable, but there is much more information from other sources

15 Scale
- In some countries a sizable fraction of the population has Internet access
- Tweets are small and quick, but they point to webpages and include images, videos, etc.
- You need to process a lot to obtain a little: there are many tweets, but only some of them contain usable information, and only a fraction of those can be handled by automatic systems

16 Redundancy
- Information from multiple information channels may not be unique
- Near-duplicates frustrate users and waste their time
- Automatic systems tend to pick what is redundant first
- Redundancy is not necessarily a bad thing, e.g. phrases that are often repeated, tweets that are often retweeted, etc.
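The redundancy point above can be illustrated with a minimal sketch of near-duplicate grouping. It is an assumption for illustration, not a method named in the slides: posts are compared by Jaccard similarity over three-word shingles, and the function names and the 0.5 threshold are hypothetical.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased text."""
    words = re.findall(r"[a-z0-9#@]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(posts, threshold=0.5):
    """Greedy grouping: a post joins the first group whose first member is similar enough."""
    groups = []  # list of (shingles of first member, member posts)
    for post in posts:
        s = shingles(post)
        for rep, members in groups:
            if jaccard(s, rep) >= threshold:
                members.append(post)
                break
        else:
            groups.append((s, [post]))
    return [members for _, members in groups]


posts = [
    "Bridge on Main Street collapsed, avoid the area",
    "RT Bridge on Main Street collapsed, avoid the area",
    "Shelter open at the town hall tonight",
]
groups = group_near_duplicates(posts)
for members in groups:
    # Show one representative per group; the group size is the repetition signal
    print(len(members), members[0])
```

Showing one representative per group spares users the near-duplicates, while the group size keeps the useful side of redundancy: how often a message was repeated.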

17 Noise
- Everyone wants to be heard, independently of adding any value
- Emotional expressions and even jokes drive the data traffic
- Informal text and jargon hinder automatic text processing

18 Verifiability
- Social media users are starting to develop their own techniques to validate information
- In crisis scenarios most rumors are spread by well-intentioned people
- We need a more fine-grained approach than true/false

19 Emergency management in social media generation

20 Emergency Management Cycle
The following important issues have to be addressed in order to strengthen the tie between citizens and emergency services (ES) on social media:
1. ES must be enabled to access the right channels on different social networks.
2. ES must be able to apply filters on the data to identify relevant content, e.g. information about incidents or emergency-related discussions.
3. ES have to find the relevant information as quickly as possible.
4. ES must be able to access relevant users or communities in order to distribute information to interest groups.
An IT system with innovative methods for data and information mining (DIM) and, additionally, a component for assessing the quality of the gathered data and information (DIQ).
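Issue 2 above (filtering for relevant content) could look roughly like the following sketch. It assumes a plain keyword match; the term list and function names are hypothetical, and a real filter would need curated, multilingual vocabularies or a trained classifier.

```python
import re

# Hypothetical incident vocabulary, for illustration only
INCIDENT_TERMS = {"fire", "flood", "collapsed", "injured", "trapped", "evacuate"}


def tokens(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def is_relevant(post, terms=INCIDENT_TERMS):
    """A post counts as relevant if it mentions at least one incident term."""
    return bool(tokens(post) & terms)


def filter_relevant(posts, terms=INCIDENT_TERMS):
    """Keep only the posts that pass the keyword filter, in their original order."""
    return [p for p in posts if is_relevant(p, terms)]


sample = [
    "Flood water rising near the station, two people trapped",
    "Great concert last night!",
    "Fire reported on Elm Road",
]
relevant = filter_relevant(sample)
print(relevant)
```

A keyword filter is fast, which suits issue 3 (finding information quickly), but it misses paraphrases and matches unrelated uses of the same words, so it is best seen as a first stage.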

21 [Figure: today vs. tomorrow with EmerGent. Today: weak connections between citizens, ES, social apps, social media, ES IT systems and the EMC. Tomorrow: connections marked "strong connection?!" via the EmerGent IT-System (API). Other labels: Stakeholders, Methods, Tools, Scenarios, Current Use, Potentials & Requirements, Continuous Involvement & Workshops, Evaluation/Data, Dissemination Campaign, O1 to O5.]

22 EmerGent Architecture

23 Workflows
We identified three workflows:
1. C2A Indirect
2. C2A Direct
3. A2C
These workflows are useful for validating the architecture.

24 Workflows in detail
- C2A indirect: EmerGent produces alerts for ES after analysing messages published by citizens on social media.
- C2A direct: EmerGent links citizens and ES using the app.
- A2C (direct): EmerGent allows ES to communicate with citizens through the app and/or social media.

25 EmerGent at a glance
- Collect and store posts
- Quality rating
- Mining
- Grouping
- Alerts
- Notifications
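The stages listed above can be sketched as a toy pipeline. This is not the EmerGent implementation: every function, heuristic and threshold below is a hypothetical placeholder that only shows how the stages might chain together.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    text: str
    quality: float = 0.0
    topic: str = ""


def rate_quality(post):
    # Placeholder heuristic: longer posts score higher, capped at 1.0
    post.quality = min(len(post.text.split()) / 10, 1.0)
    return post


def mine_topic(post, topics=("flood", "fire")):
    # Placeholder mining step: first known topic word found in the text
    lowered = post.text.lower()
    post.topic = next((t for t in topics if t in lowered), "other")
    return post


def group_by_topic(posts):
    groups = defaultdict(list)
    for post in posts:
        groups[post.topic].append(post)
    return dict(groups)


def alerts(groups, min_posts=2, min_quality=0.3):
    # Raise an alert when enough posts of sufficient quality share a topic
    raised = []
    for topic, members in groups.items():
        good = [p for p in members if p.quality >= min_quality]
        if topic != "other" and len(good) >= min_posts:
            raised.append(f"ALERT: {len(good)} posts about {topic}")
    return raised


# "Collect and store" is reduced to an in-memory list here
store = [
    Post("a", "Flood on River Road, cars are stuck"),
    Post("b", "The flood has reached the school gate"),
    Post("c", "Nice weather today"),
]
processed = [mine_topic(rate_quality(p)) for p in store]
raised = alerts(group_by_topic(processed))
print(raised)
```

Keeping each stage as a separate function makes it possible to swap in a better quality model or mining algorithm without touching the rest of the chain.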

26 User Interface ES

27 Processing and Analysis Subsystem

28 Research in IM
Challenges:
- A lot of irrelevant information
- Slang, abbreviations, spelling/grammar errors
- Duplicate information
- ... and many more!
A number of mining algorithms are under development:
- Clustering
- Filtering
- Natural Language Processing
Evaluate the best combination of algorithms for specific circumstances.
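The slang and abbreviation challenge above is often tackled with a normalisation step before mining. The sketch below is one crude way to do it, under stated assumptions: the abbreviation table is hypothetical, and collapsing repeated characters will occasionally damage legitimate words.

```python
import re

# Hypothetical abbreviation table, for illustration only
ABBREVIATIONS = {"pls": "please", "ppl": "people", "b4": "before", "thx": "thanks"}


def normalise(text):
    """Lowercase, drop URLs, collapse character runs, expand known abbreviations."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)    # drop URLs
    text = re.sub(r"(.)\1{2,}", r"\1", text)    # runs of 3+ same chars -> one char
    words = re.findall(r"[a-z0-9']+", text)     # keep word-like tokens only
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


cleaned = normalise("PLS helppp!!! ppl trapped http://t.co/x")
print(cleaned)
```

Normalised text gives the clustering and filtering algorithms fewer spelling variants to cope with, at the cost of losing some emphasis cues such as repeated punctuation.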

29 Research in IQ: Approach
- Analyse existing IQ models
  - For social media (SM) (target group: citizens)
  - For emergency management (EM) (target group: ES)
  - For SM in EM (target group: both)
- Identify and define criteria and their indicators for the measurement of information quality in social media in emergencies
Objective: a framework for the quality assessment of social media data before, during and after emergencies
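One simple way to combine criteria and indicators into a single rating is a weighted average, sketched below. The slides do not name the actual criteria or weights, so the indicator names and numbers here are hypothetical.

```python
# Hypothetical indicators and weights; the slides do not specify the real ones
WEIGHTS = {"timeliness": 0.4, "has_location": 0.3, "author_reputation": 0.3}


def quality_score(indicators, weights=WEIGHTS):
    """Weighted average of indicator values in [0, 1]; missing indicators count as 0."""
    total = sum(weights.values())
    weighted = sum(w * indicators.get(name, 0.0) for name, w in weights.items())
    return weighted / total


score = quality_score(
    {"timeliness": 1.0, "has_location": 1.0, "author_reputation": 0.5}
)
print(round(score, 2))
```

A graded score of this kind also fits the earlier point on verifiability, since it avoids forcing each post into a strict true/false judgement.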

30 Research in Information Modelling
What we want: {Set of posts} >
- Actor1 reports about thing
- Actor1 isa helper
- Two missing children
- Critical infrastructure affected
Important subject areas to consider: Resources, People, Organisations, Damage, Disasters, Infrastructure
Main outcome: combination of the best + EmerGent
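The statements in "What we want" above resemble subject-predicate-object triples. The sketch below shows one possible in-memory representation; the predicate and entity names are hypothetical and are not taken from the EmerGent ontology.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")

# Hypothetical encoding of the example statements
facts = [
    Triple("Actor1", "reportsAbout", "MissingChildren"),
    Triple("Actor1", "isA", "Helper"),
    Triple("MissingChildren", "count", 2),
    Triple("CriticalInfrastructure", "status", "affected"),
]


def query(facts, subject=None, predicate=None, obj=None):
    """Return all triples matching the given fields; None acts as a wildcard."""
    return [
        t for t in facts
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


helpers = query(facts, predicate="isA", obj="Helper")
about_actor1 = query(facts, subject="Actor1")
print(helpers)
print(about_actor1)
```

Once posts are reduced to statements like these, questions such as "who is a helper?" become simple pattern matches rather than free-text searches.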

31 The EmerGent Ontology
- Model data from social media in order to understand its structure, with all features, properties and constraints
- Describe ES domain-related information based on existing solutions, gathered requirements and specified scenarios
- Build a mapping between domain-related information and information from social media in order to build the EmerGent ontology

32 Approach to build ontology

33 Thank you!