Big Data: Evolution, Issues and Tools

Sabia
Department of Computer Science & Engineering, Guru Nanak Dev University Regional Campus, Jalandhar, India

Sheetal Kalra
Department of Computer Science & Engineering, Guru Nanak Dev University Regional Campus, Jalandhar, India

Abstract: Big data is a collection of huge and complex datasets that includes large quantities of data, real-time data, and social media analytics. Big data is also a data analysis methodology enabled by recent advances in architecture and technologies. Because the amount of data is so large, it is very difficult to analyse effectively with existing traditional systems. Big data is characterized by the dimensions variety, volume, velocity, complexity and value. This paper introduces the evolution of big data and the various issues and tools involved in accepting and adapting to big data technology. Storage and data transport are technology issues, which seem to be solvable in the near term. The paper concludes with good big data practices to be followed.

Keywords: Big Data, NoSQL, MapReduce

I. INTRODUCTION

With the growth of services and technologies, the Internet has become a vast space to which great amounts of information are added every day. Big data can be defined as an amount of data just beyond technology's capability to store, manage and process efficiently [1]. The main difficulty in handling such huge amounts of data is that the volume is increasing rapidly. Big datasets are complex, diverse, large, distributed and longitudinal datasets generated from e-mail, video, click streams, sensors, Internet transactions and all other digital sources available today and in the future. Today we live in an information society and are moving towards a knowledge-based society.
Big data has real-time applications in industries such as healthcare, network security, business, sports, education, marketing, gaming and telecommunication [2]. The main value of big data lies in handling different types and volumes of data more efficiently. Big data can be defined by the following characteristics:

Data volume: the ever-growing information produced by enterprises, measured in terabytes or petabytes and expected to reach zettabytes in the near future. As data volume increases, the value of individual records decreases in proportion to quantity, richness, type and age. The challenge is being able to locate, aggregate, analyze and identify datasets.

Data velocity: velocity measures the speed of data creation, streaming and aggregation. Data volume will continue to grow as companies store data of all sorts, including environmental, medical and financial data.

Data variety: data comes from different sources (both internal and external to an organisation) and includes structured, semi-structured and unstructured data [3].

Data value: value measures the usefulness of data in making decisions. The value of data is not limited to a single use; data can be reused in the future by combining it with another dataset.

Complexity: it is quite an undertaking to cleanse, link, match and transform data coming from different sources. Complexity measures the degree of interconnectedness and interdependence in big data structures, such that a small change (or a combination of small changes) in one or a few elements can yield very large changes that ripple across or cascade through the system and substantially affect its behaviour, or no change at all [1].

This paper is organised as follows. Section II describes the evolution of big data along with recent figures. Section III discusses various big data issues. Section IV presents the tools and technologies of big data in a table. Section V discusses future scope as a direction for emerging researchers, and Section VI concludes the paper.

II. BIG DATA EVOLUTION

Over the last twenty years, the amount of data has been increasing day by day. Some facts about data: 500 million photos and 200 hours of new video are uploaded to YouTube, 2 million search queries are made on Google and 277,000 tweets are posted every minute [4]. In addition, 350 GB of data is processed on Facebook, more than 100 million e-mails are sent and more than 570 websites are created every minute. An estimated 4 zettabytes (ZB) of data were generated worldwide in 2013 [5], and worldwide data in 2014 was estimated at a staggering 7 ZB [6]. Today big data can be seen in business, finance, banking, life sciences, astronomy, engineering and oceanography [7]. According to recent research, businesses using big data report an average performance improvement of 26%. IBM reports that 2.7 zettabytes of data exist in the digital universe today [3]. The amount of data increases day by day and comes from various channels (Table 1) [2].

Table 1. Information explosion

Content Type | Quantity | Comments
Internet | 20 Exabytes (10^18) | 1 Exabyte = 1,000,000 terabytes
Blogs | 70 Million (10^6) | 36,718 listed on Technorati
YouTube Visitors | 375 Million (10^6) | As of December 2009
Facebook Members | 500 Million (10^6) | 40% of online hours, top 10 properties
Social Content Creators | 600 Million (10^6) | People (33% of internet users)
Social Members | 2.1 Billion (10^9) | Memberships, top 115 social sites
Live Posts | 2.1 Billion (10^9) | Forums, discussion boards
Tweets | 20 Billion (10^9) | 50 million user accounts
Web Pages | 1.5 Trillion (10^12) | Plus dark web
Formal Periodicals | 10s of Thousands (10^3) | Newspapers, other publications

III. ISSUES OF BIG DATA

The fundamental issue areas in big data are characteristic issues, processing issues, storage issues and management issues.

Table 2. Issues and their descriptions

Issues related to characteristics

Data volume: As data volume increases, the value of different data records decreases in proportion to richness, type, age and quantity.
Every day, social networking sites produce terabytes of data, and this amount of data is difficult to handle with traditional systems.

Data velocity: Traditional systems are not capable of performing analytics on data that is constantly in motion. Bandwidth is the main data velocity management issue.

Data variety: Data arrives in all forms (raw, structured, semi-structured and unstructured), which is difficult for existing traditional systems to handle. From an analytic perspective, the large volume of data is the biggest obstacle.

Data value: The different forms of data stored by organizations are used for data analytics, which produces a gap between IT professionals and business leaders.

Data complexity: A current difficulty of big data is that it cannot be worked with using desktop visualization and statistics packages or relational database packages; it requires massively parallel software running on thousands of servers. It is quite an undertaking to link, cleanse, match and transform data coming from different sources.

Storage and transport issues

Most of the recent, huge amount of data is produced by social media, yet there has been no new storage medium. Moreover, data is being produced by everyone and everything, not just by professionals (writers, journalists, scientists) [1]. Transmitting the data from a storage or collection point to a processing point takes a long time. To handle this issue, data should be processed in place and only the resulting information transmitted.

Data management issues

Data management is a difficult problem to address with big data. The sources of data vary in format, size and method of collection, and individuals contribute digital data in the form of drawings, documents, sounds, pictures, user interface designs and so on. Rigorous protocols are often followed in order to ensure validity and accuracy. New approaches to data validation and qualification are needed. To summarize, there is no perfect data management solution yet.

Processing issues

Effective processing of exabytes of data will require new analytics algorithms and parallel processing in order to provide timely and actionable information [8].

IV. TOOLS AND TECHNOLOGIES

Processing huge amounts of data requires exceptional technologies. Various tools and technologies are used for visualising, analysing and manipulating big data (Table 3).

Table 3. Tools and Technologies

Technologies | Tools
NoSQL Databases | MongoDB, CouchDB, Cassandra, Redis, BigTable, HBase, Hypertable, Voldemort, Riak
MapReduce | Hadoop, Hive, Pig, Cascading, Cascalog, mrjob, Caffeine, S4, MapR, Oozie, Greenplum
Storage | S3, Hadoop Distributed File System
Servers | EC2, Heroku, Elastic Beanstalk, Google App Engine
Processing | R, TinkerPop, BigSheets, Elasticsearch, Datameer, Mechanical Turk, Yahoo! Pipes

V. FUTURE SCOPE

New applications are generating vast amounts of data in structured and unstructured form. Big data technology is able to process and store that data, and probably larger amounts in the near future. Hopefully, Hadoop will get better.
New technologies and tools that can record, monitor, measure and combine all kinds of data around us are going to be introduced soon. In future we will need new technologies and tools for anonymizing data, for analysis, for tracking and auditing information, and for sharing and managing our own personal data. Many aspects of life that are managed with big data, such as health, education, telecommunication, marketing, sports and business, need to be refined in future.
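The MapReduce model listed among the tools in Section IV, together with the advice in Section III to process data in place and transmit only the results, can be illustrated with a small word-count sketch. This is a minimal, single-machine illustration in plain Python and not how Hadoop itself is implemented; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample records are assumptions made for this example only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence on an in-memory dataset."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input: each string stands in for one record stored on a node.
sample_records = [
    "big data needs new tools",
    "new tools process big data",
]
counts = word_count(sample_records)
print(counts)
```

In a real cluster the map step would run on the nodes that already hold the records, so only the small (word, count) pairs travel over the network rather than the raw data.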

VI. CONCLUSION

We are in the development era of big data. This paper has described various issues that must be addressed in order to accept and adapt to this technology. The use of big data will spread widely across fields such as finance, multimedia, business and education. To resolve these issues, governments should develop strategic plans for big data, allow public use of big data to improve productivity, and establish laws or regulations for better improvement in business. In future, working with big data will require skills as well as technologies.

REFERENCES

[1] Frank Armour, J. Alberto Espinosa, "Big Data: Issues and Challenges Moving Forward", 46th Hawaii International Conference on System Sciences.
[2] Sabia, Sheetal Kalra, "Applications of Big Data: Current Status and Future Scope", IJACTE, volume 3, issue 5, ISSN: , 2014.
[3] Elena Geanina Ularu, Florina Camelia Puican, "Perspectives on Big Data and Big Data Analytics", Database Systems Journal, vol. III, no. 4/2012.
[4] Shilpa, Manjit Kaur, "Big Data and Methodology - A Review", International Journal of Advanced Research in Computer Science and Software Engineering.
[5] "Applying big-data technologies to network architecture", Ericsson Review, 2012; "Big Data: Seizing Opportunities, Preserving Values", Executive Office of the President, May 2014.
[6] "Big Data: What It Is and Why You Should Care", IDC, 2011.
[7] Gali Halevi, Dr. Henk Moed, "The Evolution of Big Data as a Research and Scientific Topic", Research Trends, Issue 30, September 2012.
[8] Avita Katal, Mohammad Wazid, "Big Data: Issues, Challenges, Tools and Good Practices", pg:
[9] Nirmala Singh, Sachchidanand Singh, "Big Data Analytics", IEEE (ICCICT), Oct. 2012, 19-20.
[10] Christian Szongott, Matthew Smith, "Big Data Privacy Issues in Public Social Media", IEEE, 6th DEST, June 2012, 18-20.
