Business Intelligence. By Prof.Sushila Aghav-Palwe

Size: px
Start display at page:

Download "Business Intelligence. By Prof.Sushila Aghav-Palwe"

Transcription

1 Business Intelligence By Prof.Sushila Aghav-Palwe

2 Introduction Business intelligence may be defined as a set of mathematical models and analysis methodologies that exploit the available data to generate information and knowledge useful for complex decision-making processes

3 Effective and timely decisions The ability of the knowledge workers to make decisions, both as individuals and as a community, is one of the primary factors that influence the performance and competitive strength of a given organization stagnant decision-making :Most knowledge workers reach their decisions primarily using their experience, knowledge of the application domain and the available information, but is inappropriate for frequent and rapid changes in the economic environment

4 Cont. decision-making processes within today s organizations are often too complex and dynamic Need more rigorous attitude based on analytical methodologies and mathematical models The main purpose of business intelligence systems is to provide knowledge workers with tools and methodologies that allow them to make effective and timely decisions.

5 Effective decisions: Understand, Evaluate The application of rigorous analytical methods allows decision makers to rely on information and knowledge which are more dependable. As a result, they are able to make better decisions and devise action plans that allow their objectives to be reached in a more effective way. Indeed, turning to formal analytical methods forces decision makers to explicitly describe both the criteria for evaluating alternative choices and the mechanisms regulating the problem under investigation. Furthermore, the ensuing in-depth examination and thought lead to a deeper awareness and comprehension of the underlying logic of the decision-making process.

6 Timely decisions Enterprises operate in economic environments characterized by growing levels of competition and high dynamism. As a consequence, the ability to rapidly react to the actions of competitors and to new market conditions is a critical factor in the success or even the survival of a company.

7 Operational Data Vs Informational Data Operational Data : OLTP(Current, Detailed Data) Information: OLAP(DW): Processed Data, Overall View of Data

8 Business Intelligence Business Intelligence (BI) refers to skills, processes, technologies, applications and practices used to support decision making. Systems that provide directed background data and reporting tools to support and improve the decision-making process. A popularized, umbrella term used to describe a set of concepts and methods to improve business decision making by using fact-based support systems. The term is sometimes used interchangeably with briefing books and executive information systems. Business Intelligence is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help clients make better business decisions. A system that collects, integrates, analyses and presents business information to support better business decision making. Business Intelligence is an environment in which business users receive information that is reliable, secure, consistent, understandable, easily manipulated and timely...facilitating more informed decision making

9 Importance of BI Improve Management Processes planning, controlling, measuring and/or changing resulting in increased revenues and reduced costs Improve Operational Processes fraud detection, order processing, purchasing.. resulting in increased revenues and reduced costs Predict the Future

10 BI Objectives Enable interactive access to data (sometimes in real time) Enable manipulation of data to allow appropriate analysis by managers Provide valuable insights to produce informed and better decisions The process of BI is based on transformation of data to information, then to decisions and finally to actions Facilitate closing the strategy gap of an organization

11 Benefits of a business intelligence system decision makers ask themselves a series of questions and develop the corresponding analysis. Hence, they examine and compare several options, selecting among them the best decision, given the conditions at hand. If decision makers can rely on a business intelligence system facilitating their activity, we can expect that the overall quality of the decision-making process will be greatly improved. With the help of mathematical models and algorithms, it is actually possible to analyze a larger number of alternative actions, achieve more accurate conclusions and reach effective and timely decisions.

12 Data, information and knowledge Generally, data represent a structured codification of single primary entities, as well as of transactions involving two or more primary entities. For example, for a retailer data refer to primary entities such as customers, points of sale and items, while sales receipts represent the commercial transactions. Information is the outcome of extraction and processing activities carried out on data, and it appears meaningful for those who receive it in a specific domain. Knowledge. Information is transformed into knowledge when it is used to make decisions and develop the corresponding actions. Therefore, we can think of knowledge as consisting of information put to work into a specific domain, enhanced by the experience and competence of decision makers in tackling and solving complex problems.

13 Business intelligence analysis First, the objectives of the analysis are identified and the performance indicators that will be used to evaluate alternative options are defined. Mathematical models are then developed by exploiting the relationships among system control variables, parameters and evaluation metrics. Finally, what-if analyses are carried out to evaluate the effects on the performance determined by variations in the control variables and changes in the parameters.

14 BI Architecture and Components of BI

15 BI Cycle Analysis. During the analysis phase, it is necessary to recognize and accurately spell out the problem at hand. Decision makers must then create a mental representation of the phenomenon being analyzed, by identifying the critical factors that are perceived as the most relevant. Insight. The second phase allows decision makers to better and more deeply understand the problem at hand, often at a causal level. The information obtained through the analysis phase is then transformed into knowledge during the insight phase Decision. During the third phase, knowledge obtained as a result of the insight phase is converted into decisions and subsequently into actions Evaluation. Finally, the fourth phase of the business intelligence cycle involves performance measurement and evaluation. Extensive metrics should then be devised that are not exclusively limited to the financial aspects but also take into account the major performance indicators defined for the different company departments.

16 Enabling factors in business intelligence projects : technologies, analytics and human resources Technologies. Hardware and software technologies are significant enabling factors that have facilitated the development of business intelligence systems within enterprises and complex organizations Analytics. Mathematical models and analytical methodologies play a key role in information enhancement and knowledge extraction from the data available inside most organizations. The mere visualization of the data according to timely and flexible logical views, Human resources. The human assets of an organization are built up by the competencies of those who operate within its boundaries, whether as individuals or collectively. The overall knowledge possessed and shared by these individuals constitutes the organizational culture

17 Development of a business intelligence system Analysis. During the first phase, the needs of the organization relative to the development of a business intelligence system should be carefully identified. Design phase includes two sub-phases and is aimed at deriving a provisional plan of the overall architecture First, it is necessary to make an assessment of the existing information infrastructures and the main decision-making processes that are to be supported by the business intelligence system should be examined. Second. using classical project management methodologies, the project plan will be laid down Planning. The planning stage includes a sub-phase where the functions of the business intelligence system are defined and described in greater detail. Implementation and control. The last phase consists of five main sub-phases : DW and Dmarts, ETL,Metadata Archives, Testing

18 Problem Solving Process The alternatives represent the possible actions aimed at solving the given problem and helping to achieve the planned objective. Criteria are the measurements of effectiveness of the various alternatives and correspond to the different kinds of system performance : System performances like profitability dependability overall cost risk productivity service quality flexibility

19 The decision-making process Structure of Decision Making Process Factors influencing a rational choice : Economic : minimization of costs or the maximization of Profits? Technical : technically feasible? Legal : compatible with the legislation? Ethical : abide by the ethical principles and social rules of the community? Procedural : Follows Organization cultural limitations? Political : Any political consequences? Phases of Decision Making Process

20 Types of decisions As per the nature, Decisions can be classified as structured, unstructured or semi-structured Structured decisions. A decision is structured if it is based on a well-defined and recurring decision-making procedure Unstructured decisions. A decision is said to be unstructured if the three phases of intelligence, design and choice are also unstructured Semi-structured decisions. A decision is semi-structured when some phases are structured and others are not

21 Types of Decision Depending on their scope, decisions can be classified as strategic, tactical and operational. Strategic decisions. Decisions are strategic when they affect the entire organization or at least a substantial part of it for a long period of time Tactical decisions. Tactical decisions affect only parts of an enterprise and are usually restricted to a single department. The time span is limited to a medium-term horizon, typically up to a year. Operational decisions. Operational decisions refer to specific activities carried out within an organization and have a modest impact on the future. Operational decisions are framed within the elements and conditions determined by strategic and tactical decisions.

22 Characteristics of the information in terms of the scope of decisions

23 Extended structure of a decision support system

24 Mathematical Model Mathematical model(a symbolic model) is an abstract representation of a real system. It is intended to describe the behavior of the system through a series of symbolic variables, numerical parameters and mathematical relationships.

25 BI With Mathematical Treatment Business intelligence may be defined as a set of mathematical models and analysis methodologies that exploit the available data to generate information and knowledge useful for complex decision-making processes. The main purpose of business intelligence systems is to provide knowledge workers with tools and methodologies that allow them to make effective and timely decisions.

26 The role of mathematical models in BI A business intelligence system provides decision makers with information and knowledge extracted from data, through the application of mathematical models and algorithms. Mathematical Model is abstract representation of real systems

27 Phases in Development of Mathematical Model Problem at hand must be correctly identified. The observed critical symptoms must be analyzed and interpreted in order to formulate hypotheses defining an appropriate mathematical model to represent the system. Factors included in model forming : time horizon, the decision variables, the evaluation criteria, the numerical parameters and the mathematical relationships Finding Solution Algorithm to Assess the Model and Implement software tool incorporating that algorithm finally implemented, tested and utilized in the application domain.

28 Model Formation A number of factors affect and influence the choice of model, such as the time horizon, the decision variables, the evaluation criteria, the numerical parameters and the mathematical relationships. Time Horizon: The time span considered in a model. may vary depending on the specific problem considered. Evaluation criteria: measurable performance indicators (Accuracy, Quality, Reliability, Flexibility, Effectiveness etc). Decision variables: Symbolic variables representing alternative decisions. Numerical parameters: identify and estimate all numerical parameters required by the model. Mathematical relationships: mathematical relationships among the decision variables, the numerical parameters and the performance indicators.

29 Case Study MBI corporation, which manufacturers special purpose computers needs to make a decision : How many computer should it produce next month at Boston Plant? MBI is considering two types of computer : CC7, which requires 300 days of labor and $10000 in material. CC8, which requires 500 days of labor and $15000in material. The profit contribution of each CC7 is $8000, CC8 is $ The plant has a capacity of working days per month. And the material budget is $8 million per month. Marketing requires that at least 100 units of CC7 and 200 Units of CC8 be produced each month. The problem is to maximize the company s profit by determining how many units of CC7 and CC8 should be produced each month.

30 Mathematical Model What are Decision Variables? What are result Variables? What is Objective? Which are uncontrollable Variable [Constraints]

31 Case Study MBI corporation, which manufacturers special purpose computers needs to make a decision : How many computer should it produce next month at Boston Plant? MBI is considering two types of computer : CC7, which requires 300 days of labor and $10000 in material. CC8, which requires 500 days of labor and $15000in material. The profit contribution of each CC7 is $8000, CC8 is $ The plant has a capacity of working days per month. And the material budget is $8 million per month. Marketing requires that at least 100 units of CC7 and 200 Units of CC8 be produced each month. The problem is to maximize the company s profit by determining how many units of CC7 and CC8 should be produced each month.

32 Mathematical Model What are Decision Variables? : X1: Units of CC7, X2: Units of CC8 What are result Variables? : Total Profit = Z What is Objective? How to Maximize Total Profit such that { Relationship between Decision Variable which is best case } Which are uncontrollable Variable [Constraints] Labor Constraint Budget Constraint Marketing Requirement

33 Mathematical Model of Product-Mix Example Decision Variable X1 =Units od CC7 X2=Units of CC8 Mathematical Relationship Maximize Z [profit] Subject to Constraint Result Variable Total Profit = Z Z= 8000 X X2 Constraints (uncontrollable) 300 X X2 <= 200, X X2 <= X1 >= 100 X2 >= 200 Best Case X1= X2=200 Profit : $5,066,667

34 Mathematical Modelling Process in BI First, the objectives of the analysis are identified and the performance indicators that will be used to evaluate alternative options are defined. Mathematical models are then developed by exploiting the relationships among system control variables, parameters and evaluation metrics. Finally, what-if analyses are carried out to evaluate the effects on the performance determined by variations in the control variables and changes in the parameters.

35 Advantages of Using Mathematical Modelling in BI The development of an abstract model forces decision makers to focus on the main features of the analyzed domain (inducing a deeper understanding of the phenomenon under investigation). the knowledge about the domain acquired when building a mathematical model can be more easily transferred in the long run to other individuals within the same organization, thus allowing a sharper preservation of knowledge in comparison to empirical decisionmaking processes. a mathematical model developed for a specific decision-making task is so general and flexible that in most cases it can be applied to other ensuring situations to solve problems of similar type.

36 Classes of Mathematical Model For Decision Making predictive models; pattern recognition and learning models; optimization models; project management models; risk analysis models;

37 BI Challenges Poor data quality :Bad quality of information Dirty Data,Computer systems impede rather than enhance BI efforts Context Lack of standardize procedures or processes User resistance No Top Down support

38 Types of Data Analytics Analytics is about discovering the useful patterns from data Analytics play vital role in Business Intelligence

39

40 BIG DATA How much data is generated through social media tools? People send more than billion messages sent a day. People and brands on Twitter send more than 340 million tweets a day. People on Facebook share more than 684,000 bits of content a day. People upload 72 hours (259,200 seconds) of new video to YouTube a minute. Consumers spend $272,000 on Web shopping a day. Google receives over 2 million search queries a minute. Apple receives around 47,000 app downloads a minute. Brands receive more than 34,000 Facebook likes a minute. Tumblr blog owners publish 27,000 new posts a minute. Instagram photographers share 3,600 new photos a minute. Flickr photographers upload 3,125 new photos a minute. People perform over 2,000 Foursquare check-ins a minute. Individuals and organizations launch 571 new websites a minute. WordPress bloggers publish close to 350 new blog posts a minute. The Mobile Web receives 217 new participants a minute.

41 Properties of BIG DATA: Volume: Amount of data. Velocity: Speed of data in and out. Variety: Range of data types and sources.

42

43 Example of BIG DATA

44 BIG DATA Analytics Big data analytics is the process of examining large and varied data sets - - i.e., big data -- to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful information that can help organizations make more-informed business decisions.

45 Hadoop Apache Hadoop is an open source software framework, that supports data intensive, distributed,parallel applications. It supports the running of applications on large clusters of commodity hardware. Hadoop implements a computational paradigm named MapReduce where application divided into many small fragments of work. Each Fragment of work may be executed on any node in cluster. In addition Hadoop provides a distributed filesystem(hdfs) that stores data on the compute nodes. Both MapReduce and HDFS are designed so that node failures are automatically handled by the framework.

46 HADOOP The Apache Hadoop framework is composed of the following modules: Hadoop Common contains libraries and utilities needed by other Hadoop modules. Hadoop Distributed File System (HDFS) a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster. Hadoop YARN(Yet Another Resource Negotiator) a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications. Hadoop MapReduce a programming model for large scale data processing.

47

48 Map Reduce Map Reduce is a programming model for parallel data processing. MapReduce works by breaking the processing into two phases: the map phase reduce phase. Each phase has key-value pairs as input and output, the types of which

49

50

51

52

53 Apache Hive Apache Hive is Data Warehouse infrastructure built on top of Hadoop that provides data summarization, query and analysis. Apache Hive supports analysis of large data sets stored in Hadoop compatible file systems. It provides a SQL-Like language : HiveSQL

54 Apache PIG Pig's infrastructure layer consists of a compiler that produces sequences of Map- Reduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject). Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets. PIG is high level natively parallel data flow language and execution framework for creating MapReduce programs.

55 Apache HBase Is open source,non relational,distributed database model written in Java It Runs on the top of HDFS Hbase provides fault tolerant way of storing large quantities of spanse data Hbase features compression,in Memory operations on per colum basis. Tables in Hbase can serve as input and output for MapReduce Jobs in Hadoop.

56 BI in Banking System Services : Loan, Credit Cards, Accounts Objective for Betterment : Quality of Loans, Reducing Bad debt., Retain good customer Possible Solutions : Automate Loan application Process [ Using Decision models for Loan Approval ] Detect Fraudulent Transactions [Find patterns of Fraudulent Transactions ] Maximize Customer value [Cross Selling, Up-selling ] [Find out customer with with good credit, offer them services, find appropriate services to offer ] Optimize cash reserves with forecasting [Maintain liquidity to meet need of depositors. Identify pattern of withdraw, trend to find ot how much to keep, and then invest the rest money ]

57 BI in Retail Industry Services: Providing product purchase service to customer Objective for Betterment : Meet customer need /expectation Possible Solution : Understanding customer shopping patterns Optimize inventory level Improve store layout and sales promotions Optimize logistics for seasonal effects Minimize losses due to limited product life

58 What Are Outliers? Outlier: A data object that deviates significantly from the normal objects as if it were generated by a different mechanism Ex.: Unusual credit card purchase, sports: Michael Jordon, Wayne Gretzky,... Outliers are different from the noise data Noise is random error or variance in a measured variable Noise should be removed before outlier detection Outliers are interesting: It violates the mechanism that generates the normal data Outlier detection vs. novelty detection: early stage, outlier; but later merged into the model Applications: Credit card fraud detection Telecom fraud detection Customer segmentation Medical analysis 58

59 Types of Outliers (I) Three kinds: global, contextual and collective outliers Global Outlier Global outlier (or point anomaly) Object is O g if it significantly deviates from the rest of the data set Ex. Intrusion detection in computer networks Contextual outlier (or conditional outlier) Object is O c if it deviates significantly based on a selected context Ex. 80 o F in Urbana: outlier? (depending on summer or winter?) Attributes of data objects should be divided into two groups Contextual attributes: defines the context, e.g., time & location Behavioral attributes: characteristics of the object, used in outlier evaluation, e.g., temperature Issue: How to define or formulate meaningful context? 59

60 Types of Outliers (II) Collective Outliers A subset of data objects collectively deviate significantly from the whole data set, even if the individual data objects may not be outliers Applications: E.g., intrusion detection: When a number of computers keep sending denial-ofservice packages to each other Collective Outlier Detection of collective outliers Consider not only behavior of individual objects, but also that of groups of objects Need to have the background knowledge on the relationship among data objects, such as a distance or similarity measure on objects. A data set may have multiple types of outlier One object may belong to more than one type of outlier 60

61 Challenges of Outlier Detection Modeling normal objects and outliers properly Hard to enumerate all possible normal behaviors in an application The border between normal and outlier objects is often a gray area Application-specific outlier detection Choice of distance measure among objects and the model of relationship among objects are often application-dependent E.g., clinic data: a small deviation could be an outlier; while in marketing analysis, larger fluctuations Handling noise in outlier detection Noise may distort the normal objects and blur the distinction between normal objects and outliers. It may help hide outliers and reduce the effectiveness of outlier detection Understandability Understand why these are outliers: Justification of the detection Specify the degree of an outlier: the unlikelihood of the object being generated by a normal mechanism 61