Power BI for Data Science Integration and exploration capabilities


1 Power BI for Data Science: Integration and exploration capabilities. Javier Guillen, Charlotte BI Group, Charlotte, NC

2 Power BI for Data Science exploration. This calls for a different mindset from traditional BI, going beyond slicing and dicing data. In traditional BI, Power BI is a front-end charting tool for pre-computed data. In data science, Power BI can provide new insights on existing data, producing new knowledge that becomes a source for reporting. To accomplish this, we use Big Data repositories and predictive model integrations.

3 Agenda. Provide an overview of six key scenarios for Power BI integration with data science tools. The session is not a deep dive on math or coding concepts. It will focus mostly on data exploration efforts (in contrast to operationalization).

4 About me. Director, Data Syntelli Solutions; Adjunct Faculty, City University of New York, Data & Analytics program; SME Board Advisor, Central Piedmont Community College; Co-founder, Charlotte BI Group (Official Power BI Group); Co-organizer, SQL Saturday

5 Out-of-the-box data science capabilities in Power BI: Q&A and Q&A Explorer; Explain Increase / Decrease; Explore when distribution is different; new DAX functions (NORM.DIST, NORM.INV, NORM.S.DIST, NORM.S.INV, T.DIST, T.DIST.2T, T.DIST.RT, T.INV, T.INV.2T); Correlation Coefficient (quick measure); Custom Visuals (Advanced Analytics); Forecasting (Line Chart).
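
For readers more comfortable with Python than DAX, the normal-distribution functions in that list can be illustrated with the standard library. This is a sketch for intuition only, not DAX itself: it assumes that NORM.DIST with its cumulative flag set corresponds to the normal CDF (and to the density otherwise), and that NORM.INV is the inverse CDF. The helper names and the mean and standard deviation values are arbitrary choices for illustration.

```python
from statistics import NormalDist


def norm_dist(x, mean, std_dev, cumulative=True):
    """Rough analogue of NORM.DIST: CDF when cumulative, density otherwise."""
    dist = NormalDist(mu=mean, sigma=std_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)


def norm_inv(probability, mean, std_dev):
    """Rough analogue of NORM.INV: the value whose CDF equals probability."""
    return NormalDist(mu=mean, sigma=std_dev).inv_cdf(probability)


# Probability of observing a value at or below 110 when mean=100, sd=10,
# then the round trip back from that probability to the original value.
p = norm_dist(110, mean=100, std_dev=10)
x = norm_inv(p, mean=100, std_dev=10)
print(p, x)
```

The standard-normal variants (NORM.S.DIST, NORM.S.INV) would correspond to the same calls with a mean of 0 and a standard deviation of 1.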

6 Scenario 1: R-based data explorations. What is R? R is a language and environment for statistical computing and graphics. It is one of the most popular languages for wrangling data and developing predictive models. How can I use it in Power BI? R can be used in three ways in Power BI: to load data, to implement data transformations, and to visualize. Does it work with the Power BI Service? Yes, the Power BI service also comes with a wide variety of pre-installed R packages.

7 Loading data from R

8 To work with local R packages you can use your favorite R IDE. To use a library in Power BI, simply call the library(<package>) function. For packages installed in the cloud service see:

9 Data wrangling with R. R scripts can be used in the Query Editor:

    library(dplyr)
    # Mean of each measurement per species; iris column names are case-sensitive
    iris_mean <- summarize(group_by(iris, Species),
                           slength = mean(Sepal.Length),
                           swidth  = mean(Sepal.Width),
                           plength = mean(Petal.Length),
                           pwidth  = mean(Petal.Width))
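
As a cross-check on what that group-and-summarize step computes, the same per-species averaging can be sketched in plain Python. This is an illustration rather than something to paste into the Query Editor: it assumes a handful of iris-style rows typed in by hand, and the function name mean_by_species is made up for this example.

```python
from collections import defaultdict
from statistics import mean

# A few iris-style rows, for illustration only:
# (species, sepal length, sepal width, petal length, petal width)
ROWS = [
    ("setosa", 5.1, 3.5, 1.4, 0.2),
    ("setosa", 4.9, 3.0, 1.4, 0.2),
    ("versicolor", 7.0, 3.2, 4.7, 1.4),
    ("versicolor", 6.4, 3.2, 4.5, 1.5),
    ("virginica", 6.3, 3.3, 6.0, 2.5),
    ("virginica", 5.8, 2.7, 5.1, 1.9),
]
COLUMNS = ("slength", "swidth", "plength", "pwidth")


def mean_by_species(rows):
    """Group rows by species and average each measurement column."""
    groups = defaultdict(list)
    for species, *measurements in rows:
        groups[species].append(measurements)
    result = {}
    for species, measured in groups.items():
        # zip(*measured) regroups the rows into one tuple per column.
        result[species] = {
            name: mean(column) for name, column in zip(COLUMNS, zip(*measured))
        }
    return result


iris_mean = mean_by_species(ROWS)
for species, averages in iris_mean.items():
    print(species, averages)
```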

10 Visualizing in R. Make sure to add fields to the R visual: a data frame is created automatically from the fields you add.

11 K-means clustering, for example, can be easily integrated into your interactive analysis. [Figure panels: Original Species Definition; Predicted Species]
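
To make the clustering step concrete, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. It is not the code behind the slide, which would normally use R's built-in clustering inside an R visual. The sample points are illustrative petal-style measurements, and seeding the centroids with the first k points is a simplifying assumption made so the example is deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    # Assumption: the first k points seed the centroids. Real implementations
    # usually use random or k-means++ seeding instead.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Illustrative (petal length, petal width) pairs forming two obvious groups.
POINTS = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.2), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
centroids, labels = kmeans(POINTS, k=2)
print(centroids, labels)
```

Comparing the resulting labels with the known species is what the two panels on the slide do visually.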

12 R visuals - limitations. Data used by an R visual is limited to 150,000 rows. Execution time cannot exceed 5 minutes (the script will time out). An R visual cannot be the source of a cross-filtering interaction. See updates here:

13 Scenario 2: Using Hadoop for Clickstream Analysis. What is Hadoop? Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. What is Hive? Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. It provides a framework for creating schemas on existing data. What are Zeppelin notebooks? Apache Zeppelin is a multi-purpose web-based authoring tool which brings data ingestion, data exploration, visualization, sharing and collaboration features to Hadoop and Spark. How can we use Hadoop/Hive in Power BI for exploratory reporting? We have used cache-optimized LLAP for interactive reporting on data with minimal processing.

14 Sankey visual for tracing web behavior in Power BI over a Hive table
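
A Sankey visual generally needs its input shaped as source, destination and weight. The sketch below shows one way to derive that shape from raw click events by counting page-to-page transitions within each session. It is an assumption about the data layout, not the query used in the talk: the (session_id, timestamp, page) tuples and the page names are hypothetical, and in practice the same aggregation would more likely run in Hive before Power BI reads the result.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def sankey_links(events):
    """Count page-to-page transitions within each session.

    events: iterable of (session_id, timestamp, page) tuples.
    Returns a Counter mapping (source_page, destination_page) to a weight.
    """
    # Sort by session, then by time, so consecutive rows are consecutive clicks.
    ordered = sorted(events, key=itemgetter(0, 1))
    links = Counter()
    for _, session_events in groupby(ordered, key=itemgetter(0)):
        pages = [page for _, _, page in session_events]
        links.update(zip(pages, pages[1:]))  # pair each page with the next one
    return links


# Hypothetical clickstream rows.
EVENTS = [
    ("s1", 1, "home"), ("s1", 2, "search"), ("s1", 3, "product"),
    ("s2", 1, "home"), ("s2", 2, "search"), ("s2", 3, "cart"),
    ("s3", 2, "product"), ("s3", 1, "home"),
]
links = sankey_links(EVENTS)
for (source, destination), weight in sorted(links.items()):
    print(source, destination, weight)
```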

15 Scenario 3: Add interactivity to experiments on Apache Spark

16 What is Apache Spark? Apache Spark is a unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. How can Power BI benefit Spark-based analysis? Power BI can connect to Spark clusters and enhance experiments by providing drill-down and interactive analysis on dimensional data that has been augmented with predictive output.

17 In a Jupyter notebook we can, at any point, use the write interface of a DataFrame to expose it to Power BI:

18 You can also select only a few columns of interest for exploration: predictionsdf[['name','prediction']].write.saveAsTable('predictions')

19 Scenario 4: Interactive machine learning-based reporting. What is Azure Machine Learning? Microsoft Azure Machine Learning Studio is a collaborative, drag-and-drop tool you can use to build, test, and deploy predictive analytics solutions on your data. How can we use it in Power BI? Because Azure Machine Learning models can be exposed as web services, we can use R or Python (combined with reporting slicers in Power BI) to call predictive models interactively.
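
Calling such a web service from Python might look roughly like the sketch below, which only builds the HTTP request and does not send it. Everything specific here is an assumption: the URL and API key are placeholders, the "price" column is hypothetical, and the payload layout (Inputs, input1, ColumnNames, Values, GlobalParameters) follows the request format commonly shown in Machine Learning Studio sample code, so check it against the API help page of your own service.

```python
import json
import urllib.request

# Placeholders, not real values: substitute your own endpoint and key.
SCORING_URL = "https://example.invalid/score"
API_KEY = "YOUR-API-KEY"


def build_scoring_request(column_names, rows, url=SCORING_URL, api_key=API_KEY):
    """Build (but do not send) a POST request for a scoring web service."""
    # Assumed payload layout; verify against your service's documentation.
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Hypothetical input: two candidate price points chosen with a report slicer.
request = build_scoring_request(["price"], [["9.99"], ["12.49"]])
print(request.get_method(), request.full_url)
# To actually call the service: urllib.request.urlopen(request).read()
```

Inside Power BI, the values passed in would come from the fields and slicer selections handed to the script, which is what makes the call interactive.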

20 For example, when exploring price elasticity we can use Power BI with AML to create a what-if tool and interactively display predicted revenue at specified price points:

21 Scenario 5: Power BI in IoT Scenarios. What is the Internet of Things (IoT)? The Internet of Things (IoT) is the network of physical devices, vehicles, home appliances, and other items embedded with electronics, software, sensors, actuators, and connectivity which enables these things to connect and exchange data, creating opportunities for analytics. For example: predictive maintenance. How do we use Power BI for IoT? Power BI can implement streaming scenarios either by consuming real-time datasets or by leveraging streaming technologies like Azure Stream Analytics.
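
On the device or gateway side, feeding a real-time dataset usually comes down to posting rows as JSON to a push URL. The sketch below builds such a request without sending it. The push URL is a placeholder, and the field names (deviceId, temperature, timestamp) are hypothetical and would have to match the fields defined on the dataset.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Placeholder, not a real endpoint: use the push URL shown for your dataset.
PUSH_URL = "https://example.invalid/push-url"


def build_push_request(readings, push_url=PUSH_URL):
    """Build (but do not send) a POST request carrying sensor rows as a JSON array.

    readings: iterable of (device_id, temperature) pairs.
    """
    now = datetime.now(timezone.utc).isoformat()
    rows = [
        {"deviceId": device_id, "temperature": temperature, "timestamp": now}
        for device_id, temperature in readings
    ]
    return urllib.request.Request(
        push_url,
        data=json.dumps(rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical readings from two devices.
push_request = build_push_request([("pump-01", 71.5), ("pump-02", 88.2)])
print(push_request.data.decode("utf-8"))
# To actually send the rows: urllib.request.urlopen(push_request)
```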

22 Three types of real-time datasets we can enable in Power BI directly:

23 Power BI can be an endpoint for a Stream Analytics-based architecture:

24 Scenario 6: Leveraging Cognitive Automation. What is cognitive automation? Cognitive automation refers to AI techniques that use robotic approaches to emulate humans in specific business processes. What can we do with it in Power BI? Azure provides pre-trained models able to handle a variety of AI tasks. For example, speech analysis and image recognition can be leveraged by Power BI for on-the-fly analysis of call center or factory data.

25 In a call center environment, Voice of Customer (VOC) projects can deliver automated speech-to-text and key phrase analytics. This can provide near real-time identification of issues that can inform operational and marketing efforts.

26 A final note - Power BI Templates: end-to-end solutions that utilize various Azure engineering and data science components to create plug & play experiences.

27 Thank you! Javier Guillen