Visualization with R: Extending the Impact of the Data Scientist. Kevin Purcell, Chief Data Scientist, WildFig Zach Bricker, Data Researcher, WildFig

1 Visualization with R: Extending the Impact of the Data Scientist Kevin Purcell, Chief Data Scientist, WildFig Zach Bricker, Data Researcher, WildFig

2 WildFig Data science consultancy Partnership with HU Internship program with HU

3 Why Visualize?

4 Why Visualize?

5 Why Visualize? Friendly and Meyer Discrete Data Analysis with R

6 Why Visualize?

7 Why Visualize?

8 Why Visualize? R for Data Science (

9 Why programmatic? Reproducibility Provenance Automation Communication xkcd:

10 Why open-source?

11 Why open-source? To break down silos?

12 Why open-source? Or to connect silos?

13 Literate programming approach that integrates code and insights using a universal framework. Why?

14 knitr package
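As a minimal sketch of the literate-programming idea, the snippet below uses knitr::spin() to convert an R script, whose prose lives in #' comments, into R Markdown source. The script contents and the names script and rmd are illustrative assumptions, not material from the talk.

```r
library(knitr)

# A tiny literate script: lines starting with #' are prose, the rest is R code.
# The contents are made up for illustration.
script <- c(
  "#' # Monthly summary",
  "#' The chunk below computes the statistic discussed in the text.",
  "x <- c(2, 4, 6)",
  "mean(x)"
)

# knit = FALSE converts the script to R Markdown source without evaluating it.
rmd <- spin(text = script, knit = FALSE)
cat(rmd)
```

Calling spin() with knit = TRUE instead would also run the code and weave its results into the output, which is the step that ties the narrative to the analysis.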

15 Delivering Real-Time Interactive Graphics to Businesses

16 The Objectives Evaluating marketing spend efficacy across multiple social media channels. How to connect marketing campaign calendars with social media outcomes? Delivering an interactive tool that allows CMOs and marketing directors to appreciate analytical insights.

17 The Methods We built an architecture to connect internal and external data resources in order to gain insight into social media campaign efficacy. We leveraged the community-developed, publicly available R packages flexdashboard and shiny to create an effective dashboarding tool.

18 The Demo

19 The Benefits This technology provides linked brushing capability with a low barrier of expertise. This type of approach allows a data scientist or analyst to work independently from data collection through product delivery. Finally, the tool allows non-analysts to evaluate campaign results and adaptively alter marketing strategy with limited need for on-demand analysis.
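Linked brushing can be sketched in a few lines of shiny. The example below is an assumption-laden illustration, not the dashboard from the demo: it uses the built-in mtcars data as a stand-in for campaign data, and the output and input ids (spend, linked, spend_brush) are hypothetical.

```r
library(shiny)

# Two side-by-side plots; the first one accepts a rectangular brush selection.
ui <- fluidPage(
  fluidRow(
    column(6, plotOutput("spend", brush = brushOpts(id = "spend_brush"))),
    column(6, plotOutput("linked"))
  )
)

server <- function(input, output, session) {
  output$spend <- renderPlot({
    plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")
  })

  output$linked <- renderPlot({
    # Rows selected in the first plot; zero rows when nothing is brushed.
    sel <- brushedPoints(mtcars, input$spend_brush, xvar = "wt", yvar = "mpg")
    plot(mtcars$hp, mtcars$qsec, col = "grey70", xlab = "hp", ylab = "qsec")
    # Highlight the same observations in the second plot.
    points(sel$hp, sel$qsec, col = "red", pch = 19)
  })
}

app <- shinyApp(ui, server)
# In an interactive session, runApp(app) would launch the dashboard.
```

Dragging a rectangle on the left plot highlights the same rows on the right, which is the behavior the slide refers to as linked brushing.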

20 Delivering multivariate forecasting for planning and logistics

21 The Objectives Develop a replacement for an established judgmental forecasting methodology, with the flexibility to handle a portfolio of more than 30 products and ~85 geographies. Deliver an interactive interface to the forecasts that is accessible and informative to both marketing and operations. Forecast model results had to integrate into existing workflows and communicate with proprietary supply chain management systems.

22 The Method We employed first- and third-party data from the organization and data brokers to develop model variants. We used R to create a Shiny dashboard that delivers an attractive, interactive prediction model that can be updated easily as additional data becomes available.
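The slides do not say which model family was used. As one hedged illustration of combining a demand series with an external driver, the sketch below fits a regression with AR(1) errors using base R's arima(). All data are simulated, and the names promo, demand, and future_promo are hypothetical.

```r
set.seed(42)

# Synthetic monthly series for one hypothetical product/geography pair.
n <- 48
promo <- rbinom(n, 1, 0.3)  # hypothetical external driver, e.g. a promotion flag
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 5))
demand <- 100 + 15 * promo + noise
y <- ts(demand, frequency = 12)

# AR(1) model with the driver supplied as an external regressor.
fit <- arima(y, order = c(1, 0, 0), xreg = cbind(promo = promo))

# Forecast six months ahead under an assumed future promotion plan.
future_promo <- cbind(promo = c(0, 1, 0, 0, 1, 0))
fc <- predict(fit, n.ahead = 6, newxreg = future_promo)

# Point forecasts with approximate 95% intervals, ready for a dashboard table.
forecast_table <- data.frame(
  step  = 1:6,
  mean  = as.numeric(fc$pred),
  lower = as.numeric(fc$pred - 1.96 * fc$se),
  upper = as.numeric(fc$pred + 1.96 * fc$se)
)
print(forecast_table)
```

Refitting when new data arrives only requires rerunning the script, which is what makes this kind of model easy to keep current behind a Shiny front end.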

23 The Demo

24 The Benefits This type of approach allows a data science group to build a predictive model from stem to stern and deliver it all from one code base, with no proprietary software to hide the mechanics. Deploying, tuning, and updating are made easy by involving your own team or a group of consultants. The open-source shiny dashboard package is easily extensible and has an attractive out-of-the-box UI, so significantly less time is spent on design.

25 Visualizing Text for studying linguistic similarity

26 The Objectives How to quantify semantic similarity in linguistic corpora? How to visualize association between corpora to appreciate associations both between and among clusters? What visual idiom would be most effective at presenting such association data?

27 The Method We used cosine similarity, an established and computationally fast approach, to populate a similarity matrix. Using a heat map, we visualized similarity among corpora but could not appreciate hierarchical structures. We then used a network-based visualization approach to better demonstrate the similarity and layered structure of the corpora.
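The similarity step can be sketched in base R. The term-frequency matrix below is a toy example with invented documents and terms, and the 0.5 edge threshold and the use of hclust() to expose grouping are illustrative assumptions rather than the authors' choices.

```r
# Toy term-frequency matrix: rows are documents, columns are terms.
tf <- matrix(
  c(3, 2, 0, 0,
    2, 3, 1, 0,
    0, 0, 4, 3,
    0, 1, 3, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("doc_a", "doc_b", "doc_c", "doc_d"),
                  c("data", "model", "brand", "sales"))
)

# Cosine similarity between every pair of rows.
cosine_similarity <- function(m) {
  norms <- sqrt(rowSums(m^2))
  (m %*% t(m)) / (norms %o% norms)
}

sim <- cosine_similarity(tf)
print(round(sim, 3))
# heatmap(sim, symm = TRUE) would draw the heat map view of this matrix.

# Hierarchical clustering on cosine distance, one way to expose layered structure.
hc <- hclust(as.dist(1 - sim), method = "average")
groups <- cutree(hc, k = 2)

# Edge list for a network view: keep pairs at or above an assumed threshold.
idx <- which(sim >= 0.5 & upper.tri(sim), arr.ind = TRUE)
edge_list <- data.frame(
  from = rownames(sim)[idx[, "row"]],
  to = colnames(sim)[idx[, "col"]],
  similarity = sim[idx]
)
print(edge_list)
```

The edge list can be handed to any network plotting package; thresholding is the simplest way to turn a dense similarity matrix into a readable graph.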

28 The Demo

29 The Benefits Visualizing the hierarchical structure of documents based on semantic similarity allows for a deeper understanding of messaging between entities. This also lays the groundwork for finding similarities in widely available documents and corporate messaging. Using R's open-source network tools and data-gathering capabilities, one data scientist can take an idea to production without having to look outside for additional tools.

30 Network Analysis: Demonstrating the Power of Progressive Analytics

31 The Objective The methods most often used to visualize networks rely largely on summary values for the overall weights of the nodes. This leaves a network that is somewhat informative but lacks the depth needed to identify key actors who may otherwise go unnoticed.

32 The Method The primary challenge in gathering data was working with Twitter. To do this, we created an architecture that constantly polls for users linked to a network while remaining within Twitter rate limits. After gathering the data, we created a set of statistical observations and weights that take into account additional node-level statistics and relevance to the center. We then laid out all of the data and completed a graph network that can be parsed for information and that highlights key nodes and edges.
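The slides do not give the weighting formula. The base R sketch below shows one possible way to combine a node-level statistic (degree) with relevance to the center (hop distance from a central account). The edge data, the node name hub, and the formula degree / (1 + hops) are all hypothetical.

```r
# Hypothetical ties around a central account called "hub".
edges <- data.frame(
  from = c("hub", "hub", "hub", "a", "a", "b", "d"),
  to   = c("a",   "b",   "c",   "d", "e", "e", "f"),
  stringsAsFactors = FALSE
)

# Build an undirected adjacency matrix from the edge list.
nodes <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
adj[cbind(edges$from, edges$to)] <- 1
adj <- ((adj + t(adj)) > 0) * 1

# Node-level statistic: number of direct ties.
degree <- rowSums(adj)

# Breadth-first search giving each node's hop distance from the center.
hops_from <- function(adj, center) {
  hops <- setNames(rep(Inf, nrow(adj)), rownames(adj))
  hops[center] <- 0
  frontier <- center
  while (length(frontier) > 0) {
    nxt <- character(0)
    for (v in frontier) {
      nb <- colnames(adj)[adj[v, ] > 0]
      new <- nb[is.infinite(hops[nb])]
      hops[new] <- hops[[v]] + 1
      nxt <- c(nxt, new)
    }
    frontier <- unique(nxt)
  }
  hops
}

hops <- hops_from(adj, "hub")

# Illustrative composite weight: well-connected nodes near the center score high.
weight <- degree / (1 + hops)
print(sort(round(weight, 2), decreasing = TRUE))
```

Sorting by a composite weight like this surfaces nodes such as a, which is not the center but is both close to it and well connected.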

33 The Demo

34 The Benefits Taking a new approach to network analysis allows us to incorporate factors that were not used before the advent of social media. Visualizing a network that has been properly weighted gives great insight into who the key figures are in your social network, organization, or committee.

35 In Conclusion The true power of R and many open-source toolsets is to connect and integrate, not to fragment, the data science world. These tools provide a superior level of visualization with direct integration into the analytical environment. Employing tools like R and R Markdown allows analysts and data scientists to have farther-reaching impact and higher productivity within a leaner, more efficient operational team.

36 @zach_wildfig