Systematic Actionable Mining of Software Repositories (Lectures 3 & 4)

Similar documents
Of Changes and their History: Ideas for New IDEs

Application Lifecycle Management for Subversion

PLM APPLICATION TESTING

System integration and software process

Mining Software Repositories

Tools and technology usage in PFMS application lifecycle management process

Test Workflow. Michael Fourman Cs2 Software Engineering

Lecture 22 System Development

Demand & Requirements Management Software Development QA & Test Management IT Operations & DevOps Change Management Agile, SAFe, Waterfall Support

Mango Solution Easy Affordable Open Source. Modern Building Automation Data Acquisition SCADA System IIoT

BIG DATA AND HADOOP DEVELOPER

MINING SOFTWARE REPOSITORY FOR IMPROVEMENT OF IT PROJECT MANAGEMENT PROCESS

Agile Engineering. for Managers. Introducing agile engineering principles for non-coders

Analyzing the Evolution of Software by Change Analysis

Configuration Management Report MOMO SOFTWARE

Performance-Oriented Software Architecture Engineering: an Experience Report

Risk Reporter users are allocated to groups, and members of different groups have access to different sets of reports and model runs.

POINTS OF DEFECT CREATION

Automated Testing with CA Plex, CA 2E and Worksoft Certify DevOps for CA Plex

DevOps E m p o w e r Q u a l i t y A s s u r a n c e b e n e f i t s f o r y o u r p r o j e c t s

DevOps E m p o w e r Q u a l i t y A s s u r a n c e b e n e f i t s f o r y o u r p r o j e c t s

<Insert Picture Here> Oracle Business Process Analysis Suite: Overview & Product Strategy

This document describes the overall software development process of microcontroller software during all phases of the Company Name product life cycle.

A TEAM-BASED PROJECT QUALITY MANAGEMENT SYSTEM

1 P a g e P r i s m e t r i c T e c h n o l o g i e s

Sharing and Deploying MATLAB Programs

The Product and the Process The Product The Evolving Role of Software Software Software: A Crisis on the Horizon Software Myths Summary References

DATA ACQUISITION PROCESSING AND VISUALIZATION ALL-IN-ONE END-TO-END SOLUTION EASY AFFORDABLE OPEN SOURCE

What's New With Rational Team Concert (TM)

CORE APPLICATIONS ANALYSIS OF BUSINESS-CRITICAL ADABAS & NATURAL

Replication of Defect Prediction Studies

Analyze, Design, and Develop Applications

Life Cycle Plan (LCP)

The Development Productivity Platform

PV213 Enterprise Information Systems in Practice 09 Security, Configuration management

Collaborative ALM Interoperability

Visit California Digital Solutions, Drupal Development Website Experience. ITRS Case Study.

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Realize Positive ROI on Your SOA Investments with Vitria M 3. O Suite

FREQUENTLY ASKED QUESTIONS

COPYRIGHTED MATERIAL WHAT S IN THIS CHAPTER?

Announcements. Lecture 22 System Development. Announcements. Announcements. CSE 331 Software Design and Implementation. Leah Perlmutter / Summer 2018

Agenda Discover the benefits of an integrated financial planning

Software Life Cycles and Configuration Management

Title page - CLEO Baltimore, May 9, 2007 Software Engineering Processes Used to Develop the NIF Integrated Computer Control System* A.

Evaluating Software Development Environments

Agenda. ClearQuest 8.0 What s New. Positioning Integrations Collaboration Administration New Features Deprecations Q&A

CSE 331 Software Design & Implementation. Hal Perkins Winter 2018 System Integration and Software Process

Web 2.0 / UI Engineer and Consultant

SAMPLE REQUEST FOR PROPOSAL

Software AG Product Roadmap & Vision

The Information Integration Platform

Master Data Governance & SAP Information Steward Integration. Jens Sauer, SAP Switzerland September 25 th, 2013

Aspects of Software Quality Assurance in Open Source Software Projects: Two Case Studies from Apache Project

How to Successfully Collect, Analyze and Implement User Requirements Gerry Clancy Glenn Berger

Test Management Test Planning - Test Plan is a document that is the point of reference based on which testing is carried out within the QA team.

Living Architectures - from eclipse to jazz

Windchill ProjectLink Curriculum Guide

Integrating MATLAB Analytics into Enterprise Applications The MathWorks, Inc. 1

Course Overview. SAP AG 2006, / Ubicomp Heuser, Nochta / 11. CEC Darmstadt. SAP Research. Outline SAP

Getting ready for ALM Octane

A 7-STEP FRAMEWORK TO IMPLEMENT CICD IN ETL TESTING

How to Design a Successful Data Lake

MATLAB for Data Analytics The MathWorks, Inc. 1

Data Warehousing provides easy access

What s Hot with Web Services?

Software Engineering & Architecture

Social Analytics. More than Listening Social Media Strategy. Creating relationship. Build advocacy. Improve loyalty

BCS THE CHARTERED INSTITUTE FOR IT. BCS HIGHER EDUCATION QUALIFICATIONS BCS Level 6 Professional Graduate Diploma in IT SOFTWARE ENGINEERING 2

SOCCI - Towards a Common Software Engineering Environment for Science Operations

Testing Masters Technologies

A Project Monitoring Cockpit Based On Integrating Data Sources in Open Source Software Development

Micro process analysis of maintenance effort: an open source software case study using metrics based on program slicing

"Web Age Speaks!" Webinar Series. Introduction to DevOps

DATASHEET COLLABNET TEAMFORGE

PeopleSoft Test Framework

The Benefits of Running JD Edwards EnterpriseOne on the Oracle Technology Stack. A.J. Schifano Principal Product Manager Oracle

WORKSOFT AUTOMATED BUSINESS PROCESS DISCOVERY & VALIDATION

The Business Process Environment

Brochure. Application Lifecycle Management. Accelerate Your Business. Micro Focus Application Lifecycle Management Software

SE420 Software Quality Assurance

Software Quality Assurance Using Reusable Components

Enterprise Analytics Accelerating Your Path to Value with an Open Analytics Platform

Services Description. Transformation and Plan Services. Business Transformation and Plan Services

SIMPLIFYING BUSINESS ANALYTICS FOR COMPLEX DATA. Davidi Boyarski, Channel Manager

Chapter 6: Software Evolution and Reengineering

Optimizing Service Assurance with Vitria Operational Intelligence

Petri Juhani Lehtonen ( ) EU-Citizen, Finland

We see most production services running either in the conventional way or in between the conventional and the new way.

Architecting JIRA for the Enterprise. JIRA is a powerful product, both flexible and highly configurable. Miles Faulkner

Course Introduction and Overview

esign Build and Innovate your business with

What s New. Bernd Wiswedel KNIME KNIME AG. All Rights Reserved.

A Business Plans Training Tool based on the Semantic Web Principles

Interface Management in a Large Enterprise

DevOps KANOKWATT SHIANGJEN COMPUTER SCIENCE SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY UNIVERSITY OF PHAYAO

Stat Production Services for PeopleSoft (Onsite and Remote)

IBM Rational Software

Transcription:

software evolution & architecture lab Systematic Actionable Mining of Software Repositories (Lectures 3 & 4) Harald Gall University of Zurich, Switzerland http://seal.ifi.uzh.ch @ LASER summer school 2014

Roadmap for the Seminar Teaser I. Mining Studies III. Mining and Quality Analysis III. Replication and Benchmarking IV. Tooling Type to enter text

Software Mining Studies: Where are we now? software evolution & architecture lab

Nature of Studies Replication of studies Less than 20% can be replicated, from all the empirical studies published in MSRconf 2004-2009 [G. Robles: Replicating MSR: A study of the potential replicability of papers published in the Mining Software Repositories proceedings. MSR 2010] wrt public availability of (1) the data used as case study, ii) the processed dataset used by researchers and iii) the tools and scripts. Data availability Raw data for OSS is easily available straight from publicly available sources, e.g. PROMISE Complete (pre/post) data is not yet available Data processing and tools Type to enter text

Top 20 projects analyzed [G. Robles: Replicating MSR: A study of the potential replicability of papers published in the Mining Software Repositories proceedings. MSR 2010]

Performance & time variance J. Ekanayake, J. Tappolet, H. Gall, A. Bernstein, Time variance and defect prediction in software projects, Empirical Software Engineering, Vol. 17 (4-5), 2012

Prediction Quality vs. Time Eclipse heat-map Prediction quality on same target using different training periods with the point of highest AUC highlighted Type to enter text

Software Mining: Where to go from here? software evolution & architecture lab

The Screening Plant of a SW Miner 9

What is missing? Replication Large-scale comparative studies Preprocessing and Learners Calibration Benchmarking Lining up of essential questions Adopting technologies from other fields Type to enter text

Replicability Evaluation Mining Studies of MSR 2004-2011 84 (49%) experimental/empirical studies 88 (51%) non-experimental studies (new methods, tools, case studies, visualizations, etc.) Studies classified into 6 categories and manually checked if they can be replicated with SOFAS: Version History Mining, History Mining, Change Analysis, Social Networks and People, Defect Analysis, Bug Prediction Type to enter text

MSR Replication with SOFAS Research category Number of Fully replicable Partially replicable Non replicable papers papers papers papers Version History 8 (9%) 4 0 4 Mining History Mining 17 (20%) 0 8 9 Change Analysis 13 (15%) 5 6 2 Social Networks 19 (22%) 6 5 8 and People Defect Analysis 19 (22%) 8 6 5 Bug Prediction 8 (9%) 2 2 4 88 (100%) 25 (30%) 27 (32%) 32 (38%) 84 Table 1 The results of the replicability evaluation.

Replicability Full replication: 30% of the studies can be fully replicated out of the box Partial replication: 32% of the studies can be partially replicated As evaluation, we fully replicated Do time of day and developer experience affect commit bugginess? by J. Eyolfson, L. Tan and P. Lam, MSR 2011 Type to enter text

Replication of a study We replicate the study to verify the 4 main findings We extend the study by testing the findings for additional OSS projects: Apache HTTP, Subversion, and VLC We analyze the results Do we achieve the same results? Can the original conclusions also be drawn for the additionally investigated projects? Type to enter text

Analysis Workflow

Replication results /1 Percentage of buggy commits We confirmed the results of the original study with slight differences (different heuristic and date of analysis) The additional projects exhibit similar values (22-28%) Original Study Extended Study # commits # bug-introducing commits # bug-fixing commits Linux 268820 68010 (25%) 68450 PostgreSQL 38978 9354 (24%) 8410 Apache Http Server 30701 8596 (28%) 7802 Subversion 47724 12408 (26%) 10605 VLC 47355 10418 (22%) 10608 TABLE II Type to enter text

Replication results /2 Influence of time of the day We confirmed the results of the original study The amount of buggy commits are particularly high between midnight and 4 AM and tends to then drop below average (morning and/or early afternoon)! Windows of low bugginess greatly vary between projects Commit bugginess follows very different patterns Type to enter text

Replication results /3 Influence of developer We confirmed the results of the original study A drop in commit bugginess is evident with the increasing amount of time a developer has spent on a project Influence of day of the week We confirmed the results of the original study Different weekly patterns in the additional projects Commits on different weekdays do not have the same level of bugginess Type to enter text

Interpretation of results Feasibility We can replicate 30% of the analyzed studies and compute the ground data needed for another 32% The studies we can replicate all use historical data extracted from different repositories Scalability The approach can scale up to very many projects Once the analysis workflow is defined, it can be automatically run with different project repositories Still, limitation is total execution time (Apache HTTP ~8 hrs) Type to enter text

Interpretation of results Extensibility We only focused on the replication of existing studies The results and ground data produced by SOFAS analyses can be fed to other services, used by thirdparty analyses and tools or combined with data from other sources. Do time of day, developer experience and file ownership affect commit bugginess? e.g. taking into account code ownership measured using the Gini coefficient [Giger, 2011] Type to enter text

To get to the next level... Support for replicability & systematic analysis workflows Calibration of data preprocessing Performance measures & performance criteria for studies Conclusion stability of studies Type to enter text

Software Mining Studies with SOFAS software evolution & architecture lab

SOFtware Analysis Services SOFAS = RESTful service platform using software evolution ontologies enabling the composition of analysis workflows tiny.cc/miningstudies Type to enter text

Approach General Concepts Domain Specific Concepts Bugs Code History System Specific Concepts Bugzilla Trac Java C C++ CVS SVN GIT Type to enter text

SOFAS scenario SVN History Service Famic Model Service OO Metrics Service Ok, so I would need to: 1. extract the project history 2. reconstruct How did a detailed OO source code metrics model evolved for Ok, each now in release let s query this this project 3. extract, for each release, data! history? the metrics I want

Semantic links Bug History extractor http://myproject.org/bugs/nr124" Bug-Revision linker Version Control history extractor http://ifi.uzh.ch/ svnimporter myproject/foo.java23 http://myproject.org/bugs/nr124" http://sofas.org/bugontology/affects" http://ifi.uzh.ch/svnimporter/" myproject/foo.java23 software analysis service

SEON Pyramid(s) www.se-on.org General Concepts Domain Specific Concepts Bugs Code History System Specific Concepts Issue Tracking Version Control Source Code Bugzilla Trac Bugzilla Change Coupling Trac CVS SVN GIT Java C# Change Types CVS SVN GIT Java C# Software Design Metrics

! General Concepts Domain Specific Concepts Properties: Properties: hasmethod, hasname, hasmodreport, hasdescription, Classes: implementsinterface,... belongstoclass,... Issue, Priority, Severity,... Properties:! Classes: Version, ModificationReport, Class, Method, Release,... Interface,... haspriority, hasdescription, hasseverity,...! Classes: Bugs Code History System Specific Concepts Bugzilla Trac Java C# C++ CVS SVN GIT

Current SOFAS services Data Gatherers Version history extractor for CVS, SVN, GIT, and Mercurial Issue tracking history for Bugzilla, Trac, SourceForge, Jira Basic Services Meta-model extractors for Java and C# (FAMIX) Change coupling, change type analyses Issue-revision linker Metrics service Composite services Evolutionary hot-spots Highly changing Code Clones and many more... Type to enter text

Facets of Software Evolution Alexandru Carol, Giacomo Ghezzi, Michael Würsch, and Harald Gall software evolution & architecture lab

software evolution & architecture lab

Metrics pyramid

Demo! software evolution & architecture lab

Mashing Up Software Analytics Data for Stakeholders Martin Brandtner and Harald Gall software evolution & architecture lab

Multiple Stakeholder Mining We need to tailor information to the information needs of stakeholders, such as developers, testers, project managers, or quality analysts study their needs beyond typical developer needs questions developers ask by Sillito et al., Kevic et al. devise prototypes to elicit that information needs, for example, SQA-Mashup for Integrating Quality Data Type to enter text

SQA-Mashup A Mashup of Software Project data commit & issue & build & test data all in mashups, integrated, easy to access however, filtered to the information needs of stakeholders Most recent paper Martin Brandtner, Emanuel Giger, Harald Gall, Supporting Continuous Integration by Mashing-Up Software Quality Information, CSMR-WCRE 2014, Antwerp, Belgium Available in Win 8 App Store http://goo.gl/zuwrvm Type to enter text

A Developer s view

A Tester s view

A project timeline

SQA Mash-up Type to enter text

Mashup pipe configuration

Conclusions software evolution & architecture lab

What s the future of SQA? Environments at your fingertips embedded in the IDE supported by a DWH backbone Software Analytics exploit evolution data reason about the data make data actionable Process support model iterative workflows define custom-made analysis portfolio Enriched interfaces Type to enter text

Workflows & Mashups