Estimating the Effort of Independent Verification and Validation in the Context of Mission-Critical Software Systems A Case Study

Similar documents
1 Basic concepts for quantitative policy analysis

Experiments with Protocols for Service Negotiation

Evaluating the statistical power of goodness-of-fit tests for health and medicine survey data

Product Innovation Risk Management based on Bayesian Decision Theory

A SIMULATION STUDY OF QUALITY INDEX IN MACHINE-COMPONF~T GROUPING

A Two-Echelon Inventory Model for Single-Vender and Multi-Buyer System Through Common Replenishment Epochs

emissions in the Indonesian manufacturing sector Rislima F. Sitompul and Anthony D. Owen

Consumption capability analysis for Micro-blog users based on data mining

Logistics Management. Where We Are Now CHAPTER ELEVEN. Measurement. Organizational. Sustainability. Management. Globalization. Culture/Ethics Change

RELATIONSHIP BETWEEN BUSINESS STRATEGIES FOLLOWED BY SERVICE ORGANIZATIONS AND THEIR PERFORMANCE MEASUREMENT APPROACH

International Trade and California Employment: Some Statistical Tests

Prediction algorithm for users Retweet Times

A NOVEL PARTICLE SWARM OPTIMIZATION APPROACH FOR SOFTWARE EFFORT ESTIMATION

A Group Decision Making Method for Determining the Importance of Customer Needs Based on Customer- Oriented Approach

Analyses Based on Combining Similar Information from Multiple Surveys

VOL. 2, NO. 7, August 2012 ISSN ARPN Journal of Science and Technology All rights reserved.

Honorable Kim Dunning Presiding Judge of the Superior Court 700 Civic Center Drive West Santa Ana, CA 92701

Supplier selection and evaluation using multicriteria decision analysis

EVALUATING THE PERFORMANCE OF SUPPLY CHAIN SIMULATIONS WITH TRADEOFFS BETWEEN MULITPLE OBJECTIVES. Pattita Suwanruji S. T. Enns

Finite Element Analysis and Optimization for the Multi- Stage Deep Drawing of Molybdenum Sheet

Extended Abstract for WISE 2005: Workshop on Information Systems and Economics

Volume 30, Issue 4. Who likes circus animals?

LIFE CYCLE ENVIRONMENTAL IMPACTS ASSESSMENT FOR RESIDENTIAL BUILDINGS IN CHINA

A Multi-Product Reverse Logistics Model for Third Party Logistics

Guidelines on Disclosure of CO 2 Emissions from Transportation & Distribution

MEASURING USER S PERCEPTION AND OPINION OF SOFTWARE QUALITY

COMMUNICATING PERFORMANCE ASSESSMENTS OF INTERMEDIATE BUILDING DESIGN STATES. Godfried Augenbroe* and Sten de Wit**

Study on Productive Process Model Basic Oxygen Furnace Steelmaking Based on RBF Neural Network

Innovation in Portugal:

Customer segmentation, return and risk management: An emprical analysis based on BP neural network

Modified-LOPA; a Pre-Processing Approach for Nuclear Power Plants Safety Assessment

MULTIPLE FACILITY LOCATION ANALYSIS PROBLEM WITH WEIGHTED EUCLIDEAN DISTANCE. Dileep R. Sule and Anuj A. Davalbhakta Louisiana Tech University

Evaluation Method for Enterprises EPR Project Risks

Experimental Validation of a Suspension Rig for Analyzing Road-induced Noise

Modeling and Simulation for a Fossil Power Plant

Field Burning of Crop Residues

Calculation and Prediction of Energy Consumption for Highway Transportation

Planning of work schedules for toll booth collectors

Sources of information

K vary over their feasible values. This allows

Maximizing the Validity of a Test as a Function of Subtest Lengths for a Fixed Total Testing Time: A Comparison Between Two Methods

Econometric Methods for Estimating ENERGY STAR Impacts in the Commercial Building Sector

Foundation design reliability issues

Management of innovation processes at the enterprises of the construction materials industry

Evaluating The Performance Of Refrigerant Flow Distributors

EVALUATION METHODOLOGY OF BUS RAPID TRANSIT (BRT) OPERATION

Development and production of an Aggregated SPPI. Final Technical Implementation Report

DEVELOPMENT OF A MODEL FOR EVALUATING THE EFFECTIVENESS OF ACCOUNTING INFORMATION SYSTEMS

Identifying Factors that Affect the Downtime of a Production Process

ON LINKAGE-BASED CLUSTERING APPROACH AND AIR TRAFFIC PATTERN RECOGNITION

A Similarity-Based Approach for the All-Time Demand Prediction of New Automotive Spare Parts

To manage leave, meeting institutional requirements and treating individual staff members fairly and consistently.

FIN DESIGN FOR FIN-AND-TUBE HEAT EXCHANGER WITH MICROGROOVE SMALL DIAMETER TUBES FOR AIR CONDITIONER

Application of Ant colony Algorithm in Cloud Resource Scheduling Based on Three Constraint Conditions

Implementing Activity-Based Modeling Approach for Analyzing Rail Passengers Travel Behavior

FPGA Device and Architecture Evaluation Considering Process Variations

SIMULATION RESULTS ON BUFFER ALLOCATION IN A CONTINUOUS FLOW TRANSFER LINE WITH THREE UNRELIABLE MACHINES

MONITORING THE EFFICIENCY OF USE OF OPERATING ROOM TIME WITH CUSUM CHARTS

Construction of Control Chart Based on Six Sigma Initiatives for Regression

Evaluation of Quality Management Performance in Office of President using Modified Public Sector Management Quality Award (PMQA) Model

Key Words: dairy; profitability; rbst; recombinant bovine Somatotropin.

The Application of Uninorms in Importance-Performance Analysis

THE STUDY OF GLOBAL LAND SUITABILITY EVALUATION: A CASE OF POTENTIAL PRODUCTIVITY ESTIMATION FOR WHEAT

The link between immigration and trade in Spain

ASSESSMENT OF THE IMPACT OF DECAY CORRECTION IN THE DOSE-TO- CURIE METHOD FOR LONG-TERM STORED RADIOACTIVE WASTE DRUMS

Determination of the Relationship between Biodiversity and the Trophic State of Wahiawa Reservoir, Phase II

Volume 29, Issue 2. How do firms interpret a job loss? Evidence from the National Longitudinal Survey of Youth

Using Economic Considerations to Choose Among Architecture Design Alternatives

Study on trade-off of time-cost-quality in construction project based on BIM XU Yongge 1, a, Wei Ya 1, b

Process Approach and Modelling in Organisation Competitiveness Management System

An Empirical Study about the Marketization Degree of Labor Market from the Perspective of Wage Determination Mechanism

Using Semi-Empirical Models to Design Timber Platform Frame for Stability

Decision-centric Active Learning of Binary-Outcome Models

Decision-making Model to Generate Novel Emergency Response Plans. for Improving Coordination during Large-scale Emergencies

Simulation of Steady-State and Dynamic Behaviour of a Plate Heat Exchanger

Model Development of a Membrane Gas Permeation Unit for the Separation of Hydrogen and Carbon Dioxide

Evaluating Clustering Methods for Multi-Echelon (r,q) Policy Setting

Program Phase and Runtime Distribution-Aware Online DVFS for Combined Vdd/Vbb Scaling

TRAFFIC SIGNAL CONTROL FOR REDUCING VEHICLE CARBON DIOXIDE EMISSIONS ON AN URBAN ROAD NETWORK

Trust-Based Working Time and Organizational Performance: Evidence from German Establishment-Level Panel Data. December 2011

A MODEL TO MEASURE THE PERFORMANCE OF HUMAN RESOURCES IN ORGANISATIONS

Self Selection and Information Role of Online Product Reviews

WISE 2004 Extended Abstract

Driving Factors of SO 2 Emissions in 13 Cities, Jiangsu, China

Active Learning for Decision-Making

A Hybrid Intelligent Learning Algorithm in MAS

GETTING STARTED CASH & EXPENSE PLANNING

ISSN An Optimization Model of the Business Applications Selection Process. Cahier du GReSI no Décembre 2005

Development of a Tool Management System for Energy Sector Company

A Scenario-Based Objective Function for an M/M/K Queuing Model with Priority (A Case Study in the Gear Box Production Factory)

High impact force attenuation of reinforced concrete systems

The Impact of Information Quality of Job Descriptions on an Applicant s Decision to Pursue a Job

Significant Requirements Engineering Practices for Software Development Outsourcing

The Study on Evaluation Module Architecture of ERP for Chemical Enterprises Yongbin Qin 1, 2, a, Jiayin Wei 1, b

THE DRONE SQUAD 10 SMARTER OPERATIONS REMOTE INSPECTIONS

Bayesian-LOPA Methodology Development for LNG Industry

Estimating Peak Load of Distribution Transformers

Research on Interactive Design Based on Artificial Intelligence

Robustness theoretical framework

Risk Assessment Using AHP in South Indian Construction Companies: A Case Study

Transcription:

Estmatng the Effort of Independent Verfcaton and Valdaton n the Context of Msson-Crtcal Software Systems A Case Study Haruka Nakao a, Adam Trendowcz b, Jürgen Münch b a Japan Manned Space Systems Corporaton, Ibarak, Japan b Fraunhofer Insttute for Expermental Software Engneerng, Kaserslautern, Germany haruka@jamss.co.jp, {Adam.Trendowcz, Juergen.Muench}@ese.fraunhofer.de Abstract The ablty to generate suffcently accurate effort estmates can be seen as a key success factor for multorganzatonal projects focusng on the development of large and crtcal software systems. Ths s caused, for nstance, by the need for synchronzng multple development and verfcaton and valdaton processes. Paradoxcally, effort predctons n the crtcal software systems doman are stll relyng on human judgment. Ths requres much overhead and ts relablty depends largely on the expertse and ndvdual preferences of the nvolved experts. In partcular, verfcaton and valdaton by ndependent entty (IV&V) needs an estmaton method that supports negotatng and managng IV&V costs n the context of sparse measurement data and low avalablty of doman experts. In order to address these problems, n ths paper we propose applyng a hybrd effort estmaton method called Co- BRA for estmatng effort for IV&V of msson-crtcal software systems. When appled n an ndustral context, CoBRA mproved estmaton accuracy and precson by about 40%, on average, compared to experts estmates and OLS regresson. 1. Introducton The average company spends about 4 to 5 percent of ts revenue on nformaton technology, wth those that are hghly IT-dependent - such as fnancal and telecommuncatons companes - spendng more than 10 percent on t [5]. Now, a great part of those nvestments s wasted because software organzatons are stll proposng unrealstc software costs, work wthn tght schedules, and, n consequence, fnsh ther projects behnd schedule and budget (about 50% of projects), or do not complete them at all (more than 25% of projects). Moreover, even though projects are completed wthn a target plan, the functonalty and qualty of products delvered are usually cut to ft ths plan [11]. Ths ndcates that software project plannng s a crtcal success factor of a software project. Project plannng n the safety-crtcal doman s partcularly mportant and dffcult at the same tme. Large functonal constrant, hgh qualty requrements, and nvolvement of several ndependent partes make t much more challengng to plan msson-crtcal projects than to plan ordnary, non-crtcal software development projects. In that context, effectve synchronzng actvtes of all nvolved partes are a key factor for project success. One example actvty requrng such synchronzaton s verfcaton and valdaton done by an ndependent organzaton, or ndependent verfcaton and valdaton (IV&V). A msmatch between the IV&V plan and the overall project plan may lead to sgnfcant delays or (n extreme cases) to the skppng of certan IV&V actvtes. In consequence, the whole msson mght be exposed to the hgh rsk of a sgnfcant loss of money, and, n the worst case, njury or death of people. Yet, comprehensve support for plannng and managng IV&V s mssng. Effort estmaton approaches proposed by the research communty have tradtonally focused on plannng and trackng classcal, n-house software development. Effort estmaton methods that grew upon those objectves focus on provdng exact estmates. They do not, however, support an easly understandable, systematc and relable analyss of the most relevant causal effort dependences. Even though an accurate predcton s provded, software practtoners have hardly any support to prevent potental project overruns. In the short-term perspectve, ths would mean a lack of a sold bass for effectvely mtgatng project rsks, and n the long-term perspectve, a lmted ablty to dentfy process mprovement areas and to learn. Moreover, estmaton methods promoted by the research communty requre large data sets, whereas methods commonly employed by ndustry extensvely nvolve doman experts. All those aspects sgnfcantly reduce the applcablty of exstng estmaton methods n the IV&V context, where relable and comprehensve project management has to be provded despte the mnmal avalablty of quanttatve data and human expertse. In ths paper, we propose applyng the Cost Estmaton, Benchmarkng, and Rsk Analyss method (CoBRA ) [12][17] to estmate the effort of the IV&V of mssoncrtcal software systems. CoBRA s a hybrd method that combnes analytcal and expert-based estmaton. It provdes a systematc way to transform varous sources of organzatonal knowledge (mnmal set of measurement data and expert judgment) nto a transparent and reusable effort model that supports achevement of a varety of project management objectves, such as rsk management or nego-

tatng of project costs. The remander of the paper s organzed as follows: Secton 2 brefly characterzes the IV&V context. Secton 3 gves an overvew of exstng software effort estmaton methods, followed by a more detaled descrpton of the CoBRA method n Secton 4. Secton 5 presents the emprcal results of an ndustral applcaton of CoBRA for plannng the effort of IV&V, followed by lessons learned (Secton 6) n the study. The paper ends up wth a bref summary and conclusons gven n Secton 7. 2. Independent Verfcaton and Valdaton 2.1. Characterstcs of IV&V Independent verfcaton and valdaton (IV&V) can be defned as a process where software work products generated by a development team are verfed and valdated by a completely ndependent organzatonal entty. Independence s consdered here [7] n terms of techncal, manageral, and fnancal ndependence. IV&V s typcally appled n the context of safety- or msson-crtcal software systems, such as space and nuclear plant systems. The typcal constrant of IV&V, as compared to classcal n-house V&V, s lmted nformaton on processed artfacts. On the one hand, there s lmted knowledge about the software development envronment; on the other hand, IV&V has to handle varous types of msson-crtcal systems. Ths varety does not allow for collectng many hstorcal project data. Moreover, nvolvement of three stes (customer-, development-, and IV&V-entty) n the software development process contrbutes to frequent and unpredctable requrements change. In that context, managng an IV&V project s resources s crtcal and dffcult at the same tme. 2.2. Objectves of Effort Estmaton Besdes tradtonal estmaton objectves such as precse plannng and trackng software resources, project needs decson-makng support. In partcular, explct dentfcaton of factors havng the greatest mpact on IV&V cost and ther quanttatve mpact should be supported. Identfcaton of customer-specfc factors (e.g., level of customer support) may be used to justfy and negotate IV&V costs that cannot be nfluenced by the IV&V suppler. On the other hand, dentfcaton of the IV&V suppler s characterstcs (process and human capabltes) that have the greatest mpact on ncreased IV&V costs wll allow targetng mprovement actons to specfc process areas and mprovng the effcency of IV&V. In consequence, project rsks can be mtgated tmely and crtcal organzatonal processes can be mproved. 2.3. Current Effort Estmaton Practces A survey about current estmaton practces at Japan Manned Space Systems Corporaton (JAMSS) revealed that measurement data from around 10 already completed projects have been collected wthn the past 10 years. However, the data suffered from sgnfcant ncompleteness (around 20% of mssng data) and large varablty due to the hgh unqueness of the consdered projects. Snce hardly any data-drven estmaton method that would meet the estmaton objectves (Secton 2.2) can be appled reasonably, estmates are typcally based on the judgment of one or more doman experts. Although sparse project data are avalable, experts based ther estmates solely on personal experences. One of the reasons s that the avalable smple sze measures, such as pages of software requrements document, are beleved not to reflect the amount of IV&V effort relably. Yet, expert-based estmaton dd not provde satsfactory support for project management. Frst, the relablty of the estmates depends largely on ndvdual expertse and preferences of nvolved doman expert. In consequence effort estmates are not accurate and vary wdely across projects (see Table 4 and Table 5 n Secton 5.5). Moreover, estmaton costs much effort each tme t s performed, and snce t does not provde any explct effort model, t hardly supports decson makng n a project. 3. Related Work Numerous types of estmaton methods have been developed over the last decades. In ths secton we provde a bref overvew of exstng estmaton methods from the vewpont of ther applcablty n the context of IV&V. For a comprehensve revew and comparatve evaluaton of exstng methods, please refer to [16]. Exstng effort estmaton methods dffer bascally wth respect to the type of nputs they requre and the form of the estmaton model they do provde. Wth respect to nput data, we dfferentate between three major groups: datantensve, expert-based, and hybrd methods (combnng avalable data and expert knowledge n order to come up wth estmates). Among the data-ntensve methods, some requre past project data for buldng customzed models (defne-your-own-model approaches), others provde an already defned model, where factors and ther relatonshps are fxed based on a set of mult-organzatonal project data (fxed-model approaches). The major advantage of fxedmodel approaches s that they, theoretcally, do not requre any hstorcal data to be appled. Those methods mght be especally attractve n the IV&V context, where very sparse (f any) data are typcally avalable. Yet, n practce, fxed models, such as COCOMO [2][1], are developed for a specfc context (typcally dfferent from IV&V) and are, by defnton, only suted for estmatng the types of projects for whch the fxed model was bult. The applcablty of such models for the IV&V context s, n practce, very lmted. In order to mprove ther performance, a sgnfcant amount of organzaton-specfc project data would be requred for calbratng the generc model. In that case, the

lttle usefulness of the fxed-model approaches for IV&V effort estmaton would not dffer much from the defneyour-own-model approaches, whch requre a sgnfcant amount of context-specfc data to buld customzed effort models [15]. Applcaton of the defne-your-own-model methods n the context of IV&V s further lmted by the addtonal requrements of specfc methods. Parametrc approaches, such as regresson [14], for nstance, make several assumptons about underlyng project data (completeness, normal dstrbuton, etc.) that are rarely met n the software doman. Non-parametrc methods orgnatng from the machne learnng doman, such as artfcal neural networks (ANN) [3] or Decson Trees/rules [15], make practcally no assumptons about the data but are qute senstve to ther parameter confguraton and there s usually lttle unversal gudance regardng how to set those parameters. Thus, fndng approprate parameter values requres some prelmnary expermentaton. In contrast to data-ntensve methods, expert-based estmaton does not requre any project measurement data because estmates are based on the judgment of one or more human experts [2]. Expert estmaton s, n fact, commonly used n the software ndustry (ncludng IV&V). It does, however, have several sgnfcant lmtatons. Frst, much effort s requred each tme estmaton s performed, and the relablty of the outputs t provdes largely depend on the expertse and ndvdual preferences of the human experts nvolved. Moreover, snce the ratonale underlyng fnal estmates s not modeled explctly, there s hardly any support for effectve decson makng n a project (rsk management, process mprovement, project scope negotatons, etc.). Recently, a few hybrd methods have been proposed to cope wth defcts of data-ntensve and expertbased estmaton. They combne a reduced amount of both measurement data and human expertse to provde more relable estmates wth lmted estmaton overhead. Emprcal applcatons [17][10] report on ther hgher estmaton accuracy and stablty when compared to data- or expert-based methods. Moreover, methods that employ explct causal effort modelng (e.g., CoBRA [17]) have proven to greatly contrbute to the achevement of a varety of organzatonal objectves, such as rsk management or process/productvty mprovement. 4. The CoBRA Method CoBRA [12][15] s a hybrd method combnng dataand expert-based effort estmaton approaches. CoBRA the method s based on the dea that project effort conssts of two basc components: nomnal project effort and an effort overhead porton as presented below. Nomnal effort s the effort spent only on developng a software product of a certan sze n the context of a hypothetcal deal project that runs under optmal condtons;.e., all project characterstcs are the best possble ones ( perfect ) at the start of the project. Effort overhead s the addtonal effort spent on overcomng the mperfectons of a real project envronment, such as nsuffcent sklls of the project team. In ths case, a certan effort s requred to compensate for such a stuaton, e.g., team tranng has to be conducted. In CoBRA, effort overhead s modeled by a causal effort model that conssts of factors affectng project effort wthn a certan context. The causal model s obtaned through expert knowledge acquston (e.g., nvolvng experenced project managers). Effort = Nomnal Productvty Sze + Effort Overhead 14444 244444 3 (1) Effort Overhead = + j Nomnal Effort j Multpler (Effort Factor ) Multpler (Effort Factor, Indrect Effort Factor ) An example s presented n Fgure 1. The sold and dashed arrows ndcate drect and ndrect relatonshps, respectvely. For nstance, Requrements volatlty has a drect mpact on development effort. The strength of ths negatve nfluence on effort may, however, be modfed (compensated) by Dscplned requrement management (ndrect nfluence). The effort overhead porton resultng from ndrect nfluences s represented by the second component of the sum shown n (2). Fgure 1: Example of a Causal Effort Model The nfluence on effort and between dfferent factors s quantfed for each factor usng experts evaluaton. The nfluence s measured by means of effort overhead,.e., a relatve percentage ncrease of the effort above the nomnal project. In order to capture the uncertanty of evaluatons, experts are asked to gve three values: the maxmal, mnmal, and most lkely cost overhead for each factor (trangular dstrbuton). The second component of CoBRA, the nomnal project effort, s based on data from past projects that are smlar wth respect to certan characterstcs (e.g., development type, lfe cycle type) that are not part of the causal model. These characterstcs defne the context of the project. Past project data s used to determne the relatonshp between cost overhead and costs (see equaton #1). Snce t s a smple bvarate dependency, t does not requre much measurement data. In prncple, merely project sze and effort are requred, whereby both can be measures usng any vald metrc representng project sze and effort. j (2)

Based on the quantfed causal model, past project data, and current project characterstcs, an effort overhead model s generated usng a smulaton algorthm (e.g., Monte Carlo). The probablty dstrbuton obtaned could be used further to support varous project management actvtes, such as effort estmaton, evaluaton of effortrelated project rsks, or benchmarkng. More detals regardng the CoBRA method can be found n [12][15]. 5. Case Study 5.1. Context of the Study The study was performed n the context of JAMSS, a company that performs IV&V of space software systems (embedded software doman). JAMSS has been, for nstance, supportng IV&V for crtcal space software systems created by the Japan Aerospace Exploraton Agency (JAXA) for more than 10 years. In ths study, we focused on IV&V of the software requrements specfcaton documents usng the document revew technque. The document revew process starts wth a rsk analyss to dentfy a software system s operatonal rsks. Software requrements are then revewed n more detal based on ther operatonal rsks wth respect to one or more revew objectves. In prncple, there were sx revew objectves [8] (Table 2): (O1) rsk analyss, (O2) state transton completeness and consstency, (O3) desgn completeness for exceptonal behavor, (O4) tmng correctness and consstency, (O5) nterface correctness and consstency, and (O6) traceablty. There were three doman experts nvolved n the study (Table 1) who provded ther knowledge to buld the effort overhead model. The man felds of expertse covered by nvolved experts ncluded: software product qualty & safety assurance (SPQSA), software safety revews (SR), and safety assurance n operaton (SAO). Table 1. Involved doman experts Expert Expertse Doman experence [#years] Estmaton experence [#projects] 1 SR 7 8 2 SPQSA 8 9 3 SAO 4 6 As project measurement data, the number of document pages was selected as the sze of a software requrement because even f the complexty of a requrement complexty s related to the effort for revewng a document, the document tself has to be read by the IV&V team n order to fnd out what ths complexty s. IV&V effort data from fve projects were collected for each project. In practce, because some IV&V objectves were not addressed, effort data were not collected for each IV&V objectve except for one project. Therefore, weekly workng statuses of IV&V were used to abstract the effort for each IV&V objectve. Measurement data avalable for the estmaton ncluded sze and effort. Sze was measured n pages of software requrements for objectves O1 to O5 and system specfcaton (software and hardware) for objectve O6 addtonally. The effort was measured n persondays (PD). Table 2. Revew objectves consdered n the study Id Objectve #projects O1 Rsk analyss 5 O2 State transton completeness/consstency 5 O3 Desgn completeness 5 O4 Interface completeness/consstency 4 O5 Tmng consstency/correctness 3 O6 Traceablty wth correctness 5 5.2. Study Objectves The objectve of the study was to valdate accuracy and precson of CoBRA n the context of JAMSS IV&V (compared to expert judgment and Ordnary Least Squares method) and ts contrbuton to the achevement of defned organzatonal objectves (Secton 2.2). 5.3. Study Desgn 5.3.1. Effort Estmaton Procedure Motvated by ts numerous benefts, the CoBRA Method was proposed as best fttng the effort estmaton capabltes and objectves of JAMSS. Frst of all, CoBRA proposes a systematc way to buld an explct and reusable effort model based on both mplct knowledge of doman experts and sparse measurement data. Moreover, t provdes on the output a transparent and ntutve model of causal effort dependences specfc for the context where t was appled. The frst step of the effort estmaton procedure ncluded development of the CoBRA model usng the knowledge of the nvolved doman experts and measurement data (sze and effort) from already completed (hstorcal) projects. For each of the sx IV&V objectves specfed n the study (Table 2), a separate CoBRA model was developed. After the CoBRA models had been created, each was valdated on the hstorcal data n a leave-one-out cross-valdaton experment. 5.3.2. Study Hypotheses In order to effectvely support achevement of the estmaton objectves, the outputs of CoBRA need to be relable. In our study, we evaluate relablty byvaldatng the predctve performance of the estmaton outputs, measured n terms of predctve accuracy and precson. We expect that CoBRA wll outperform the currently employed expertbased estmaton as well as the Ordnary Least Squares method (OLS), one of a few data-drven methods that are applcable n the study context (due to very sparse measurement data). Ths leads us to two study hypotheses: H1. CoBRA provdes more accurate and more precse estmates than estmaton based on expert judgment. H2. CoBRA provdes more accurate and more precse estmates than estmaton based on OLS.

5.3.3. Evaluaton of Estmaton Performance The effort models created n the study effort models were evaluated wth respect to ther predctve performance. We defne predctve performance as the ablty of the effort model to provde accurate and precse estmates. Estmaton accuracy refers to the nearness of an estmate (Ê) to the true value (E). In order to reman comparable to other estmaton studes, we use common estmaton error measures and accuracy measures [4], such as relatve error (RE n equaton #3) and mean magntude of relatve error (MMRE). ( E ˆ E ) E RE / = (3) The Conte s RE and MRE measures are the subject of common crtcsm n the software research communty [6]. One of the alternatve measures of estmaton error proposed s the so-called z measure (equaton #4) [6]. It quantfes the rato of the estmate to the actual value z = E / Eˆ (4) Estmaton precson refers to the degree to whch several estmates are very close to each other (.e., the scatter n the data). For the purpose of comparablty to other studes, we adopt the Pred.m measure. The Pred.m measures the percentage of estmates that are wthn m% of MRE [4]. In our study, we use m = 25% as typcally employed n software estmaton studes. Moreover, we adopt relatve standard devaton (RSD) (5) proposed by [6] for software effort estmaton as uncorrelated wth sze (S ) (whch s a weakness of classcal standard devaton measures). RSD = n = 1 [( E Eˆ )/ S ] n 1 5.4. Study Executon Durng the study executon, sx CoBRA models were created for each of the IV&V objectves.for each model, doman experts dentfed several factors (Table 3) that are responsble for the varance of IV&V effcency for a certan objectve (Fgure 2). 5.5. Results and Interpretaton Ths secton presents the results of the emprcal study. Table 4 and Table 5 present the aggregated measures of the predctve performance. In order to test the sgnfcance of the observed effects, approprate statstcal test were performed [13] (at α = 0.05). The results of a Shapro-Wlk W test ndcated that the MRE and z results come from normal populaton; n that case, a parametrc Pared T-test for homogenety of means was used. Snce the RSD data volated the normalty assumpton, a non-parametrc Wlcoxon Matched-Pars Sgned Rank test was used. Note that expert estmates were avalable for a subset of the past projects consdered (ndcated as n n Table 4 and Table 5). Fnally, as we were afrad that for such a small data sample, statstcal tests would not have enough power (β-1 80%), we 2 (5) performed a power analyss. Fgure 2. Overall effcency of IV&V Table 3. Effort factors consdered n the study Factors nfluencng IV&V effcency Doman experence of the IV&V team Requrements volatlty allowed wthn an ntal contract Novelty of appled IV&V technque Number of system s nterfaces to other (sub)systems Tme pressure n the last IV&V phase Level of rsk assessment done by a suppler or customer Fault Tree Analyss done by IV&V company Tmng consstency objectve ncluded n IV&V Feld Programmable Gate Array (FPGA) revew performed New (nexperenced) personnel nvolved n IV&V Objectves O1-O3 O2-O4, O6 O3 O4, O6 O5 O1 O1 O5 O5 O1 5.5.1. Hypothess H1 The results of the emprcal nvestgaton (Table 4) suggest that, n prncple, CoBRA provdes notceably more accurate estmates than ether expert judgment or OLS for all consdered IV&V revew objectves (a few exceptons are marked n gray). For example, t mproves MMRE by 70% and 40%, on average, compared to OLS and doman experts, respectvely. Only few of the obtaned results are statstcally sgnfcant at the chosen α level (marked n bold). As expected, n most of the cases, statstcal sgnfcance testng dd not provde meanngful results (β-1 << 80%). The only powerful results are marked n talcs. Summarzng, we conclude that hypothess H1 s vald. Table 4. Effort estmaton accuracy CoBRA OLS Expert Obj. MMRE Mean z MMRE Mean z MMRE Mean z n O1 18.2% 97.9% 60.0% 44.1% 42.7% 57.3% 2 O2 25.4% 96.6% 34.0% 66.0% 37.1% 116.5% 4 O3 22.4% 101.% 32.1% 73.0% 36.0% 64.0% 3 O4 24.1% 98.6% 33.2% 66.8% 44.4% 55.6% 3 O5 39.6% 105.8% 46.3% 63.0% 72.2% 94.4% 3 O6 24.5% 93.1% 44.5% 55.5% 13.8% 88.8% 2 5.5.2. Hypothess H2 The analyss of estmaton precson (Table 5) suggests that, n prncpal, CoBRA notceably outperforms both expert judgment and OLS for all consdered IV&V revew objectves (a few exceptons are marked n gray). For example, t reduces RSD by 54% and 40%, on average compared to OLS and doman experts, respectvely. Smlar to accuracy, n most of the cases, statstcal sgnfcance test-

ng dd not provde meanngful results (β-1 << 80%). Summarzng, we conclude that hypothess H2 s vald. Table 5. Effort estmaton precson CoBRA OLS Expert Obj. Pred.25 RSD Pred.25 RSD Pred.25 RSD n O1 80.0% 26.3% 20.0% 74.0% 00.0% 71.3% 2 O2 60.0% 10.7% 60.0% 17.6% 25.5% 12.6% 4 O3 60.0% 15.1% 60.0% 42.1% 33.3% 22.6% 3 O4 50.0% 09.9% 50.0% 20.6% 00.0% 17.3% 3 O5 00.0% 07.7% 33.3% 12.5% 33.3% 15.0% 3 O6 80.0% 07.8% 40.0% 28.1% 100.0% 01.9% 2 5.6. Threats to Valdty Several threats to the valdty of the presented case study were dentfed. Frst, the very sparse project measurement data avalable prevented us from achevng suffcent power of performed statstcal tests. Moreover, expert estmates used to compare CoBRA s performance were avalable only for some of the past projects consdered n the study. Fnally, the conclusons drawn n the study are lmted to the specfc context of IV&V revews at JAMSS. Generalzaton of the study fndngs requres further replcatons. 6. Lessons Learned The followng practcal lessons were learned whle applyng the CoBRA method for estmaton effort of IV&V: (LL1) Effort estmaton scope: Snce IV&V actvtes dffer dependng on the objectve of IV&V, the scope of effort estmaton (the context for whch an effort model s bult) should be lmted to a sngle IV&V objectve. Total effort s the sum of effort over all objectves. (LL2) Sze and complexty of revew: The complexty of a document under revew should be consdered as an effort drver beyond smple sze measures, such as number of document pages. (LL3) Effort drvers: Consderng effort drvers other than sze s a very mportant aspect of effort modelng. We experenced that a sngle factor may multply effort by as much as 10 tmes (e.g., a complete lack of rsk assessment already done by a software suppler may ncrease the effort of ndependent rsk analyss by up to 20 tmes). Such an effect s mpossble to nvestgate based only on hstorcal sze and effort data. 7. Summary In ths paper, we proposed adaptng the CoBRA software estmaton method to predct the effort of ndependent verfcaton and valdaton (IV&V). The method provdes a potental soluton to estmaton problems n the context of IV&V. By ntegratng data- and expert-based estmaton, CoBRA requres mnmal amounts of project measurement data and reduces the nvolvement of doman experts. As a result, t provdes a reusable model that supports strategc project/process objectves, such as rsk management for effort overrun for each IV&V objectves. At the same tme, as reported by several emprcal studes, t provdes accurate and precse estmates. When appled n the context of an example IV&V organzaton, CoBRA proved to provde more relable estmates than both the expert-based estmaton currently appled and ordnary regresson (OLS) - one of few datantensve methods applcable n the IV&V context. It mproved the accuracy and precson of estmates by 40%, on average. At the same tme the method provded a transparent, context-specfc effort model that supported IV&V practtoners n achevng project and process management objectves (e.g., negotatng project scope or mprovng the effectveness of IV&V actvtes). References [1] B.W. Boehm, C. Abts, A.W. Brown, S. Chulan, B.K. Clark, E. Horowtz, R. Madachy, D. Refer, and Steece B. Software Cost Estmaton wth COCOMO II, Prentce Hall, 2000. [2] B.W. Boehm, Software Engneerng Economcs, Prentce-Hall, 1981. [3] G. Boettcher, An Assessment of Metrc Contrbuton n the Constructon of a Neural Network-Based Effort Estmator, Proc. Int l Workshop Soft Computng Appled to Soft. Eng., 2001, pp. 59-65. [4] S.D. Conte, H.E. Dunsmore, V.Y. Shen, Software Engneerng Metrcs and Models. The Benjamn-Cummngs Publshng Company, Inc. 1986. [5] R.N. Charette, Why Software Fals [Software Falure], IEEE Spectrum, vol. 32, no. 9, Sept. 2005, pp. 42-49. [6] T. Foss, E. Stensrud, B. Ktchenham, I. Myrtvet, A smulaton study of the model evaluaton crteron MMRE, IEEE Trans. Soft. Eng., vol. 29, no. 11, Nov. 2003, pp. 985-995. [7] IEEE Std. 1012-2004, IEEE Standard for Software Verfcaton and Valdaton, IEEE computer Socety, June, 2005. [8] N. Kohtake, A. Katoh, N. Ishhama, Y. Myamoto, T. Kawasak, M Katahra, Software Independent Verfcaton and Valdaton for Spacecraft Proc. IEEE JAXA Aerospace Conference, 2008. [9] M. Lother, R. Dumke, Pont Metrcs. Comparson and Analyss, n Current Trends n Software Measurement (ed. R. Dumke, A. Abran), Shaker Publ., 2001, pp. 228-267. [10] E. Mendes, A Comparson of Technques for Web Effort Estmaton, Proc. Int l Symp. Emp. Soft. Eng. & Meas., Madrd, Span, 20-21 Sept., 2007, pp. 334-343. [11] T. Menzes, J. Hhn, Evdence-based Cost Estmaton for Better Qualty Software, IEEE Software, vol. 23, no. 4, 2006, pp. 4-6. [12] M. Ruhe, R. Jeffery, I. Weczorek, Cost Estmaton for Web Applcatons, Proc. Int l Conf. Soft. Eng., 2003, pp. 285-294. [13] D.J. Sheskn, Handbook of Parametrc and Nonparametrc Statstcal Procedure, (3 rd ed.), Chapman & Hall/CRC; 2003. [14] P. Sentas, L. Angels, I. Stamelos, G.L. Blers,. Software Productvty and Effort Predcton wth Ordnal Regresson, J. Inf. & Soft. Tech., vol. 47, no. 1, Jan. 2005, pp. 17-29. [15] Q. Song, M. Shepperd, M. Cartwrght, C. Mar, Software Defect Assocaton Mnng and Defect Correcton Effort Predcton, IEEE Trans. Soft. Eng., vol. 32, no. 2, 2006, pp. 69-82. [16] A. Trendowcz, Software Effort Estmaton Overvew of Current Industral Practces and Exstng Methods, Tech. Rep. 06.08/E, Fraunhofer IESE, Kaserslautern, Germany, 2008. [17] A. Trendowcz, J. Hedrch, J. Münch, Y. Ishga, K. Yokoyama, N. Kkuch, Development of a Hybrd Cost Estmaton Model n an Iteratve Manner, Proc. Int l Conf. Soft. Eng., 2003.