Introduction to Software Testing

Introduction

Chapter 1 introduces software testing by:
- describing the activities of a test engineer
- defining a number of key terms
- explaining the central notion of test coverage

Software is a key ingredient in many devices and systems:
- network routers
- financial networks
- the Web
- embedded applications: airplanes, spaceships, traffic control systems
- mundane appliances: watches, ovens, cars, DVD players

Goal of this course:
- Present a few basic software testing concepts that can be used to design tests for a large variety of software applications.
- Enable software engineers to apply these concepts easily to any software testing situation.
- Simplify testing by classifying coverage criteria into four categories: graphs, logical expressions, input domain characterizations, and syntactic descriptions.

Activities of a Test Engineer

A test engineer:
- is an information technology (IT) professional
- is in charge of one or more technical test activities, including:
  - designing test inputs
  - producing test case values
  - running test scripts
  - analyzing results
  - reporting results to developers and managers

Every engineer involved in software development should realize that he or she sometimes wears the hat of a test engineer.

A test manager:
- is in charge of one or more test engineers
- sets test policies and processes
- interacts with other managers

One of a test engineer's most powerful tools is a formal coverage criterion. Formal coverage criteria:
- give test engineers ways to decide what test inputs to use during testing, making it more likely that the tester will find problems in the program
- provide stopping rules for test engineers

Software testing activities have been categorized into levels. Two kinds of levels have been used.

The most often used level categorization is based on traditional software process steps. Although most types of tests can only be run after some part of the software is implemented, tests can be designed and constructed during all software development steps. The most time-consuming parts of testing are:
- test design
- test construction

The second level categorization is based on the attitude and thinking of the testers.

Testing Levels Based on Software Activity

Different levels of testing:
- Acceptance testing: assess software with respect to requirements.
- System testing: assess software with respect to architectural design.
- Integration testing: assess software with respect to subsystem design.
- Module testing: assess software with respect to detailed design.
- Unit testing: assess software with respect to implementation.

[Figure: software development activities and testing levels (the V model)]

Standard advice is to design the tests concurrently with each development activity. The reason is that the mere process of explicitly articulating tests can identify defects in design decisions that otherwise appear reasonable.

- The requirements analysis phase of software development captures the customer's needs.
- Acceptance testing is designed to determine whether the completed software meets these needs.
- Acceptance testing must involve users or other individuals who have strong domain knowledge.

- The architectural design phase chooses components and connectors that together realize a system whose specification is intended to meet the previously identified requirements.
- System testing is designed to determine whether the assembled system meets its specifications.
- The subsystem design phase specifies the structure and behavior of subsystems.
- Integration testing is designed to assess whether the interfaces between modules in a given subsystem have consistent assumptions and communicate correctly.

- The detailed design phase determines the structure and behavior of individual modules.
- Module testing is designed to assess individual modules in isolation, including how the component units interact with each other.
- Implementation is the phase that actually produces code.
- Unit testing is designed to assess the units produced by the implementation.

Regression testing:
- is a standard part of the maintenance phase of software development
- is done after changes are made to the software
- helps ensure that the updated software still possesses the functionality it had before the updates

Unfortunately, the software faults that come from requirements and design mistakes are visible only through testing months or years after the original mistake.

- Even if tests cannot be executed, the very process of defining tests can identify a significant fraction of the mistakes in requirements and design.
- Through techniques such as use-case analysis, test planning is becoming better integrated with requirements analysis in standard software practice.
- Although most of the literature emphasizes these levels in terms of when they are applied, a more important distinction is the types of faults that each level is looking for.

One of the best examples of the differences between unit testing and system testing is the infamous Pentium bug:
- The chip gave incorrect answers to certain floating-point division calculations.
- The chip was slightly inaccurate for a few pairs of numbers.

The Pentium bug not only illustrates the difference in testing levels, but it is also one of the best arguments for paying more attention to unit testing.

On the other hand, some faults can only be found at the system level. An example is the launch failure of the first Ariane 5 rocket:
- The low-level cause was an unhandled floating-point conversion exception in an internal guidance system function.
- The guidance system function was correct for Ariane 4, but no one reanalyzed the software in light of the substantially different flight trajectory of Ariane 5.

Object-oriented (OO) software changes the testing levels:
- OO software blurs the distinction between units and modules.
- Intramethod testing is when tests are constructed for individual methods.
- Intermethod testing is when pairs of methods within the same class are tested in concert.
- Intraclass testing is when tests are constructed for a single entire class.
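As a rough illustration of the three OO levels above, the sketch below uses a hypothetical IntStack class. Neither the class nor the test method names come from the slides; they are assumptions chosen only to make the levels concrete.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical class under test (an assumption, not from the slides).
class IntStack {
    private final Deque<Integer> items = new ArrayDeque<>();

    void push(int x) { items.push(x); }
    int pop() { return items.pop(); }
    boolean isEmpty() { return items.isEmpty(); }
}

class OoLevelsSketch {
    // Intramethod: exercises one method (isEmpty) on its own.
    static boolean intramethodTest() {
        return new IntStack().isEmpty();
    }

    // Intermethod: exercises a pair of methods (push, pop) in concert.
    static boolean intermethodTest() {
        IntStack s = new IntStack();
        s.push(7);
        return s.pop() == 7;
    }

    // Intraclass: exercises a call sequence across the entire class.
    static boolean intraclassTest() {
        IntStack s = new IntStack();
        s.push(1);
        s.push(2);
        boolean lastInFirstOut = s.pop() == 2 && s.pop() == 1;
        return lastInFirstOut && s.isEmpty();
    }
}
```

In a real project these checks would normally live in a test framework; plain boolean methods are used here only to keep the sketch self-contained.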

Beizer's Testing Levels Based on Test Process Maturity

This categorization of levels is based on the test process maturity level of an organization:
- Level 0: There's no difference between testing and debugging.
- Level 1: The purpose of testing is to show that the software works.
- Level 2: The purpose of testing is to show that the software doesn't work.
- Level 3: The purpose of testing is to reduce the risk of using the software.
- Level 4: Testing is a mental discipline that helps all IT professionals develop higher quality software.

Level 0 is the view that testing is the same as debugging. This model does not distinguish between a program's incorrect behavior and a mistake within the program, and it does very little to help develop software that is reliable or safe.

In Level 1 testing, the purpose is to show correctness. Suppose we run a collection of tests and find no failures. What do we know? Should we assume that we have good software or just bad tests? Test managers have no way to answer the question, because they have no way to quantitatively express or evaluate their work.

In Level 2 testing, the purpose is to show failures.
- Testers may enjoy finding problems, but developers never want to find problems.
- Level 2 testing therefore puts testers and developers into an adversarial relationship.
- When our primary goal is to look for failures, we are still left wondering what to do if no failures are found.

Level 3 testing starts from the observation that testing can show the presence, but not the absence, of failures. This allows us to realize that the entire development team wants to reduce the risk of using the software.

Once the testers and developers are on the same team, an organization can progress to real Level 4 testing. Level 4 thinking defines testing as a mental discipline that increases quality.

We often think that the purpose of a spell checker is to find misspelled words, but in fact, the best purpose of a spell checker is to improve our ability to spell. In the same way, Level 4 testing means that the purpose of testing is to improve the ability of the developers to produce high quality software. The testers should train the developers.

Automation of Test Activities

Software testing requires up to 50% of software development costs, and even more for safety-critical applications. Development tasks can be divided into two kinds: a revenue task contributes directly to the solution of a problem, whereas an excise task does not.
- Revenue task example: determining which methods are appropriate to define a given data abstraction as a Java class.
- Excise task example: compiling a Java class.

Excise tasks are candidates for automation; revenue tasks are not. Software testing probably has more excise tasks than any other aspect of software development.

Automating excise tasks serves the test engineer in many ways:
1. Eliminating excise tasks eliminates drudgery, thereby making the test engineer's job more satisfying.
2. It frees up time to focus on the fun and challenging parts of testing.
3. It can help eliminate errors of omission, such as failing to update all the relevant files with the new set of expected results.
4. It eliminates some of the variance in test quality caused by differences in individuals' abilities.

Software Testing Limitations and Terminology

One of the most important limitations of software testing is that testing can show only the presence of failures, not their absence.

Some terms that are important in software testing:
- Validation: evaluating software at the end of software development to ensure compliance with intended usage. It usually depends on domain knowledge.
- Verification: determining whether the products of a given phase of development fulfill the requirements established during the previous phase. It is usually a more technical activity.
- IV&V (independent verification and validation): the evaluation is done by non-developers.

- Software Fault: A static defect in the software. Faults in software are design mistakes.
- Software Error: An incorrect internal state that is the manifestation of some fault.
- Software Failure: External, incorrect behavior with respect to the requirements or other description of the expected behavior.

Example of a fault, an error, and a failure in a Java program: a method numzero that is intended to count the zeroes in an integer array.
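A minimal sketch of such a method, reconstructed from the description of the fault in this section. The class name, the loop shape, and the main method are assumptions; only the method name, the off-by-one start index, and the two sample calls come from the slides.

```java
// Reconstruction for illustration only (assumed shape, not the original listing).
class NumZeroExample {
    /** Intended behavior: return the number of occurrences of 0 in arr. */
    static int numzero(int[] arr) {
        int count = 0;
        // FAULT: the search starts at index 1; it should start at index 0.
        for (int i = 1; i < arr.length; i++) {
            if (arr[i] == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Expected 1, actual 1: the skipped element is not a zero.
        System.out.println(numzero(new int[] {2, 7, 0}));
        // Expected 1, actual 0: the skipped element is the only zero.
        System.out.println(numzero(new int[] {0, 7, 2}));
    }
}
```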

The fault: the method starts looking for zeroes at index 1 instead of index 0.

For example, numzero([2, 7, 0]) correctly evaluates to 1, while numzero([0, 7, 2]) incorrectly evaluates to 0.
- Both of these cases result in an error.
- Only the second case results in a failure.

In the first case, the first state is already in error because the value of i should be zero on the first iteration. However, since the value of count is correct, the error state does not propagate to the output.

- Testing: Evaluating software by observing its execution.
- Test Failure: Execution that results in a failure.
- Debugging: The process of finding a fault given a failure.

For a given fault, not all inputs will trigger the fault into creating incorrect output (a failure). Three conditions must be present for a failure to be observed (the RIP model).

RIP model:
1. Reachability: The location or locations in the program that contain the fault must be reached.
2. Infection: After executing the location, the state of the program must be incorrect.
3. Propagation: The infected state must propagate to cause some output of the program to be incorrect.

- Test Case Values: The input values necessary to complete some execution of the software under test.
- Expected Results: The result that will be produced when executing the test if and only if the program satisfies its intended behavior.
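The three RIP conditions can be seen one at a time in a small sketch. The method cappedDouble and its deliberate fault are hypothetical, invented here for illustration; they do not appear in the slides.

```java
// Hypothetical example with a deliberately seeded fault.
class RipExample {
    /** Intended behavior: return twice the larger argument, capped at 10. */
    static int cappedDouble(int a, int b) {
        int r;
        if (a > b) {
            r = a * 2;
        } else {
            r = b + 2; // FAULT (deliberate): should be b * 2
        }
        return Math.min(r, 10);
    }

    public static void main(String[] args) {
        // No reachability: a > b, so the faulty statement is never executed.
        System.out.println(cappedDouble(5, 3));  // intended 10, actual 10
        // Reachability without infection: 2 + 2 equals 2 * 2, so the state is still correct.
        System.out.println(cappedDouble(1, 2));  // intended 4, actual 4
        // Infection without propagation: r is 22 instead of 40, but the cap hides it.
        System.out.println(cappedDouble(1, 20)); // intended 10, actual 10
        // Reachability, infection, and propagation: the failure is observable.
        System.out.println(cappedDouble(1, 3));  // intended 6, actual 5
    }
}
```

Only the last input satisfies all three conditions, so it is the only one of the four that reveals the fault as a failure.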

- Software Observability: How easy it is to observe the behavior of a program in terms of its outputs, effects on the environment, and other hardware and software components.
- Software Controllability: How easy it is to provide a program with the needed inputs, in terms of values, operations, and behaviors.

Many observability and controllability problems can be addressed with simulation, which bypasses the hardware or software components that interfere with testing.

- Test Case: A test case is composed of the test case values, expected results, prefix values, and postfix values necessary for a complete execution and evaluation of the software under test.
- Test Set: A test set is simply a set of test cases.
- Executable Test Script: A test case that is prepared in a form to be executed automatically on the software under test and to produce a report.

- Prefix Values: Any inputs necessary to put the software into the appropriate state to receive the test case values.
- Postfix Values: Any inputs that need to be sent to the software after the test case values are sent.

Postfix values can be subdivided into two types:
- Verification Values: Values necessary to see the results of the test case values.
- Exit Commands: Values needed to terminate the program or otherwise return it to a stable state.
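The parts of a test case can be labeled in a short sketch. The Account class and its methods are hypothetical and serve only to show where prefix values, test case values, verification values, and exit commands appear in a test.

```java
// Hypothetical software under test (an assumption, not from the slides).
class Account {
    private int balance = 0;
    private boolean open = true;

    void deposit(int amount) { balance += amount; }
    void withdraw(int amount) { balance -= amount; }
    int getBalance() { return balance; }
    void close() { open = false; }
    boolean isOpen() { return open; }
}

class TestCaseAnatomy {
    static boolean runTestCase() {
        Account account = new Account();

        // Prefix values: put the software into the state the test needs.
        account.deposit(100);

        // Test case values: the input actually being tested.
        account.withdraw(30);

        // Postfix, verification values: make the result visible.
        int actual = account.getBalance();

        // Expected result for this test case.
        int expected = 70;

        // Postfix, exit command: return the software to a stable state.
        account.close();

        return actual == expected && !account.isOpen();
    }
}
```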

Coverage Criteria For Testing

Some ill-defined terms used in testing are "complete testing", "exhaustive testing", and "full coverage". The number of potential inputs for most programs is so large as to be effectively infinite. For example, the number of inputs to a Java compiler is not just all Java programs, but all strings.

Since we cannot test with all inputs, coverage criteria are used to decide which test inputs to use. From a practical perspective, coverage criteria provide useful rules for when to stop testing.

Test Requirement: A test requirement is a specific element of a software artifact that a test case must satisfy or cover. Test requirements can be described with respect to a variety of software artifacts, including source code, design components, specification modeling elements, or descriptions of the input space.

Suppose we are given the task of testing bags of jelly beans:
- The jelly beans have six flavors (Lemon, Pistachio, Cantaloupe, Pear, Tangerine, Apricot) and come in four colors.
- A simple approach to testing might be to test one jelly bean of each flavor. This gives six test requirements, one for each flavor.
- We satisfy the test requirement Lemon by selecting and tasting a Lemon jelly bean from a bag of jelly beans.
- How do we know, before tasting it, that a given bean is Lemon and not some other flavor? This dilemma illustrates a classic controllability issue.

Coverage Criterion: A coverage criterion is a rule or collection of rules that impose test requirements on a test set. The criterion describes the test requirements in a complete and unambiguous manner.

Test requirements for the flavor criterion:
TR = {flavor = Lemon, flavor = Pistachio, flavor = Cantaloupe, flavor = Pear, flavor = Tangerine, flavor = Apricot}

Coverage: Given a set of test requirements TR for a coverage criterion C, a test set T satisfies C if and only if for every test requirement tr in TR, at least one test t in T exists such that t satisfies tr.

- For example, a test set T of 12 beans satisfies the flavor criterion as long as it contains at least one bean of each flavor.
- It is perfectly acceptable to satisfy a given test requirement with more than one test.
- It is sometimes expensive to satisfy a coverage criterion.
- Some requirements cannot be satisfied; for example, Tangerine jelly beans are rare. The options are to drop unsatisfiable test requirements from the set TR or to replace them with less stringent test requirements.

Coverage Level: Given a set of test requirements TR and a test set T, the coverage level is the ratio of the number of test requirements satisfied by T to the size of TR.

Infeasible test requirements are test requirements that cannot be satisfied. For example, dead code results in infeasible test requirements because the statements cannot be reached. As a result, 100% coverage is impossible in practice.
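The coverage level definition can be sketched directly for the flavor criterion. The class, enum, and method names below are assumptions made for illustration; each test is modeled simply as the flavor of the bean that was tasted.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class CoverageLevelSketch {
    enum Flavor { LEMON, PISTACHIO, CANTALOUPE, PEAR, TANGERINE, APRICOT }

    /** Ratio of test requirements in tr satisfied by tests to the size of tr (tr must be non-empty). */
    static double coverageLevel(Set<Flavor> tr, List<Flavor> tests) {
        Set<Flavor> satisfied = EnumSet.noneOf(Flavor.class);
        for (Flavor bean : tests) {
            if (tr.contains(bean)) {
                satisfied.add(bean); // several tests may satisfy the same requirement
            }
        }
        return (double) satisfied.size() / tr.size();
    }

    /** Four tests that cover three of the six flavors. */
    static double example() {
        Set<Flavor> tr = EnumSet.allOf(Flavor.class);
        List<Flavor> tests = Arrays.asList(
                Flavor.LEMON, Flavor.LEMON, Flavor.PEAR, Flavor.APRICOT);
        return coverageLevel(tr, tests);
    }
}
```

In this example the two Lemon tests count once, so the coverage level is 3/6.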

Coverage criteria are used in one of two ways:
1. Directly generate test case values to satisfy a given criterion.
   - This use is assumed by the research community and is the most obvious way to use criteria.
   - It is very hard in some cases, particularly if we do not have enough automated tools to support test case value generation.
2. Generate test case values externally and measure the tests against the criterion in terms of their coverage.
   - This use is assumed by industry practitioners.

Coverage criteria (continued): If our tests do not reach 100% coverage, what does that mean? We have no data on how much worse, say, 99% coverage is than 100% coverage.

- Generator: A procedure that automatically generates values to satisfy a criterion.
- Recognizer: A procedure that decides whether a given set of test case values satisfies a criterion.
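For the jelly bean flavor criterion, both procedures are trivial, which makes the distinction easy to see. The sketch below is an assumption-laden illustration (all names are hypothetical); for realistic criteria on real programs, generation is usually much harder than recognition.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class FlavorCriterionTools {
    enum Flavor { LEMON, PISTACHIO, CANTALOUPE, PEAR, TANGERINE, APRICOT }

    /** Recognizer: decides whether a given test set satisfies the flavor criterion. */
    static boolean satisfiesFlavorCriterion(List<Flavor> testSet) {
        Set<Flavor> seen = EnumSet.noneOf(Flavor.class);
        seen.addAll(testSet);
        return seen.containsAll(EnumSet.allOf(Flavor.class));
    }

    /** Generator: produces a test set that satisfies the flavor criterion. */
    static List<Flavor> generateFlavorTests() {
        return new ArrayList<>(EnumSet.allOf(Flavor.class)); // one bean per flavor
    }

    /** A test set with a single bean, which the recognizer should reject. */
    static List<Flavor> singleLemon() {
        List<Flavor> tests = new ArrayList<>();
        tests.add(Flavor.LEMON);
        return tests;
    }
}
```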

Generator vs. Recognizer:
- In practice, it is possible to recognize whether test cases satisfy a criterion more often than it is possible to generate tests that satisfy it.
- The primary problem with recognition is infeasible test requirements; if no infeasible test requirements are present, the problem becomes decidable.

The set TR depends on the specific artifact under test. For example, the test requirement color = Purple doesn't make sense because we assumed that the factory does not make Purple jelly beans.

Criteria Subsumption: A coverage criterion C1 subsumes C2 if and only if every test set that satisfies criterion C1 also satisfies C2. Note that this has to be true for every test set.

For example, the color criterion requires that we try one jelly bean of each color. If we satisfy the flavor criterion, then we have also implicitly satisfied the color criterion, so the flavor criterion subsumes the color criterion.
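The subsumption relation between the two jelly bean criteria can be checked on concrete test sets. The slides state only that there are six flavors and four colors; the flavor-to-color mapping below is an assumption made for illustration, as are all class and method names.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class SubsumptionSketch {
    enum Color { YELLOW, GREEN, ORANGE, WHITE }

    // ASSUMED mapping: the slides do not say which flavor has which color.
    enum Flavor {
        LEMON(Color.YELLOW), PISTACHIO(Color.GREEN), CANTALOUPE(Color.ORANGE),
        PEAR(Color.WHITE), TANGERINE(Color.ORANGE), APRICOT(Color.YELLOW);

        final Color color;
        Flavor(Color color) { this.color = color; }
    }

    static boolean satisfiesFlavor(List<Flavor> tests) {
        Set<Flavor> seen = EnumSet.noneOf(Flavor.class);
        seen.addAll(tests);
        return seen.size() == Flavor.values().length;
    }

    static boolean satisfiesColor(List<Flavor> tests) {
        Set<Color> seen = EnumSet.noneOf(Color.class);
        for (Flavor bean : tests) {
            seen.add(bean.color);
        }
        return seen.size() == Color.values().length;
    }

    /** One bean of every flavor: satisfies the flavor criterion, hence also color. */
    static List<Flavor> allFlavors() {
        return Arrays.asList(Flavor.values());
    }

    /** One bean of every color, but only four of the six flavors. */
    static List<Flavor> fourColorsOnly() {
        return Arrays.asList(Flavor.LEMON, Flavor.PISTACHIO, Flavor.CANTALOUPE, Flavor.PEAR);
    }
}
```

Because every color belongs to at least one flavor, any test set with all six flavors necessarily has all four colors. The second test set shows that the reverse does not hold: it satisfies the color criterion without satisfying the flavor criterion, so color does not subsume flavor.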

Infeasibility and Subsumption

A subtle relationship exists between infeasibility and subsumption. Sometimes a coverage criterion C1 will subsume another criterion C2 if we assume that C1 has no infeasible test requirements. However, if C1 does create an infeasible test requirement for a program, a test suite that satisfies C1 while skipping the infeasible test requirements might also skip some test requirements from C2 that are satisfiable.

Characteristics of a Good Coverage Criterion

1. The difficulty of computing test requirements.
2. The difficulty of generating tests.
3. How well the tests reveal faults.

Notes:
- Subsumption is at best a very rough way to compare criteria.
- Intuitively, if one criterion subsumes another, it should reveal more faults, although subsumption alone does not guarantee this.
- It should not be surprising that the difficulty of generating tests can be directly related to how well the tests reveal faults.
- The goal is to choose criteria that have the right cost/benefit tradeoffs.

Older Software Testing Terminology

- Black-box testing: Deriving tests from external descriptions of the software, including specifications, requirements, and design.
- White-box testing: Deriving tests from the source code internals of the software, specifically including branches, individual conditions, and statements.

From an abstract perspective, black-box and white-box testing are very similar.

- Top-Down Testing: Test the main procedure, then go down through the procedures it calls, and so on. Top-down testing is impractical.
- Bottom-Up Testing: Test the leaves in the tree and move up to the root. Each procedure is tested only if all of its children have been tested.

OO software leads to a more general problem. The relationships among classes can be formulated as general graphs with cycles, requiring test engineers to make the difficult choice of what order to test the classes in.
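For a call tree, a bottom-up test order is simply a post-order traversal: a procedure is listed only after everything it calls. The sketch below assumes a hypothetical call structure (main, parse, report, readLine); none of these names come from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BottomUpOrderSketch {
    /** Returns the procedures reachable from root, callees before callers. */
    static List<String> bottomUpOrder(Map<String, List<String>> calls, String root) {
        List<String> order = new ArrayList<>();
        visit(calls, root, new HashSet<>(), order);
        return order;
    }

    private static void visit(Map<String, List<String>> calls, String proc,
                              Set<String> visited, List<String> order) {
        if (!visited.add(proc)) {
            return; // already handled (also stops infinite recursion on cycles)
        }
        for (String callee : calls.getOrDefault(proc, Collections.emptyList())) {
            visit(calls, callee, visited, order);
        }
        order.add(proc); // added only after all of its callees
    }

    /** Hypothetical call tree: main calls parse and report; parse calls readLine. */
    static List<String> example() {
        Map<String, List<String>> calls = new LinkedHashMap<>();
        calls.put("main", Arrays.asList("parse", "report"));
        calls.put("parse", Arrays.asList("readLine"));
        return bottomUpOrder(calls, "main");
    }
}
```

The visited set keeps the traversal from looping when the graph has cycles, but in that case the resulting order no longer guarantees that every callee is tested first, which is exactly the ordering problem that OO class graphs raise.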

- Static Testing: Testing without executing the program. This includes software inspections and some forms of analysis.
- Dynamic Testing: Testing by executing the program with real inputs.

In common usage, "testing" refers to dynamic testing, and "verification activities" refers to static testing.