Software Testing Fundamentals

Sunday, December 9, 2007

Data Analysis Techniques

Testing Data Input by Users (the GUI)

Most of the data testing we do these days is user input, and that is what we concentrate on in this book. I have included one example about testing raw data in quantity: the real-world shipping example mentioned repeatedly throughout the book. That was the one test project I had in 10 years in which I tested raw data in quantity. One part of the integration effort was to test the integration of the acquired company's car movement data stream with the parent company's car movement data stream.

The team accomplished this testing in a completely manual mode even though millions of messages had to be analyzed and verified. The testers were all SMEs and senior staff members. The complexity of their analysis could not be automated, or even taught to professional testers. Every verification and validation required the experiences of a lifetime and the expertise of the very best.

The effort was an enormous undertaking and cost four times more than the estimates. I believe that budget money was appropriated from every other department at the parent company to pay for it. Nevertheless, it was mission-critical that those data streams maintained 100 percent integrity, and consequently, no price was too high for the test effort that ensured the success of this integration effort.

Data-Dependent Paths

Some paths will be more data-dependent than others. In these cases, the number of tests performed is a function of the number of data sets that will be tested. The same path, or at least parts of the same path, will be exercised repeatedly. The data will control the branches taken and not taken.
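The way data controls which branches are taken and not taken can be sketched in a few lines of Python. The shipping-charge rules here are hypothetical, invented purely for illustration, but they show how three different data sets drive the very same code down three different paths:

```python
def shipping_charge(weight_kg, destination):
    """Hypothetical routine: the input data decides which branches run."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")   # exception path
    if destination == "domestic":
        rate = 5.00                                   # branch A
    else:
        rate = 12.50                                  # branch B
    if weight_kg > 20:
        rate *= 1.5                                   # branch C: heavy-parcel surcharge
    return round(rate * weight_kg, 2)

# Three data sets, three different paths through the same code:
print(shipping_charge(2, "domestic"))        # branch A only -> 10.0
print(shipping_charge(25, "international"))  # branches B and C -> 468.75
```

Note that the exception path is only reached by a data set no "main" data set resembles, which is exactly why exception paths are so easy to miss.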

If you approach data analysis without considering the independent paths, you will certainly miss some important paths. In my experience, this is how many hard-to-reproduce bugs get into production. Someone tests all the main, easily identified data sets without considering all the possible exception paths. This is why I recommend performing the path analysis and then populating the paths with the data sets that are required to exercise the most important paths, rather than the other way around.

Having said that, I must add that users do some unexpected things with data, and so an examination of paths alone will not suffice to cover all the exceptions that will be exercised by the user.

Some Thoughts about Error Messages

Error messages for data exceptions are an important consideration in a good test effort. In my study of production problems at Prodigy, it became clear that virtually all of the most tenacious, expensive, and longest-lived production problems involved one or more missing or erroneous error messages. These problems had the most profound impact on customer service as well.

Data-dependent error messages need to be accounted for in the test inventory as part of your data analysis. I haven't seen a complete list of error messages for an application since 1995. In today's object-oriented architectures, they tend to be decentralized, so accounting for them usually requires exploration. I generally estimate how many I should find when I do my path analysis. There should be at least one error message for each exception path and at least one data error message for each data entry field. This area of testing may be a minor concern to you or it may be a major issue. Here are a couple of examples of what I mean.

There was a startup company with a B2B Web application that I tested during the dot-com boom. There was only one error message in the entire Web application. The text of the error message was just one word: "Wrong." This developer's error message "placeholder" appeared whenever the application encountered a data error. The testers complained about the message, and they were told that it would be replaced by the appropriate text messages in due course. Of course, it was never fully eradicated from the system, and it would pop up at the most inconvenient times. Fortunately, this company went into the sea with the other lemmings when the dot-coms crashed.

On the other end of the spectrum, I had the pleasure to write some white papers for a Danish firm that developed and marketed the finest enterprise resource planning (ERP) products I have ever seen. Reviewing (testing) their products was the most wonderful breath of fresh air in testing I have had since Prodigy. Their products were marketed throughout Europe and America, and simultaneously supported many languages.

To ensure high-quality, appropriate, and helpful error messages in many languages, they incorporated the creation and maintenance of the error message text for any required language into their development platform. The development platform kept a to-do list for all unfinished items, and developers could not check in their code as complete until the error messages were also marked complete. The company hired linguists to create and maintain all their text messages, but it was the responsibility of the developer to make sure the correct messages were attached to the exception processors in their code.

This system worked wonderfully in all its languages. The helpful text messages contributed to both high customer satisfaction and fewer calls to customer service.

Field Validation Tests

As the first example, I will use BVA and a data-reducing assumption to determine the minimum number of tests I have to run to make sure that the application is only accepting valid month and year data from the form.

Translating the acceptable values for boundary value analysis, the expiration month data set becomes:

1 ≤ month ≤ 12

BVA-based data set = {0,1,2,11,12,13} (6 data points)

The values that would normally be selected for BVA are 0, 1, 2, and 11, 12, 13.

Using simple data reduction techniques, we will further reduce this number of data points by the following assumptions.


Assumption 1.

The values 2 and 11 are probably redundant; therefore, a single midpoint, 6, will be tested in their place.

Month data set = {0,1,6,12,13} (5 data points)

This next assumption may be arbitrary, especially in the face of the hacker story that I just related, but it is a typical assumption.


Assumption 2.

Negative values will not be a consideration.


Likewise, the valid field data set for the expiration year becomes

2002 ≤ year ≤ 2011

BVA year data set = {2001,2002,2003,2010,2011,2012}

Again, I will apply a simplifying assumption.


Assumption 3.

The values 2003 and 2010 are probably redundant; therefore, a single midpoint, 2006, will be tested in their place.

BVA year data set = {2001,2002,2006,2011,2012}

These two fields, a valid month and a valid year, are combined to become a data set in the credit authorization process. These are the data values that will be used to build that test set. But before I continue with this example, I need to mention one more data reduction technique that is very commonly used but not often formalized.
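The boundary value selection and the midpoint reduction described above can be expressed as two small helpers. The function names are my own invention, but the sets they produce match the ones derived in the text:

```python
def bva_set(low, high):
    """Classic boundary value analysis: the values just below, at,
    and just above each boundary of the valid range [low, high]."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]

def reduce_midpoint(low, high):
    """Data reduction: the interior points low+1 and high-1 are
    assumed redundant, so a single midpoint is tested instead."""
    return [low - 1, low, (low + high) // 2, high, high + 1]

print(bva_set(1, 12))               # [0, 1, 2, 11, 12, 13]
print(reduce_midpoint(1, 12))       # [0, 1, 6, 12, 13]
print(reduce_midpoint(2002, 2011))  # [2001, 2002, 2006, 2011, 2012]
```

The same two helpers cover both fields, which is the point of formalizing the reduction: the assumption is written down once instead of being applied silently to each data set.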

Matrix Data Reduction Techniques

We all use data reduction techniques whether we realize it or not. The technique used here simply removes redundant data, or data that is likely to be redundant, from the test data sets. It is important to document data reductions so that others can understand the basis of the reduction. When data is eliminated arbitrarily, the result is usually large holes in the test coverage. Because data reduction techniques are routinely applied to data before test design starts, reducing the number of test data sets by ranking them as we did with the paths may not be necessary.

Data Set Truth Table

All these values need to be valid or we will never get a credit card authorization to pass. But consider it a different way. Let's say we put in a valid date and a valid credit card number, but we pick the wrong type of credit card. All the field values are valid, but the data set should fail. To build the data sets that I need, I must first understand the rules. This table tells me how many true data values I need for one card to get a credit authorization.


Data Set 1: The set of all valid data, all in the data set

                                 Is a valid    Is a valid      Min. number   Min. number
    Field                        value for     member of       of data       of data sets
                                 the field     this data set   to test       to test
    Cardholder Name
      First Name                 True          True            1
      Last Name                  True          True            1
    Billing Address
      Street Address             True          True            1
      City                       True          True            1
      State                      True          True            1
      Zip                        True          True            1
    Credit Card Information
      Card Type                  True          True            1
      Card Number                True          True            1
      Expiration Month           True          True            1
      Expiration Year            True          True            1
      Card Verification Number   True          True            1
    OUTCOME                      True          True            10            1
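The all-true rule in the table, where every field must be valid on its own and also a valid member of the data set before the outcome is true, can be sketched as a simple conjunction. The field checks and the card-type prefixes below are simplified stand-ins for illustration, not real authorization logic:

```python
def authorize(data):
    """All-true rule: a data set passes only if every field is valid
    AND every field is a valid member of this particular data set."""
    # Column 1: is each value valid for its field?
    field_valid = (
        data["card_type"] in {"Visa", "MasterCard"}
        and 1 <= data["exp_month"] <= 12
        and 2002 <= data["exp_year"] <= 2011
        and data["card_number"].isdigit()
    )
    # Column 2: is each value a valid member of this data set?
    # A structurally valid number of the WRONG card type fails the
    # set even though every individual field value is valid.
    prefix = {"Visa": "4", "MasterCard": "5"}  # simplified stand-in prefixes
    set_valid = data["card_number"].startswith(
        prefix.get(data["card_type"], "")
    )
    return field_valid and set_valid

good = {"card_type": "Visa", "card_number": "4111111111111111",
        "exp_month": 6, "exp_year": 2006}
print(authorize(good))                                 # True
print(authorize({**good, "card_type": "MasterCard"}))  # False: valid fields, wrong set
```

The second call is exactly the case described above: every field value is individually valid, but the combination fails, so field validation alone cannot replace data-set testing.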
