Wednesday, April 3, 2019

Testing Criteria


A criterion is an attribute that serves three main purposes: it is used for
1) evaluation,
2) determination, and
3) classification. This essay considers these three purposes in turn as applied to software testing.
Introduction
The term "testing criteria" is interesting because of its use in the field of software testing services often leads to misunderstandings, while in other areas of human activity such problems do not arise.
 For example, when we start a project on testing and the customer asked what the criteria of testing will be used must specify exactly what criteria he has in mind - a criteria for testing is complete, or the criteria of selection tests for regression testing services, or it may be some more test . 
Why is this happening and how to get rid of the uncertainty associated with the use of the term "testing criteria" - these are the two main issues, which is devoted to this essay.
A Digression About Terms
Terms always cause much controversy. We would like to find definitions of terms that satisfy everyone; alas, this rarely happens. People put different meanings into the same words, which sometimes leads to mutual misunderstanding.
Terms that are used only in one special area are less prone to this multiplicity of meanings. It is unlikely that terms such as "surplus", "giardiasis" or "abyss" will cause any ambiguity of interpretation for experts in the relevant subject areas. A layman can look them up in a dictionary and also receive a completely unambiguous interpretation.
Problems arise with terms that are used in more than one special area and that have different meanings, or at least different shades of meaning, in different domains. For example, the word "bay" may mean a small gulf, but in some contexts also a coil of rope or cable. The word "function" in mathematics and in programming means not quite the same thing.
But the worst case is with words that are widely used in everyday speech. Clarifying what an everyday word means when it is applied to a particular specialized area is the most common cause of terminological disputes.
So the meaning of a word is determined not only by the word itself but also by the context in which it is used. Terms that are used in several subject areas require the context to be clarified; only then does the term acquire a precise meaning.
However, there is a simple trick that helps cope with ambiguous interpretation of a term: refine the context by using not a single word but a characteristic phrase. The characteristic phrase "pulls" the context along with it, thereby creating an environment conducive to a correct interpretation of the term.
For example, the term "function" mentioned earlier has different shades of meaning depending on context, but if you say "mathematical function" or "function in C", the context immediately becomes clear.
In this essay we will talk about the term "criterion", which is one of the most ambiguous everyday concepts. More precisely, we are interested in the narrower term "testing criteria", and in particular in its use in the field of software testing.
What is a "criterion"?
"Criteria - sign, on the basis of which is assessed, definition, classification of anything, measure. "
Thus, the criteria - it is a sign that has three main purposes: using it is
 1) assessment, 
2) identification and 
3) classification.
We consider in detail these three appointments in turn applied to software testing in the next two sections. But first try to understand why the use of the term "testing criteria" in other domains does not lead to uncertainty.
In the following explanation I am getting a little ahead of myself, so perhaps it will become clearer after reading the next section. Nevertheless, I want to deal with all the other areas right away, so that the rest of this essay can focus on software testing. I apologize to readers if this explanation does not seem convincing enough; I hope that after reading the next section everything will fall into place.
So why is this term used in other subject areas without any problems? To answer this question it is enough to pay attention to the contexts in which the phrase is used and to what is meant by "testing" in them.
There are two main contexts in which the term "testing criteria" is used.
The first and most widely used is the comparison of several things against a set of parameters in order to identify the best among them. "Testing" in this context refers to the comparison itself, so it would probably be more correct to avoid the term "testing criteria" here and replace it with the more explicit "comparison criteria". Nevertheless, the context of comparison is easily identified and rigidly defines what action to take, namely to choose the best of several options, so in this context the term "testing criteria" is clearly concretized and does not cause any confusion.
The second, more specific context is associated with education and the assessment of knowledge. In this case "testing" means testing the level of knowledge. Here, too, it is clear from the context what decision must be made, namely whether a certain level of knowledge has been achieved, so here the term is instantiated completely unambiguously.
I am not aware of other reasonably common cases.
The field of software testing is different in that there are far more contexts in which the term "testing criteria" can be meaningfully used. This is what causes ambiguity when the term is used without a clear indication of the context.
Also remember that a criterion has three purposes: evaluation, determination, and classification. In the field of software testing all three of these aspects occur, whereas in the other areas, as can be seen from the above, only the first two do.
In the next two sections all three purposes of a criterion are discussed in detail in relation to software testing, and a set of phrases using the term "criterion" is proposed. These phrases are more specific than the general term "testing criteria" and each is tied more tightly to its own context, so that their use does not cause confusion.
Evaluation and Determination
These two aspects are very closely related to each other. More precisely, the first is usually subordinate to the second, so let us start with the second, as the more significant.
The use of a criterion, as can be seen from the etymology of the term (Greek kriterion, a means for judging), is inextricably linked with making certain decisions. A criterion helps to determine what action should be taken in a given situation.
Making an informed decision requires information. Where does it come from? The main source of information is various metrics, i.e. quantitative characteristics of phenomena or objects.
A decision based on a metric is made as follows: we compute the current value of the metric and compare it with some critical value. If the critical value is reached, one decision is made; if it is not reached, another.
Computing the current value of the metric is the evaluation. Choosing a critical value and comparing the metric against it is the determination. These are precisely the first two aspects of a criterion.
Of course, this is a simplified model. In real life more than one metric may be used for a decision, rather a set of interrelated or independent metrics, and the choice may be between more than two options. But the general idea remains the same.
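As a minimal sketch of this model (the metric, data, and threshold below are purely illustrative assumptions, not anything prescribed by this essay), a criterion can be represented in code as a pair of a metric and a critical value:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    """A criterion is a pair: a metric plus a critical value."""
    name: str
    metric: Callable[[dict], float]  # evaluation: compute the current value
    critical_value: float            # determination: the threshold to compare against

    def is_met(self, data: dict) -> bool:
        return self.metric(data) >= self.critical_value


# Illustrative example: "at least 90% of planned test cases have been written".
progress = {"written": 45, "planned": 60}
criterion = Criterion(
    name="test cases written",
    metric=lambda d: d["written"] / d["planned"],
    critical_value=0.9,
)
print(criterion.metric(progress))  # 0.75 -> the evaluation
print(criterion.is_met(progress))  # False -> the determination
```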
Now, however, let us move closer to our subject and see what decisions have to be made in software testing in different situations; it is these situations that determine the different contexts in which the term "criterion" is used. For each situation I will suggest a specific phrase tied to its context, which, if used instead of the generic "testing criteria", can significantly reduce the confusion.
Criteria for the Start of Testing
If testing could be viewed as a single activity within the software development process, it would be natural to define two decision points: when this activity starts and when it is complete.
However, testing is not an indivisible activity; it is more correct to speak of it as a subprocess of the software development process. This subprocess consists of a series of activities, each of which is connected by dependencies to other activities, including ones not related to testing. For details the reader may refer to any model of the software development process, such as the Rational Unified Process. For now only one consequence of this view matters to us: it is meaningless to talk about criteria for the start or completion of testing as a whole, because testing begins and ends together with the beginning and end of the whole project.
It is more correct to speak about the beginning and end of individual activities, such as reviewing the design documentation, developing tests, or executing tests, and about the criteria related to them.
Recall once more that a criterion is not just a metric but a pair: a metric plus some critical value.
Thus, a criterion for the start of test development could be, for example, the following condition: "all use cases are described." Here the metric is the ratio of the number of described use cases to the number of all reported ones, and the critical value is 100%.
However, this condition cannot serve as a criterion for the start of test execution; for that we will have to wait until some other conditions are fulfilled, such as this one: "half of the application's functionality is implemented, the developers have covered 80% of the written code with unit tests, and these tests do not detect errors." Here we see a composite metric built from three simple constituent metrics. The first is the percentage of implemented "features", or function points, or use cases, or whatever else functionality can be measured by; its critical value is 50%. The second is the unit-test coverage of the written code; its critical value is 80%. The third is the ratio of the number of passing unit tests to their total number; its critical value is 100%.
In the above example the second and third parts of the composite metric clarify the first part, which without this clarification would be too vague. The reason is that the term "implemented" is another of those "everyday" concepts and therefore admits ambiguous interpretation. Developers may take "implemented" to mean that the code is written, while testers may take it to mean that the code is fully debugged and stable.
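A minimal sketch of such a composite entry criterion (the field names and project numbers below are assumptions made up for illustration) simply combines the three constituent checks with a logical AND:

```python
def start_test_execution(status: dict) -> bool:
    """Composite entry criterion for test execution:
    at least 50% of functionality implemented, at least 80% unit-test
    coverage of the written code, and 100% of unit tests passing."""
    functionality_ratio = status["features_implemented"] / status["features_planned"]
    coverage_ratio = status["lines_covered"] / status["lines_written"]
    passing_ratio = status["unit_tests_passed"] / status["unit_tests_total"]
    return (functionality_ratio >= 0.5
            and coverage_ratio >= 0.8
            and passing_ratio == 1.0)


project_status = {
    "features_implemented": 12, "features_planned": 20,  # 60% implemented
    "lines_covered": 8200, "lines_written": 10000,       # 82% covered
    "unit_tests_passed": 340, "unit_tests_total": 340,   # all passing
}
print(start_test_execution(project_status))  # True: test execution may start
```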
I do not intend in this essay to go into a discussion of how good or bad any particular criterion is, because that depends heavily on the specific situation. It is enough to understand that the choice of a suitable criterion is not straightforward: the metrics are not uniquely determined by the decision to be made, and the critical values can also vary greatly. Do not try to find a universal criterion for all occasions; remember, what is good for a Russian is death for a German.
However, it is not so difficult to start testing as it is to finish it.
Criteria for the Completion and Termination of Testing
If there is only one way to start doing something, there are at least two ways to finish it: successfully and unsuccessfully.
The criteria associated with the end of testing usually concern what happens to the product afterwards, since that is the key question: after testing the product either goes into operation or is returned for rework.
In fact there is a third possible outcome: testing continues. And since one of three possible outcomes has to be chosen, two different criteria are required. They are commonly called the criteria of successful and unsuccessful completion of testing.
However, these terms often cause confusion, because some people consider testing successful if the product safely passes all the tests, while others call testing successful if it finds a lot of defects and "tears the product to pieces". Since the notion of "successful" testing is too vague and subjective, and emotionally loaded as well, I prefer more neutral terminology: a criterion for the completion of testing (the product is good enough) and a criterion for the termination of testing (the product needs rework).
These two criteria are generally independent, but they are applied simultaneously. This means that we need to check whether each of them is fulfilled and make the decision based on which of them holds. And what if both criteria hold at once? That would mean the criteria were chosen poorly: a product cannot be so bad that it needs rework and at the same time so good that it can be released into operation. But if for some reason the criteria are nevertheless chosen in such a way that this situation is possible, assign priorities to the criteria and make the decision based on the higher-priority one.
So, what criteria can be used to decide whether testing should be completed, continued, or terminated? Not surprisingly, the metrics used in the completion and termination criteria may differ. They may also be the same.
Let us start with the termination of testing, as the simpler and less responsible decision. Strictly speaking, it is simpler precisely because it is less responsible. Sending the product back for rework obliges the testers to nothing; it is much worse if they give their "go-ahead" to a substandard product. Perhaps that is why testers are so eager to "tear the product to pieces": it postpones the responsible decision. But our concern now is not the motivation of testers; it is the criteria they use to decide whether the product should go back for rework or whether that would be unwarranted.
Unlike the criteria for starting testing activities, which are usually related in some way to the amount of work performed (see the examples in the previous section), the criteria for the completion and termination of testing are usually based on some metrics of product quality.
The classic and most common quality metric is the ratio of the number of fulfilled requirements to their total number. Using this metric as a basis, one can formulate, for example, the following conditions (a minimal code sketch follows the list):
·         testing is terminated if more than 20% of the requirements for the application's functionality are violated (metric: the ratio of the number of unfulfilled requirements to their total number; critical value: 20%);
·         testing is completed if all the requirements are verified as fulfilled (metric: the ratio of the number of fulfilled requirements to their total number; critical value: 100%);
·         testing continues if neither of the two previous conditions is satisfied.
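A minimal sketch of this three-way decision (the requirement counts are illustrative assumptions; the termination criterion is checked first, acting as the higher-priority criterion discussed above):

```python
def end_of_testing_decision(total: int, fulfilled: int, violated: int) -> str:
    """Three-way decision based on the requirements metric.
    The termination criterion has priority and is checked first."""
    if violated / total > 0.20:    # more than 20% of requirements violated
        return "terminate testing: return the product for rework"
    if fulfilled / total == 1.0:   # all requirements fulfilled
        return "complete testing: the product is good enough"
    return "continue testing"


# Illustrative status: 120 requirements, 95 fulfilled, 10 found violated so far.
print(end_of_testing_decision(total=120, fulfilled=95, violated=10))
# -> "continue testing": only ~8% are violated, but not everything is fulfilled yet
```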
Of course, this metric is not ideal, if only because it treats all requirements alike, while not all requirements are equally important. The metric can be improved, for example, by assigning each requirement a "weight" or "criticality". But let that remain outside our field of view; I repeat, I am not going to discuss the advantages and disadvantages of specific criteria here.
Nevertheless, despite this disclaimer, I want to point out three bad metrics that I have, unfortunately, often encountered in real life: the number of defects found, the number of executed tests, and the code coverage of the program. If someone disagrees with this point of view, welcome to the forum for discussion.
Once More About Metrics and Evaluation
So far, of the two aspects named in the section "Evaluation and Determination", we have focused on the second, that is, on determining when certain conditions affecting a decision are fulfilled, while evaluation has played a subordinate role. However, evaluation by itself performs a very useful practical function. Knowing the current value of a metric lets us see how far it is from the critical value and lets us monitor how the current value changes, so that we can anticipate and plan the moment when the decision will be made.
Mini-criteria
In addition to the "big" decisions described above, we have to make many small everyday decisions in which various criteria are also involved.
Chief among these mini-criteria is the criterion for passing a single test. Using this criterion we decide whether the observed behavior can be considered correct or not. Formulating the criterion for the successful completion of a test is one of the main problems of testing, the so-called "oracle problem".
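As a minimal illustration (the function under test and the expected value are hypothetical), the pass criterion of a single test is just such a comparison of the observed behavior against what an oracle, here a precomputed expectation, says it should be:

```python
def add(a: int, b: int) -> int:
    # Hypothetical function under test.
    return a + b


def run_test(case: dict) -> bool:
    """Pass criterion for a single test: the observed result
    matches the expected result supplied by the oracle."""
    observed = add(*case["inputs"])
    return observed == case["expected"]


test_case = {"inputs": (2, 3), "expected": 5}
print("PASS" if run_test(test_case) else "FAIL")  # PASS
```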
Honestly, I have rarely seen the term "criterion" used in a confusing way when applied to the criteria for passing tests. I will therefore not dwell on this subject; it would pull in too long a chain of topics and lead us away from the theme of this essay, and it is time, at last, to turn to the third purpose of a criterion.
Classification
To some extent this third aspect reduces to the previous two, since the main task of classification is to decide whether two instances belong to the same class or to different classes. Theoretically speaking, the task of classification is to construct a system of classes. In practice, however, it usually comes down to selecting specific representatives of these classes, so the criteria used for classification are, as a rule, called selection criteria.
In software testing this manifests itself in the formation of test suites: one has to decide whether a given test should be included in a test suite or not. Since on the one hand we want to minimize the test suite in order to reduce the time it takes to run the tests, and on the other hand we must ensure that the suite is sufficiently representative, the problem of selection is extremely important.
The use of selection criteria in forming test suites is, in my opinion, most clearly and vividly explained by the following quote from an article by Olga Bezzubova, "The museum as an object of philosophical-anthropological study":
"The classic criteria for the selection of museum objects [...] differ in many respects from the principles by which an archive or a library is compiled. While the latter strive for maximum completeness (the more extensive the collection, the more valuable it is), museum collections are stretched between two poles: the marginal thing and the perfect, exemplary thing. That is, on one side we have items that, while not having independent value, give an idea of the whole class; on the other side, everything unique and out of the ordinary."
Yes, a test suite is not a library or an archive; it is a museum.
When forming a test suite it makes sense to include tests of exactly these two types: tests that, "while not possessing independent value, give an idea of the whole class", and tests that are "marginal", unique, out of the ordinary.
This principle in particular underlies the technique of partitioning the input data into equivalence classes: representatives are chosen from each equivalence class (giving an idea of the whole class), together with boundary or near-boundary values (the word "marginal" derives from "margin", meaning the edge or border area of something).
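A minimal sketch of this selection technique (the input domain below, an integer field accepting values from 1 to 100, is an assumption chosen purely for illustration): take one typical representative from each equivalence class plus the marginal boundary values.

```python
# Hypothetical input domain: an integer field that accepts values from 1 to 100.
LOW, HIGH = 1, 100

# Typical representatives: one value from "inside" each equivalence class,
# giving an idea of the whole class (valid inputs, too-small inputs, too-large inputs).
typical_representatives = {
    "valid": 50,
    "below range": -20,
    "above range": 500,
}

# Marginal values: the boundaries of the valid class and their nearest neighbours.
boundary_values = [LOW - 1, LOW, LOW + 1, HIGH - 1, HIGH, HIGH + 1]

selected_inputs = sorted(set(typical_representatives.values()) | set(boundary_values))
print(selected_inputs)  # [-20, 0, 1, 2, 50, 99, 100, 101, 500]
```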
Although partitioning into equivalence classes is usually mentioned in the context of functional testing, the application of this principle is not limited to that type of testing. For example, in stress testing the behavior of the system is checked separately under "normal" and "peak" load; these are two large classes which, for testing purposes, are regarded as differing in their properties.
On the other hand, some kinds of testing focus only on typical, or only on marginal, objects and phenomena. For example, for usability testing a portrait of the typical user is constructed, while for stability testing one persistently searches for the specific data that can lead to a crash.
We will stop here, because our goal was only to show how the term "criterion" is used in the context of the classification of tests, that is, the construction of test suites, and not to explore different ways of classifying them.
Conclusion
So, we have seen that the use of the term "criterion" in the field of software testing is very varied. Different kinds of criteria are used for making decisions large and small, from the decision about whether a single test passes to the decision about whether testing as a whole has succeeded. And all of them, to a greater or lesser degree, may be called "testing criteria".
Based on the above considerations, I strongly encourage everyone to stop using the bare term "testing criteria", in order to reduce the already considerable confusion in software testing terminology. Instead, I propose using more specific phrases for the criteria applied in different situations.

