A criterion is a trait that serves three main purposes: with its help we perform 1) assessment, 2) identification and 3) classification. Let us consider these three purposes in turn as applied to software testing.
Introduction
The term "testing
criteria" is interesting because of its use in the field of
software testing services often leads to misunderstandings, while in other areas of
human activity such problems do not arise.
For example, when we start a testing project and the customer asks what testing criteria will be used, we have to clarify exactly which criteria he has in mind: the criteria for considering testing complete, the criteria for selecting tests for regression testing, or some other criteria entirely.
Why does this happen, and how can we get rid of the uncertainty associated with the term "testing criteria"? These are the two main questions to which this essay is devoted.
A digression about terms
Terms always cause much controversy. We would like to find definitions that satisfy everyone; alas, this rarely happens. People put different meanings into the same words, which sometimes leads to mutual misunderstanding.
Terms that are used only in some specialized area are less prone to this multiplicity of meanings. It is unlikely that terms such as "surplus", "giardiasis" or "abyss" will cause experts in the relevant subject areas any difficulty with ambiguous interpretation. A layman can likewise look them up in a dictionary and receive a completely unambiguous interpretation.
Problems arise with terms that are used in more than one specialized area and that carry different meanings, or at least different shades of meaning, in different domains. For example, the word "bay" may mean either a small body of water or a coil of rope or cable. The word "function" does not mean quite the same thing in mathematics and in programming.
But worst of all are words that are widely used in everyday speech. Clarifying the meaning of such everyday terms as applied to a particular area is the most common cause of terminological disputes.
We can see that the meaning of a word is determined not only by the word itself but also by the context in which it is used. Terms that are used in several subject areas require the context to be clarified; only after that does the term acquire a precise meaning.
However, there is a simple trick that helps, to some extent, to cope with ambiguous interpretation of a term: refining the context by using not a single word but a characteristic phrase. In this case the characteristic phrase "pulls" the context along with it, thereby creating an environment conducive to a correct interpretation of the term.
For example, the previously mentioned term "function" has different shades of meaning depending on context, but if you say "mathematical function" or "function in C", the context immediately becomes clear.
In this essay we will talk about the use of the term "criterion", which belongs to exactly this class of ambiguous everyday concepts. More precisely, we are interested in the narrower term "testing criteria", and in particular its use in the field of software testing.
What is a "criteria"?
"Criteria - sign, on the basis of which is assessed,
definition, classification of anything, measure. "
Thus, a criterion is a sign that serves three main purposes: with its help we perform
1) assessment,
2) identification and
3) classification.
We will consider these three purposes in detail, one at a time, as applied to software testing in the next two sections. But first let us try to understand why the use of the term "testing criteria" in other domains does not lead to uncertainty.
In the following explanation I am getting ahead of myself a little; perhaps it will become clearer after reading the next section. Nevertheless, I want to deal with all the other areas right away so that the rest of this essay can focus on software testing. So I apologize to readers if this explanation does not seem convincing enough; I hope that after reading the next section everything will fall into place.
So why do other subject areas get by with this term without any problems? To answer this question it is enough to pay attention to the contexts in which the phrase is used, and to what is meant there by testing.
There are two main contexts in which the term "testing criteria" is used.
The first and most widely used is the comparison of several things against a set of parameters and the identification of the best among them. Testing in this context refers to the comparison itself, so it would probably be more correct in this situation to drop the term "testing criteria" and replace it with the more explicit "comparison criteria". Nevertheless, the comparison context is easily identified and rigidly defines what action has to be taken (choose the best of several), so the term "testing criteria" in this context is clearly concretized and does not cause any confusion.
The second, more specific context is associated with education and the assessment of knowledge. In this case testing means testing the level of knowledge. Here, too, it is clear from the context what decision has to be taken (whether a certain level of knowledge has been achieved), so the term is again concretized quite unambiguously.
I am not aware of any other reasonably common contexts.
The field of software testing is different in that there are many more contexts in which the use of the term "testing criteria" can be considered meaningful. This is what causes ambiguity when the term is used without a clear indication of the context.
Also, remember that a criterion has three purposes: with its help we perform assessment, determination and classification. In the field of software testing all three of these aspects are found, whereas in the other areas, as can be seen from the above, only the first two occur.
In the next two sections all three purposes of a criterion are discussed in detail in relation to software testing, and a set of phrases using the term "criteria" is proposed; these phrases are more specific than the general term "testing criteria" and are each more tightly bound to their own context, so that their use does not cause confusion.
Assessment and determination
These two aspects are very closely related to each other. More precisely, the first is usually subordinate to the second, so let us start with the second as the more significant one.
The use of criteria, as can be seen from the etymology of the term (Greek kriterion, "a means for judging"), is inextricably linked with making decisions. A criterion helps to determine what action should be taken in a given situation.
Making an informed decision requires information. Where does it come from? The main source of information is various metrics, i.e. quantitative characteristics of phenomena or objects.
Decision-making based on metrics works as follows: we compute the current value of a metric and compare it with some critical value. If the critical value is reached, we take one decision; if it is not reached, another.
Calculating the current value of the metric is the assessment. Choosing a critical value and comparing the current value with it is the determination. These are exactly the first two aspects of a criterion.
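As a minimal sketch (in Python, with purely illustrative names and numbers of my own), such a criterion can be written as a metric paired with a critical value:

    def requirements_verified_ratio(verified, total):
        # Assessment: compute the current value of the metric.
        return verified / total if total else 0.0

    def criterion_met(current_value, critical_value):
        # Determination: compare the current value with the critical value.
        return current_value >= critical_value

    current = requirements_verified_ratio(45, 60)   # 0.75
    print(criterion_met(current, 1.0))               # False -> take the "not yet" decision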
Of course, this is a simplified model. In real life a decision may be based not on a single metric but on a set of interrelated or independent metrics, and the choice may have more than two outcomes. But the general idea remains the same.
However, let us move closer to the subject and see what decisions have to be taken in software testing in different situations, since those situations determine the different contexts in which the term "criteria" is used. For each situation a specific phrase will be offered which, used instead of the phrase "testing criteria", can significantly reduce the confusion.
Criteria for starting testing
If testing is seen as a single activity within the software development process, it would be natural to create two decision points: when this work starts and when it is complete.
However, testing is not an indivisible act; it is more correct to speak of it as a subprocess of the software development process. This subprocess consists of a series of activities, each of which is linked by dependencies to other activities, including activities not related to testing. For more details the reader may refer to any model of the software development process, such as the Rational Unified Process; for us only one consequence of this view is important now: it is meaningless to talk about criteria for starting or completing testing as a whole, because it begins and ends together with the whole project.
It is more correct to speak about the beginning and end of individual activities, such as reviewing the design documentation, developing tests or performance testing, and about the criteria related to them.
Recall again that a criterion is not just a metric but a pair: a metric plus some critical value.
Thus, a criterion for starting test development could be, for example, the following condition: "all use cases are described." Here the metric is the ratio of the number of described use cases to the total number of identified use cases, and the critical value is 100%.
However, this condition cannot serve as a criterion for starting test execution; for that we will have to wait for some other condition to be fulfilled, for example: "half of the application's functionality is implemented, the developers have covered 80% of the written code with unit tests, and these tests detect no errors." Here we see a composite criterion constructed from three simpler metrics. The first is the percentage of implemented "features", or function points, or use cases, or whatever else functionality can be measured by; its critical value is 50%. The second is the unit-test coverage of the written code; its critical value is 80%. The third is the ratio of the number of passing unit tests to their total number; its critical value is 100%.
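Purely as an illustration (the function name and numbers below are mine, not a standard recipe), this composite criterion might be checked like this:

    def ready_to_start_test_execution(implemented, total_features,
                                      unit_test_coverage,
                                      passing_unit_tests, total_unit_tests):
        # Three sub-metrics with their critical values: 50%, 80% and 100%.
        half_implemented = implemented / total_features >= 0.5
        coverage_ok = unit_test_coverage >= 0.8
        all_tests_pass = passing_unit_tests == total_unit_tests
        return half_implemented and coverage_ok and all_tests_pass

    print(ready_to_start_test_execution(12, 20, 0.85, 200, 200))  # True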
In the above example the second and third parts of the composite criterion clarify the first part, which without this clarification would be too vague. The reason is that the term "implemented" is also an everyday concept and therefore admits ambiguous interpretation. Developers may assume that "implemented" means the code is written, while testers may assume it means the code is fully debugged and stable.
I do not intend in this essay to go into a discussion of how good or bad any particular criterion is, because that depends strongly on the particular situation. It is only necessary to understand that the choice of a suitable criterion is not straightforward: the metrics are not uniquely determined by the decision to be taken, and the critical values can also vary greatly. Do not try to find a universal criterion for all occasions; remember the saying: what is good for a Russian is death to a German.
However, starting testing is not nearly as difficult as finishing it.
Criteria for completing and terminating testing
If there is only one way to start doing something, there are at least two ways to finish it: successfully and unsuccessfully.
Criteria associated with the end of testing are usually related to the action that follows testing, since that action is the key one: after it the product either goes into operation or is returned for rework.
In fact, there is a third possible outcome: continuing testing. And since one of three possible outcomes has to be chosen, two different criteria are required. They are commonly referred to as the criteria of successful and unsuccessful completion of testing.
However, these terms often cause confusion, because some people consider testing successful when the product passes all the tests safely, while others call testing successful when many defects have been found and the product has been "torn to pieces". Since the concept of "successful" testing is too vague and subjective, and also emotionally loaded, I prefer more neutral terminology: a criterion for the completion of testing (the product is good enough) and a criterion for the termination of testing (the product needs rework).
These two criteria are generally independent, but they are applied simultaneously. This means that we need to check whether each of them is fulfilled and make the decision based on which of them holds. And what if both criteria hold at once? That means the criteria have been chosen badly: a product cannot be both so bad that it needs rework and so good that it can be released into operation. But if for some reason the criteria are nevertheless chosen so that this situation is possible, assign priorities to the criteria and make the decision on the basis of the higher-priority one.
So, what criteria can be used to determine whether testing should be completed, continued or terminated? Not surprisingly, the metrics used in the completion and termination criteria may be different, or they may be the same.
Let's start with the termination of testing, as the simpler and less responsible decision. Strictly speaking, it is simpler precisely because it is less responsible: sending a product back for rework does not oblige the testers to anything, whereas it is much worse if they give a "pass" to a substandard product. Perhaps that is why testers are so eager to "tear the product to pieces": it postpones taking the responsible decision. However, right now we are concerned not with the motivations of testers but with what criteria they use to decide whether the product should go back for rework or whether that is not warranted.
Unlike the criteria for starting testing activities, which are usually related in some way to the amount of work performed (see the examples in the previous section), the criteria for completing and terminating testing are usually based on some metrics of product quality.
The classic and most common quality metric is the ratio of the number of fulfilled requirements to their total number. Using this metric as a basis, one can formulate, for example, the following conditions (a small sketch in code follows the list):
· testing is terminated if more than 20% of the requirements for the application's functionality are violated (metric: the ratio of the number of unfulfilled requirements to their total number; critical value: 20%);
· testing is completed if all the requirements have been verified as fulfilled (metric: the ratio of the number of fulfilled requirements to their total number; critical value: 100%);
· testing continues if neither of the previous two conditions is satisfied.
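A minimal sketch of this three-way decision (assuming the illustrative thresholds above, and the priority rule from the previous section for the case when both criteria hold):

    def testing_verdict(total_requirements, verified, violated):
        terminate = violated / total_requirements > 0.20   # product needs rework
        complete = verified == total_requirements          # product is good enough
        if terminate:          # higher-priority criterion wins if both hold
            return "terminate"
        if complete:
            return "complete"
        return "continue"

    print(testing_verdict(total_requirements=50, verified=30, violated=4))   # "continue"
    print(testing_verdict(total_requirements=50, verified=50, violated=12))  # "terminate"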
Of course, this metric is not ideal, since it treats all requirements as equal, whereas not all requirements are equally important. The metric can be improved, for example, by assigning a "weight" or "criticality" to each requirement. But let that remain outside our field of view; I repeat, I am not going to discuss here the advantages and disadvantages of specific criteria.
However, despite this disclaimer, I want to point out three bad metrics that I unfortunately encounter all too often in real life: the number of defects found, the number of executed tests, and code coverage. If someone disagrees with this point of view, welcome to the forum for discussion.
Once again about metrics and assessment
Until now, of the pair "assessment and determination", we have focused on the second, i.e. on determining when certain conditions affecting a decision are fulfilled, and assessment has played a subordinate role. However, assessment by itself serves a very useful practical function. Knowing the current value of a metric lets us see how far it is from the critical value and lets us monitor how the current value changes, so that we can anticipate and plan for the moment when the decision will have to be taken.
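For instance (a rough sketch that assumes a roughly linear trend; the numbers are invented), the history of a metric can be used to estimate how many days remain until the critical value is reached:

    def days_until_critical(history, critical_value):
        # history: one metric value per day, oldest first
        if len(history) < 2:
            return None
        daily_change = (history[-1] - history[0]) / (len(history) - 1)
        if daily_change <= 0:
            return None  # no progress, no meaningful forecast
        return (critical_value - history[-1]) / daily_change

    # 60% -> 70% -> 75% of requirements verified over three days, target 100%
    print(days_until_critical([0.60, 0.70, 0.75], 1.0))  # about 3.3 days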
Mini-criteria
In addition to the "big" decisions described above, we have to take a lot of small everyday decisions in which one criterion or another is also involved.
And chief among these mini-criteria is the criterion for passing a single test. Using this criterion, we decide whether the observed behavior can be considered correct or not. Formulating the pass criteria for tests is one of the main problems of testing, the so-called "oracle problem".
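In its most trivial form (a deliberately simplistic sketch, not a solution to the oracle problem), such a criterion is just a comparison of the observed behavior with an expected value:

    def test_passed(observed, expected):
        # The simplest possible pass criterion: observed behavior equals the expectation.
        return observed == expected

    print(test_passed(sorted([3, 1, 2]), [1, 2, 3]))  # True -> the test passes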
Honestly, I have rarely come across confusing use of the term "criteria" as applied to the criteria for passing tests. Therefore I will not dwell on this subject; otherwise it would pull along too long a chain that would lead us away from the topic of this essay, and it is already time to finally turn to the third purpose of a criterion.
Classification
To some extent this third aspect reduces to the previous two, since the main task of classification is deciding whether two instances are members of the same class or of different classes. Theoretically speaking, the task of classification is to construct a system of classes. In practice, however, it usually takes the form of selecting specific representatives of these classes, so the criteria used for classification are, as a rule, called selection criteria.
In software testing this manifests itself in the formation of test suites. It is necessary to decide whether a given test should be included in a test suite or not. Because we want, on the one hand, to minimize the test suite in order to reduce the time it takes to run the tests and, on the other hand, to ensure that the test suite is sufficiently representative, the selection problem is extremely important.
The use of selection criteria for forming test suites is, in my opinion, most clearly and vividly explained by the following quote from an article by Olga Bezzubova, "The museum as an object of philosophical-anthropological study": "The classic criteria for selecting museum objects [...] differ in many respects from the principles by which an archive or a library is compiled. While the latter strive for maximum completeness (the vaster the collection, the more valuable it is), museum collections are divided between two poles: the marginal thing and the perfect, exemplary thing. That is, on the one hand we have items that, while having no independent value, give an idea of a whole class, and on the other hand everything unique and out of the ordinary."
Yes, a test suite is not a library or an archive; it is a museum.
When forming a test suite it makes sense to include tests of exactly these two types: tests which, "while having no independent value, give an idea of a whole class", and tests that are "marginal", unique, out of the ordinary.
This principle underlies, in particular, the technique of partitioning the input data into equivalence classes: representatives are chosen from each equivalence class (giving an idea of the whole class), along with boundary or near-boundary values (the word "marginal" derives from "margin", meaning the edge or border area of something).
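As an illustration (an invented example of a field that accepts integers from 1 to 100), the selected inputs would combine one typical representative per class with the boundary values:

    # Valid class: 1..100; invalid classes: below 1 and above 100.
    typical_valid = [50]                  # "gives an idea of the whole class"
    typical_invalid = [-37, 500]          # representatives of the invalid classes
    boundary_values = [0, 1, 100, 101]    # "marginal" values on and around the borders

    test_inputs = sorted(set(typical_valid + typical_invalid + boundary_values))
    print(test_inputs)  # [-37, 0, 1, 50, 100, 101, 500]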
While partitioning into equivalence classes is usually mentioned in the context of functional testing, the application of this principle is not limited to that type of testing. For example, in stress testing the behavior of the system under "normal" and under "peak" load is verified separately; these are two large classes which testing regards as differing in their properties.
On the other hand, some kinds of testing focus on either only typical or only marginal objects and phenomena. For example, in usability testing a portrait of the typical user is constructed, while in stability testing one persistently searches for the specific data that can lead to a crash.
We will stop here, because our goal was only to show how the term "criteria" is used in the context of the problem of classifying tests, i.e. constructing test suites, not to explore different ways of classification.
Conclusion
So, we have seen that the use of the term "criteria" in the field of software testing is very varied. Different kinds of criteria are used for making large and small decisions, from the decision about whether a single test passed to the decision about the success of testing as a whole. And all of them may, to a greater or lesser degree, be considered "testing criteria".
Based on the above considerations, I strongly encourage everyone to stop using the term "testing criteria" in order to reduce the already considerable terminological confusion in the field of software testing. Instead, I propose using more specific phrases for the criteria used in each particular situation.