VNU Journal of Science: Comp. Science & Com. Eng., Vol. 33, No. 2 (2017) 28-38
Generating Test Data for Software Structural Testing using Particle Swarm Optimization
Dinh Ngoc Thi*
VNU University of Engineering and Technology, 144 Xuan Thuy, Cau Giay, Hanoi, Vietnam
Abstract
Search-based test data generation is a very popular domain in the field of automatic test data generation. However, existing search-based test data generators suffer fromsome problems. By combining static program analysis and search-based testing, our proposed approach overcomesone of these problems. Considering the automatic ability and the path coverage as the test adequacycriterion, this paper proposes using Particle Swarm Optimization, an alternative search technique, for automating the generation of test data for evolutionary structural testing. Experimental results demonstrate that our test data generator can generate suitable test data with higher path coverage than the previous one.
Received 26 Jun 2017; Revised 28 Nov 2017; Accepted 20 Dec 2017
Keywords:Automatic test data generation, search-based software testing, Particle Swarm Optimization.
1. Introduction*
Software is amandatory part of today's life, and has become more and more important in current information society. However, its failure may lead to significanteconomic loss or threat to life safety. As a consequence, software qualityhas become a top concern today. Among the methods of software quality assurance, software testing has been proven as one of the effective approachesto ensure and improve software quality over the past threedecades. However, as most of the software testing is being done manually, the workforce and cost required are accordingly high [1]. In general, about 50 percent of workforce and cost in the software development process is spent on software testing [2]. Considering those reasons, automated software testing has been evaluated
_______
* E-mail.: dinhngocthi@gmail.com https://doi.org/10.25073/2588-1086/vnucsce.165
28
as an efficient and necessary method in order to reduce those efforts and costs.
Automated structural test data generation is becoming the research topic attracting much interest in automated software testingbecause it enhances the efficiency while reducing considerably costs of software testing. In our paper, we will focus on path coverage test data generation, considering that almost all structural test data generation problems can be transformed to the path coverage test datageneration one. Moreover, Kernighan and Plauger [3] also pointed out that path coverage test data generation can find out more than 65 percent of bugs in the given program under test (PUT).
Although path coverage test data generation is the major unsolved problem [20], various approaches have been proposed by researchers. These approaches can be classified into two types: constraint-based test data generation (CBTDG) or search-based test data generation (SBTDG).
D.N. Thi / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 33, No. 2 (2017) 28-38 29
Symbolic execution (SE) is the state-of-the-art of CBTDG approaches [21]. Even though there have been significant achievements, SE still faces difficulties in handling infinite loops, array, procedure calls and pointer references in each PUT [22].
There are also random testing, local search [10], and evolutionary methods [23, 24, 25] in SBTDG approaches. As the value of input variables is assigned when a program executes, problems encountered in CBTDG approaches can be avoided in SBTDG.
Being an automated searching method in a predefined space, genetic algorithm (GA) was applied to test data generation since 1992 [26]. Micheal et al [22], Levin and Yehudai [25], Joachim et al [27] indicated that GA outperforms other SBTDG methods e.g. local search or random testing.However eventhough they can generate test data with appropriate fault-prone ability [4, 5], they fail to produce them quickly due to their slowly evolutionary speed. Recently, as a swarm intelligence technique, Particle Swarm Optimization (PSO) [6, 7, 8] has become a hot research topic in the area of intelligent computing. Its significant feature is its simplicity and fast convergence speed.
Even so, there are still certain limitations in current research on PSO usage in test data generation. For example, consider one PUT which was used in Mao’s paper [9] as below:
int getDayNum(int year, int month) { int maxDay=0;
if(month≥1 && month≤12){ //bch1: branch 1
if(month=2){ //bch2: branch 2 if(year%400=0||
(year%4=0&&year%100=0)) //bch3: branch 3 maxDay=29;
else //bch4: branch 4 maxDay=28;
}
else if(month=4||month=6|| month=9||month=11)
//bch5: branch 5
maxDay=30;
else //bch6: branch 6 maxDay=31;
}
else //bch7: branch 7 maxDay=-1;
return maxDay; }
Regarding this PUT, Mao [9] used PSO to generate test data through building the one and only fitness function which was the combination of Korel formula [10] and the branch weights. This proposal has two weaknesses: the branch weight function is entirely performed manually and some PUTs are not able to generate test data to cover all test paths. To overcome these weaknesses, we still use PSO to generate test data for the given PUT. However, unlike Mao, our approach is to assign one fitness function for each test path. Then we will use simultaneous multithreading of PSO to simultaneously find the solution corresponding to this fitness function, which is also the one able to generate test data for this test path.
The rest of this paper is organized as follows: Section 2 gives some theoretical backgroundon fitness function and particle swarm optimization algorithm. Section 3 summarizes some related works, and Section 4 presents the proposed approach in detail. Section 5 shows the experimental results and discussions. Section 6 concludes the paper.
2. Background
This section describes the theoretical background being used in our proposed approach.
2.1. Fitness function
When using PSO, a test path coverage test data generation is transformed into an optimization problem. To cover a test path during execution, we must find appropriate values for the input variables which satisfy related branch predicates. The usual way is to
30 D.N. Thi / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 33, No. 2 (2017) 28-38
use Korel’s branch distance function [10]. As a result, generating test data for a desired branch is transformed into searching input values which optimizes the return value of its Korel function. Table 1 gives some common formulas which are used in branch distance functions. To generate test data for a desired path P, we define a fitness function F(P) as the total values of all related branch distance functions. For these reasons, generating path coverage test data can be converted into searching input values which can minimize the return value of function F(P).
Table 1. Korel’s branch functions for severalkinds of branch predicates
Relational Branch distance function f(bchi) predicate
Boolean if true then 0 else k
¬a negation is propagated over a
a = b if abs(a – b)= 0 then 0 else abs(a − b)+ k
a ≠ b if abs(a − b)≠0 then 0 else k a**b if b − a >0 then 0 else abs(b − a)+ k
a ≥ b if b − a ≥ 0 then 0 else abs(b − a)+ k
a and b f (a)+ f(b)
a or b min(f(a), f(b))
Similar to Mao [9], we also set up the punishment factor k = 0.1. Basing on this formula, we will develop a function calculating values at branch predication, which is will be explained in the next part.
2.2. Particle Swarm Optimization
Particle Swarm Optimization (PSO) was first introduced in 1995 by Kennedy and Eberhart [11], and is now widely applied in optimization problems. Compared to other optimal search algorithms such as GA or SA, PSO has the strength of faster convergent speed and easier coding. PSO is initialized with a group of random particles (initial solutions) and
then it searches for optima by updating generations. In every iteration, each particle is updated by the following two "best" values. The first one is the best solution (fitness) achieved so far (the fitness value is also stored). This value is called pbest. Another "best" value tracked by the particle swarm optimizer is the best value, obtained so far by any particle in the population. This best value is a global best and called gbest.
After finding the two best values, the particle updates its velocity and positions with the following equation (1) and (2).
[] = [] + 1 × () × ( [] −
[]) + 2 × () × ( [] −
[])(1)
[] = [] + [](2)
v[] is the particle velocity, persent[] is the current particle (currentsolution). pbest[] and gbest[] are defined as stated before. rand() is a random number between (0,1). c1, c2 are learning factors, usually c1 = c2 = 2.
The PSO algorithm is described by pseudo code as shown below:
Algorithm 1: Particle Swarm Optimization (PSO) Input: F: Fitness function
Output: gBest: The best solution 1: for each particle
2: initialize particle 3: end for
4: do
5: for each particle
6: calculate fitness value
7: if the fitness value is better than the best fitness value (pBest) in history then
8: set current value as the new pBest 9: end if
10: end for
11: choose the particle with the best fitness value of all the particles as the gBest
12: for each particle
13: calculate particle velocity according equation (1)
14: update particle position according equation (2)
15: end for
16: while maximum iterations or minimum criteria is not attained
D.N. Thi / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 33, No. 2 (2017) 28-38 31
Particles' velocities on each dimension are clamped to a maximum velocity Vmax, which is aninput parameter specified by the user.
3. Related work
From the 1990s, genetic algorithm (GA) has been adopted to generate test data. Jones et. al. [13] presented a GA-based branch coverage test data generator. Their fitness function made use of weighted Hamming distance tobranch predicate values. They used unrolled control flow graph of a test program such that it is acyclic. Six small programs were used to test the approach.In recent years, Harman and McMinn [14] performed empirical study on GA-based test data generation for large-scale programs, and validated its effectiveness over other meta-heuristic search algorithms.
Although GA is a classical search algorithm, its convergence speed is not very significant. PSO algorithm, which simulates to birds flocking around food sources, was invented by Kennedy and Eberhart [11] in 1995, and was originally just an algorithm used for optimization problems. However with the advantages of faster convergence speed and easier constructionthan other optimization algorithms, it was promptly adopted as a meta-heuristic search algorithm in the automatic test data generation problem.
Automatic test data generation literature using PSO started with Windisch et al. [6] in 2007. They improved the PSO intocomprehensive learning particle swarm optimization (CL-PSO) to generate structural test data, but some experiments proved that the convergence speed of CL-PSO was perhaps worse than the basic PSO.
Jia et al. [8] created an automatic test data generating tool named particle swarm optimization data generation tool (PSODGT). The PSODGT is characterized by two features. First, the PSODGT adopts the condition-decision coverage as the criterion of software testing, aiming to build an efficient test data set that covers all conditions. Second, the
PSODGT uses a particle swarm optimization (PSO) approach to generate test data set. In addition, a new position initialization technique is developed for PSO. Instead of initializing the test data randomly, the proposed technique uses the previously-found test data which can reach the target condition as the initial positions so that the search speed of PSODGT can be further accelerated. The PSODGT is tested on four practical programs.
Khushboo et al. [15] described the application of the discrete quantum particleswarm optimization (QPSO) to the problem of automated test data generation.Thediscrete quantum particle swarm optimization algorithm is proposed on the basis of the conceptof quantum computing. They had studied the role of the critical QPSO parameters on test data generation performance and based on observationsan adaptive version (AQPSO) had been designed. Its performance comparedwith QPSO. They used the branch coverage as their test adequacy criteria.
Tiwari et al. [16] had applied a variant of PSO in the creation of new test data formodified code in regression testing. The experimental resultsdemonstrated that this method could cover more code in lessnumber of iterations than the original PSO algorithm.
Zhu et al. [17] put forward an improved algorithm (APSO) and applied it to automatictest data generation, in which inertia weight was adjusted accordingto the particle fitness. The results showed that APSO had betterperformance than basic PSO.
Dahiya et al. [18] proposed a PSO-basedhybrid testing technique and solved many of the structural testingproblems such as dynamic variables, input dependent array index,abstract function calls, infeasible paths and loop handling.
Singla et al. [19] presented a technique on the basis of a combination ofgenetic algorithm and particle swarm algorithm. It is used togenerate automatic test data for data flow coverage by usingdominance concept between two nodes, which is compared toboth GA and
32 D.N. Thi / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 33, No. 2 (2017) 28-38
PSO for generation of automatic test cases todemonstrate its superiority.
Mao [9] and Zhang et al. [7] had the same approach, in which they did not execute any PSO improvement but only built a fitness function by combining the branch distance functions for branch predicates and the branch weights of a PUT, then applied PSO to find the solution for this fitness function. The experiment result with 1 benchmark having 8 programs under test proved that PSO algorithm was more effective than GA in generating test data. However, there remained a weakness that the calculation of branch weight for a PUT was still
K
entirely manual work, which reduced the automatic nature of the proposal. In this paper, our proposal can overcome this limitation while being able to assure the efficiency of a PSO-based automatic test data generation method.
4. Proposed approach
Our proposed approach can be divided into two separate parts: performing static analysis and applying simultaneous multithreading of PSO to generate test data. This approach is presented in the Figure 1 below.
Figure 1. The basic steps for PSO-based test data generation.
4.1. Perform statistical analysis to find out all test paths
At first, we perform the statistical analysis to find all test paths of the given PUT. We call static analysis because of not having to execute the program, we can still generate control flowgraph (CFG) from the given program, and then traverse this CFG to find out all test paths.It can be done through the following two small steps:
1) Control flow graph generation: Test data generated from source code directly is morecomplicated and difficult than from control flow graph (CFG). CFG is a directed graph visualizing logic structures of program [12] and is defined as follow:
Definition1(CFG).Given a program, a corresponding CFG is defined as a pair G =(V, E), where V ={v0, v1,…vn} is a set of vertices representing statements, E ={(vi, vj)|vi, vj∈ V}⊂ V× V is a set of edges. Each edge (vi, vj) implies the statement corresponding to vj is executed after vi.
This paper uses the CFG generation algorithm from a given program which was presented in [28].Before performing this algorithm, output graph is initialized as a global variable and contains only one vertex representing for the given program P.
Algorithm 2: GenerateCFG Input : P : given program Output: graph: CFG
1: B = a set of blocks by dividing P
2: G = a graph by linking all blocks in B to each other
3: update graph by replacing P with G 4:ifG contains return/break/continue statements then
5: update the destination of return/break/continue pointers in the graph 6: end if
7: for each block M in B do
8: if block M can be divided into smaller blocks then
9: GenerateCFG(M) 10: end if
11: end for
D.N. Thi / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 33, No. 2 (2017) 28-38 33
Apply this GenerateCFG algorithm to the above mentioned PUT getDayNum, we will get a CFG which has 5 test paths (presented by decision nodes) as Figure 2 following.
2) Test paths generation:In order to generate test data, a set of feasible test paths is found by traversing the given CFG. Path and test path are defined as follows:
Definition 2 (Path).Given a CFG G = (V, E), a path is a sequence of vertices {v0, v1,..., vk |(vi, vi+1)∈ E, 0< k < n}, where n is the number of vertices.
Definition 3 (Test path).Given a CFG G = (V, E), a test path is a path {v0, v1,..., vk |(vi, vi+1)∈ E}, where v0 and vi+1 are corresponding to the start vertex and end vertex of the CFG.
This research also uses CFG traverse algorithm [28] to obtain feasible test paths from a CFG as below:
6: for each adjacent vertex u to vdo 7: TraverseCFG(u, depth, path)
8: end for 9: end if
10: remove the latest vertex added in path from it
11: end if
In this paper, a test path is represented as a sequence of pairs of predicate, e.g. (month ≥ 1 &&month ≤ 12) for the first branch, and its decision (T or F for TRUE or FALSE respectively). For example, one of the paths in PUT getDayNumcan be written as thesequence {[(month ≥ 1 &&month ≤ 12), T], [(month = 2), T], [(year % 400 = 0 ||(year % 4 = 0 &&year %100 = 0)), F]} which means the TRUE branch is taken at predicate (month ≥ 1 &&month ≤ 12), the TRUE branch at predicate (month = 2),
and the FALSE branch at predicate (year % 400 = 0 ||(year % 4 = 0 &&year % 100 = 0)). This is the path taken with data that represents the number of days of February in the not leap year. Apply this algorithm TraverseCFG to the CFG of PUT getDayNum, we will get 5 test paths which are presented as a sequence of pairs of branch predication and its decisions as in the Table 2 below:
Table 2. All test paths of PUT getDayNum
Figure 2. CFG of PUT getDayNum.
Algorithm 3: TraverseCFG
Input: v: the initial vertex of the CFG depth: the maximum number of iterations for a loop
path: a global variable used to store a discovered test path
Output: P: a set of feasible test paths
1: ifv = NULL or v is the end vertex then 2: add path to P
3: else if the number occurrences of v in path ≤ depththen
4: add v to the end of path
5: if (v is not a decision node) or (v is decision node and path is feasible) then
PathID Path’s branch predications and their decisions
path1 [(month ≥ 1 &&month ≤ 12), T], [(month = 2), T],
[(year % 400 = 0 | | (year % 4 = 0 &&year % 100 = 0)), T]
path2 [(month ≥ 1 &&month ≤ 12), T], [(month = 2), T], [(year % 400 = 0 || (year % 4 = 0 &&year % 100 = 0)), F]
path3 [(month ≥ 1 &&month ≤ 12), T], [(month = 2), F], [(month= 4|| month= 6|| month= 9 || month= 11), T]
path4 [(month ≥ 1 &&month ≤ 12), T], [(month =2), F], [(month= 4|| month= 6|| month= 9 || month=11), F]
path5 [(month ≥ 1 &&month ≤ 12), F]
34 D.N. Thi / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 33, No. 2 (2017) 28-38
4.2. Establish fitness function for each test path
From the branch distance calculation formula in Table 1, we develop the below function.
fBchDist to calculate the value at each predicate branch.
Since each test path is represented by sequence of pairs of branch predication and its decision, in order to build the fitness function
for the test path, we establish the fitness function for each branch predication and its decision. There will be 2 possibilities of TRUE(T) and FALSE(F) for each branch predication, so there will be 2 fitness functions corresponding to those possibilities. Regarding the calculation formula for the fitness function of each branch predication, we apply the above mentioned branch distance calculation algorithm.
Table 3. Fitness function for each branch predication and its decision of PUT getDayNum
Decision node
[(month ≥ 1 &&month≤ 12), T] [(month ≥ 1 &&month ≥ 12), F]
[(month = 2), T] [(month = 2), F] [(year % 400 = 0 ||
(year % 4 = 0 && year % 100 = 0)), T]
[(year %400 = 0 || (year % 4 = 0 && year % 100 = 0)), F]
[(month= 4 || month= 6 || month= 9 || month= 11), T] [(month= 4 || month= 6 || month= 9 || month= 11), F]
o
Fitness function ID fBchDist(month, ≥, 1) + fBchDist (month, ≤, 12) f1T min(fBchDist(month, <, 1), f1F fBchDist(month, >, 12))
fBchDist(month, =, 2) f2T fBchDist(month, ≠, 2) f2F min(fBchDist(year%400, =, 0), f3T (fBchDist(year%4, =, 0) +
fBchDist(year%100, =, 0)))
fBchDist(year %400, ≠, 0) + f3F min(fBchDist(year %4, ≠, 0),
fBchDist (year %100, ≠, 0))
min(fBchDist(month, =, 4), fBchDist(month, =, 6), f4T fBchDist(month, =, 9), fBchDist(month, =, 11)) fBchDist(month, ≠, 4) + fBchDist(month, ≠, 6) + f4F fBchDist(month, ≠, 9) + fBchDist(month, ≠, 11)
Algorithm 4: Branch distance function (fBchDist) Input: double a, condition type, double b Output:branch distance value
1: switch (condition type) 2: case “=”:
3: if abs(a − b) = 0 then retrun 0 else return abs(a − b) + k)
4: case “≠”:
5: if abs(a − b)≠0 then return 0 else return k
6: case “<”:
7: if a − b <0 then return 0 else return (abs(a − b) + k)
8: case “≤”:
9: if a − b ≤ 0 then return 0 else return (abs(a − b) + k)
10: case “>”:
11: if b − a >0 then return 0 else return (abs(b − a) + k)
12: case “≥”:
13 if b − a ≥ 0 then return 0 else return (abs(b − a) + k)
14: end switch
Base onthese formulas, forcalculating fitness value for each branch predication, we generate the fitness function for each test path of the PUT getDayNum as below:
Table 4. Fitness functions for each test path of PUT getDayNum
PathID Test path fitness functions path1 F1 = f1T + f2T + f3T
path2 F2 = f1T + f2T + f3F path3 F3 = f1T + f2F + f4T path4 F4 = f1T + f2F + f4F path5 F5 = f1F
D.N. Thi / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 33, No. 2 (2017) 28-38 35
4.3. Apply multithreading of Particle Swarm Optimization
With each fitness function of each test path, we use one PSO to find its solution (in this case the solution means the test data which can cover the corresponding test path). In order to find the solution for all fitness functions at the same time, we perform simultaneous multithreading of the PSO algorithm by defining PSO it as 1 class extends Thread class of Java as follows: public class PSOProcess extends Thread
The multithreading of PSO can be executed through below algorithm:
Algorithm 5: Multithreading of Particle Swarm Optimization(MPSO)
Input: list of fitness functions
Output:the set of test data that is solution to cover corresponding test path
1: for each fitness function Fi
2: initialize an object psoi of class PSOProcess
3: assign a fitness function Fi to object psoi 4: execute object pso: pso.start();
5: end for
The experimental results of the above steps gave the results that our proposal has generated test data which covered all test paths of PUTgetDayNum:
Figure 4. Generated test data for the PUT getDayNum.
**