HƯỚNG DẪN DOWNLOAD TÀI LIỆU

Bước 1:Tại trang tài liệu slideshare.vn bạn muốn tải, click vào nút Download màu xanh lá cây ở phía trên.

Bước 2: Tại liên kết tải về, bạn chọn liên kết để tải File về máy tính. Tại đây sẽ có lựa chọn tải File được lưu trên slideshare.vn

Bước 3: Một thông báo xuất hiện ở phía cuối trình duyệt, hỏi bạn muốn lưu . - Nếu click vào Save, file sẽ được lưu về máy (Quá trình tải file nhanh hay chậm phụ thuộc vào đường truyền internet, dung lượng file bạn muốn tải)

Có nhiều phần mềm hỗ trợ việc download file về máy tính với tốc độ tải file nhanh như: Internet Download Manager (IDM), Free Download Manager, ... Tùy vào sở thích của từng người mà người dùng chọn lựa phần mềm hỗ trợ download cho máy tính của mình

VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31
A Novel Combination of Negative and Positive Selection in Artiﬁcial Immune Systems
Van Truong Nguyen1, Xuan Hoai Nguyen2, Chi Mai Luong3
1Thai Nguyen University of Education, Thai Nguyen, Vietnam 2Hanoi University, Hanoi, Vietnam
3Vietnamese Academy of Science and Technology, Hanoi, Vietnam
Abstract
Artiﬁcial Immune System (AIS) is a multidisciplinary research area that combines the principles of immunology andcomputation. NegativeSelectionAlgorithms(NSA)isoneofthemostpopularmodelsofAISmainlydesigned forone-classlearningproblemssuchasanomalydetection. PositiveSelectionAlgorithms(PSA)isthetwinbrother of NSA with quite similar performance for AIS. Both NSAs and PSAs comprise of two phases: generating a set D of detectors from a given set S of selves (detector generation phase); and then detecting if a given cell (new data instance) is self or non-self using the generated detector set (detection phase). In this paper, we propose a novel approach to combining NSAs and PSAs that employ binary representation and r-chunk matching rule. The new algorithm achieves smaller detector storage complexity and potentially better detection time in comparison with single NSAs or PSAs.
2015 Published by VNU Journal of Science.
Manuscript communication: received 17 February 2014, revised 01 March 2015, accepted 25 March 2015 Corresponding author: Van Truong Nguyen, nvtruongtn@gmail.com
Keywords: Artiﬁcial immune systems, Negative selection, Positive selection, Intrusion detection, Detector
1. Introduction infected pathogens. The learning process of
The biological immune system is a mature defense system which has evolved over millions of years. As a defense system, it is incredibly robust, adaptive, and inherently distributed. The immune system posses powerful pattern recognition, learning, and memory capabilities. It has evolved complex methods for combating infections caused by viruses and other pathogens, without apparently any central coordination or control. Its ability to distinguish between pathogens and non-pathogens has inspired a number of artiﬁcial immune systems on computers [1]. The representative immune cell is the T cell, which has a self-recognition component and an antigen receptor for locating and eliminating
the biological immune system does not require negative examples and acquired knowledge is represented in an explicit form: T cells are generated randomly and in a large number, in the hope that every pathogen that might infect the host could be detected by at least some of these cells. However, the host must ensure that no cell generated would turn against itself (autoimmune reactions). Hence, newborn T cells undergo the process of selection to ensure that they are able to recognize non-self peptides. This process might be conducted in two ways: positive selection and negative selection. In negative selection, if a T cell detects any self protein, it is discarded; otherwise, it is kept. By contrast, in positive selection, if a T cell fails to recognize any of the self proteins, it is removed [2].
V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31 23
Negative selection algorithms (NSA) the detector generation phase (Figure. 1.a), and positive selection algorithm (PSA) are the detector candidates are generated by some
computational models that have been inspired by negative and positive selection of the biological
random processes and censored by matching them against given self samples taken from a
immune system. Among the two, NSA has set S (representing the system components). been studied more extensively resulting in more The candidates that match any element of S variants and applications [3]. However, all of are eliminated and the rest are kept and stored existing NSAs have worst-case exponential in the detector set D. In the detection phase
memory complexity for storing the detector set, hence, limit their practical applicabilities [4]. In thispaper,weproposeanovelselectionalgorithm that employs binary representation and r-chunk matching rule for detectors. The new algorithm
(Figure. 1.b), the collection of detectors are used to distinguish self (system components) from non-self (outlier, anomaly, etc). If incoming data instance matches any detector, it is claimed as non-self or anomaly.
combines negative and positive selection to
reduce both detector storage complexity and detection time, while maintaining the same detection coverage as that of NSAs (PSAs).
Begin
Generate Random Candidates
Begin
Input new samples
The rest of the paper is organized as follows. In the next section, we present the background on PSAs, NSAs and r-chunk matching rule for detectors. Section 3 brieﬂy describes the work in the literature that are most related to our new approach. Section 4 details our new selection algorithm with theoretically proven results on detector storage optimization and preliminary
Yes Match self samples?
No
Accept as new detector
No Enough detectors?
Yes
End
Match any detector?
No Yes
Nonself Self
End
experimental results on detection time. Section (a) Generation of detector set (b) Detection of new instances
5 concludes the paper and discuss some possible future work.
Themaincontributionsofthispaper,compared to our previous work [5], are three-folds: a more general proof of detector storage complexity, an extension of related works, and an experiment of our algorithm on real network intrusion dataset.
Fig. 1: Outline of a typical negative selection algorithm.
Since its introduction, NSA has had many applications such as in computer virus detection [8][9], monitoring UNIX processes [10], anomaly detection in time series [11], intrusion detection [2],
2. Background
scheduling [12], fault detection and diagnosis [13].
In this section we ﬁrst brieﬂy describe NSAs
and PSAs. Then, the r-chunk matching rule is 2.2. Positive Selection Algorithms
deﬁned and discussed. Contrary to NSAs, PSAs have been less
studied in the literature. They are mainly 2.1. Negative Selection Algorithms developedandappliedinintrusiondetection[14],
NSAs are among the most popular and spam detection [15], and classiﬁcation [16]. extensively studied techniques in artiﬁcial Stibor et al. [1] argued that positive selection immune systems that simulate the negative might have better detection performance than selection process of the biological immune negative selection. However, for problems system. A typical NSA comprises of two phases: and applications that the number of detectors detector generation and detection [6, 7]. In generated by negative selection algorithms is
24 V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31
much less than the number of self samples, negativeselectionisobviouslyabetterchoice[3]. Similar to NSA, a PSA contains two phases: detector generation and detection. In the detector
which helps AIS to achieve better results on data whereadjacentregionsoftheinputdatasequence are not necessarily semantically correlated, such as in network data packets [18].
generation candidates
phase (Figure. are generated
2.a), the by some
detector random
We denote = f0;1g as the alphabet for detectors and data. Let s 2 ‘ be a binary string,
processes and matched against the given self sample set S. The candidates that do not match
‘ = jsj is the length of s and s[i;:::; j] is the substring of s that starts at position i with length
any element in S are eliminated and the rest j i+1. Positive and negative r-chunk detectors are kept and stored in the detector set D. In could be deﬁned as follows:
the detection phase (Fig. 2.b), the collection of detectors are used to distinguish self from non-self. If incoming data instance matches any detector, it is claimed as self.
Deﬁnition 1 (Positive r-chunk detectors). Given a self set S, a tuple (d;i) of a string d 2 r, r ‘, and an integer i 2 f1;:::;‘ r + 1g is called a positive r-chunk detector if there exists a s 2 S
Begin
Generate Random Candidates
No
Match self samples?
Yes
Accept as new detector
No Enough detectors?
Begin
Input new samples
Match any detector?
Yes No
Nonself Self
such that d matches s[i;:::;i+r 1].
Deﬁnition2 (Negative r-chunkdetectors). Given a self set S, a tuple (d;i) of a string d 2 r, r ‘, and an integer i 2 f1;:::;‘ r + 1g is called a negative r-chunk detector if d does not match any s[i;:::;i+r 1], s 2 S.
We also use the following notations:
Yes End
(a) Generation of detector set
End
(b) Detection of new instances
Dpi = f(d;i), (d;i) is a positive r-chunk detectorg is set of all positive r-chunk detectors at position i, i = 1;:::;‘ r +1.
Fig. 2: Outline of a typical positive selection algorithm.
2.3. Positive and Negative r-chunk Detectors
In PSAs and NSAs, an essential component is the matching rule which determines the similarity between detectors and self samples (in the detector generation phase) and coming data instances (in the detection phase). Obviously, the matching rule is dependent on detector representation. In this paper, we assume binary representation for all detectors and data. Binary representation is the most simple and popular representation for detectors and data in AIS, and
Dni = f(d;i), (d;i) is a negative r-chunk detectorg is set of all negative r-chunk detectors at position i, i = 1;:::;‘ r +1.
Dp = [‘ r+1Dpi is set of all positive r-chunk detectors.
Dn = [‘ r+1Dni is set of all negative r-chunk detectors.
Foragivendetectorset X,SX and NX arethe sets of self and non-self strings detected by X, respectively.
Example 1. Let ‘ = 5, matching threshold r =
other representations (such as real valued) could 3. Suppose that we have the set of six self
be reduced to binary [17, 3]. For binary based AIS, the r-chunk and r-contiguous detectors are among the most common matching rules. An r-chunk matching rule can be seen as a generalisation of the r-contiguous matching rule,
strings S = fs1 = 00000;s2 = 00010;s3 = 10110;s4 = 10111;s5 = 11000;s6 = 11010g. Dp1 = f(000,1); (101,1); (110,1)g (Dp1 is set of all leftmost substring of length ‘ of s, s 2 S), Dn1 = f(001,1); (010,1); (011,1); (100,1); (111,1)g,
V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31 25
Dp2 = f(000,2); (001,2); (011,2); (100,2); (101,2)g, Dn2 = f(010,2); (110,2); (111,2)g, Dp3 = f(000,3); (010,3); (110,3); (111,3)g, Dn3 = f(001,3); (011,3); (100,3); (101,3)g (note that Dpi [ Dni = 3, i = 1, 2, 3). The self space covered by the set of all positive 3-chunk detectorsisSDp =f00000;00001;00010;00011; 00110; 00111; 01000; 01001; 01010; 01011; 01110; 01111; 10000; 10001; 10010; 10011; 10100; 10101; 10110; 10111; 11000; 11001; 11010; 11011; 11110; 11111g. The non-self strings detected by the set of all negative 3-chunk detectorsisNDn =f00001;00011;00100;00101; 00110; 00111; 01000; 01001; 01010; 01011; 01100; 01101; 01110; 01111; 10000; 10001; 10010; 10011; 10100; 10101; 11001; 11011; 11100; 11101; 11110; 11111g.
It could be seen from Example 1 that NDp = 5nSDp = f00100; 00101; 01100; 01101; 11100; 11101g , NDn, so the detection coverage of Dn is not the same as that of Dp. This is
undesirableforthecombinationofPSAandNSA. Hence,tocombinepositiveandnegativeselection algorithms in an uniﬁed framework, we have to change the semantics of positive selection in the detection phase as follows.
Deﬁnition 3 (Detection in positive selection). If new instance matches ‘ r + 1 positive r-chunk detectors (dij;i), i = 1;:::;‘ r +1, it is claimed as self, otherwise it is claimed as non-self.
With the new detection semantics, the following proposition on the equivalence of detection coverage of r-chunk type PSA and NSA could be stated.
Proposition 1 (Detection Coverage). The detection coverage of positive and negative selection algorithms coincide.
otherwise it is claimed as self. Obviously, it is dual to the detection of new data instances in positive selection as given in Deﬁnition 3.
This proposition lays the foundation for our novel Positive-Negative Selection Algorithm (PNSA) proposed in Section 4.
3. Related Works
Both PSA and NSA achieve quite similar performance for detecting novelty in data patterns [19]. Dasgupta D. et al. [20] conducted one of the earliest experiments on combining positive with negative selection. The combined process is embedded in a genetic algorithm using a ﬁtness function that assigns a weight to each bit based on the domain knowledge. Their method is neither aimed to reduce detector storage complexity nor detection time. Esponda et al. [21] proposed a generic NSA for anomaly detection problems. Their model of normal behavior is constructed from an observed sample of normally occurring patterns. Such a model could represent either the set of allowed patterns (positive detection) or the set of anomalous patterns (negative detection/selection). However, their NSA is not concerned with the combination of positive and negative selection in detection phase as in PNSA. Stibor et al. [1] argued that positive selection might have better detection performance than negative selection. However, the choice between positive selection algorithms and negative ones obviously depends on representation of the AIS-based applications. An example in Section 4 shows that some positive trees are more compact than the others and vice versa.
To the best of our knowledge, there has not beenanypublishedattemptincombiningr-chunk type PSA and NSA for the purpose of reducing
NDp = NDn
SDp = SDn
(1) detector storage complexity and real/average detection time complexity.
(2)
Proof. From the description of NSAs (see 4. New Positive-Negative Selection Algorithm Fig. 1), if a new data instance matches a negative It can be seen from Section 2 that the positive r-chunk detector, then it is claimed as non-self, and negative selection are dual. This motivates
26 V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31
our approach to combining the two mechanisms. In this section, a new r-chunk type NSA is proposed that is the combination of positive and negative selection.
0 1 0 1
0 0 1 0 1 0 1
0 1 0 1 0 1
In our proposed approach, binary trees are (a) (b)
used as data structure for storing the detector set to reduce memory complexity, and therefore the time complexity of the detection phase. To build and store the detection set, our algorithm ﬁrst
0 1 0 1
0 1 0 1 1
0 1 1 0 1 0
constructs ‘ r + 1 binary trees (called positive trees) corresponding to ‘ r +1 positive r-chunk detector set Dpi, i = 1;:::;‘ r + 1. Then, all complete subtrees of these trees are removed to achieve a compact representation of the positive r-chunk detector set while maintaining the detection coverage. Finally, for every ith
(c) (d)
0 1 0 1
0 1 1 0 1 0
0 0 0 1 1 1
(e) (f)
positive trees, we decide whether or not it should be converted to the negative tree, which covers the negative r-chunk detector set Dni. The decision depends on which tree is more compact. When this process is done, we have ‘ r + 1 compact binary trees that some of them represent positive r-chunk detectors and the others represent negative ones.
The r-chunk matching rule on binary trees is implemented as follows: a given sample s matches the positive (negative) tree ith if s[i;:::;i + k] is a path from the root to a leaf, i = 1;:::;‘ r + 1, k < r. The detection phase can be conducted by traveling the compact binary trees iteratively one by one: a sample s is claimed as non-self if it matches a negative tree or it does not match all positive trees, otherwise it is considered as self.
Example 2. For the set of self strings S from Example 1, where ‘ = 5 and r = 3, the six binary trees (the left and right child are labeled 0 and 1 respectively) represent the detector set of six 3-chunk detectors (Dpi and Dni, i = 1;2;3) as depicted in Figure 3. In the Figure, dashed arrows in some positive trees mark the complete subtrees that will be removed to achieve compact tree representation.
ThenumberofnodesofthetreesinFigures3.a - 3.f (after deleting complete subtrees) are 9,
Fig. 3: Binary tree representation of the detector set generated from S deﬁned in Example 1. The positive trees for Dp1, Dp2 and Dp3 are in (a), (c) and (e), respectively; The negative trees for Dn1, Dn2 and Dn3 are in (b), (d) and (f), respectively.
10, 7, 6, 8 and 8, respectively. Therefore, the chosen ﬁnal trees are those in Figures 3.a (9 nodes), 3.d (6 nodes) and 3.e or 3.f (8 nodes). In real implementation, it is unnecessary to generate both positive and negative trees. Since each Dpi could dually be represented either by a positive or a negative tree, we only need to generate (compact) positive trees. If a compact positive tree T has more number of leaves than the number of internal nodes that have single child, the corresponding negative tree NT should have less number of nodes than T. Therefore, NT shouldbeusedinsteadofT torepresent Dni more compactly. It is noted that NT could be obtained from T by taking the dual links (paths) in T. The following example illustrates this observation.
Example 3. Consider again the set of self strings S from Example 1. The compact positive tree for the positive 3-chunk detector set Dp2 = f(000,2); (001,2); (011,2); (100,2); (101,2)g is shown in Fig. 4.a. This tree has three leaves and two nodes that have only one child (in dotted circles) so it should be converted to the corresponding negative tree as illustrated in Fig. 4.b.
V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31 27
0 1 0 1
0 1 0 1 1
could potentially lead to better detection time complexity in real and average cases. To see this, ﬁrst, let the following theorem be stated:
1 0 Theorem 1 (PNSA detector storage complexity).
(a) (b)
Fig. 4: Conversion of a positive tree to a negative one.
Algorithm 38 summarizes the overall PNSA. In the algorithm, the process of generating compact binary (positive and negative) trees representing the complete r-chunk detector set is conductedintheouter“for”loop. First,allbinary positive tree Ti are constructed by the ﬁrst inner loop. Then, the compactiﬁcation of all Ti are conducted by the second one, i = 1;:::;‘ r+1. The conversion of a positive tree to negative one takes place in “if” statement after the second inner “for” loop. The procedure for recognizing a given cell string s as self or non-self, is carried out by the last “while ::: do” and “if ::: then ...else” statements.
The detection phase of PNSA could be illustrated by the following example.
Example 4. Given S, r as in Example 1, and s = 10100 as the inputs of the algorithm, three binary trees are constructed as the detector set in Figures 3.a, 3.d. and 3.e. The output of the algorithm is “s is non-self” because all
the paths of tree T2 do not contain substring s[2,...,4] = 010 of s.
From the description of PNSA, it is trivial
Given a self set S and an integer ‘, the PNSA produces the detector (binary) tree set that have at most total (‘ r+1):2r 2 less number of nodes in comparison to the detector tree set created by a PSA or NSA only, where r 2 f2;:::;‘ r +1g.
Proof. We only prove the theorem for the PSA case, the NSA case can be proven in a similar way. Because there are (‘ r + 1) positive trees can be build from the self set S, so the theorem is proved if it can reduce at most 2r 2 nodes from a positive tree.
The theorem is proved by induction on r (also the height of binary trees).
It is noted that when converting a positive tree to a negative tree as in PNSA, the reduction in number of nodes is exactly as the result of the subtraction of number of leaf nodes from the numberofinternalnodesthathaveonlyonechild. When r = 2, there are 16 trees of possible positive trees are of height 2. By examining all 16 cases, we have found that the maximum reduction in number of nodes is 1. One example of these cases is the positive tree that has 2 leaf nodes after compactiﬁcation as in Fig. 5.a. Since it has two leaf nodes and one one-child internal node, after being converted to the corresponding negative tree, the number of nodes is reduced by
2 - 1 = 1.
to show that it takes jSj(‘ r + 1):r steps to 0 1 1
generate all necessary trees (detector generation time complexity) and (‘ r + 1):r steps to verify
0 1 0 1
a cell string as self or non-self in the worst case (worse-case detection time complexity). These
(a) (b)
time complexities are similar to the state-of-the-art NSAs (PSAs) such as the one proposed in [4]. However, by using compact positive and negative binary trees for storing the detector set, PNSA could reduce the storage complexity of the detector set in comparison with the other r-chunk type single NSAs or PSAs that store detectors as binary strings. This storage complexity reduction
Fig. 5: One node is reduced in a tree: a compact positive tree has 4 nodes (a) and its conversion (a negative tree) has 3 node (b).
Suppose that the theorem’s conclusion is right for all r < k. We shall prove that it is also right for k. This is done by an observation that in all positive trees that are of height k, there is at least
28 V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31
Algorithm 1: PNSA (Positive-Negative Selection Algorithm)
Data: a self set S, an integer r 2 f1;:::;‘ r +1g a cell string s to be detected.
Result: detection of s as self or non-self. 1 for i = 1;:::;‘ r +1 do
2 initialize an empty binary self tree Ti 3 for each s 2 S do
4 insert s[i;:::;r i+1] into Ti 5 end
6 for every internal node n 2 Ti do