A novel combination of negative and positive selection in Artificial Immune System

Đăng ngày 4/25/2019 3:41:43 AM | Thể loại: | Lần tải: 0 | Lần xem: 15 | Page: 10 | FileSize: 0.16 M | File type: PDF
A novel combination of negative and positive selection in Artificial Immune System. In this paper, we propose a novel approach to combining NSAs and PSAs that employ binary representation and r-chunk matching rule. The new algorithm achieves smaller detector storage complexity and potentially better detection time in comparison with single NSAs or PSAs.
c
VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31
A Novel Combination of Negative and Positive Selection in
Artificial Immune Systems
Van Truong Nguyen1, Xuan Hoai Nguyen2, Chi Mai Luong3
1Thai Nguyen University of Education, Thai Nguyen, Vietnam
2Hanoi University, Hanoi, Vietnam
3Vietnamese Academy of Science and Technology, Hanoi, Vietnam
Abstract
Artificial Immune System (AIS) is a multidisciplinary research area that combines the principles of immunology
andcomputation. NegativeSelectionAlgorithms(NSA)isoneofthemostpopularmodelsofAISmainlydesigned
forone-classlearningproblemssuchasanomalydetection. PositiveSelectionAlgorithms(PSA)isthetwinbrother
of NSA with quite similar performance for AIS. Both NSAs and PSAs comprise of two phases: generating a set D
of detectors from a given set S of selves (detector generation phase); and then detecting if a given cell (new data
instance) is self or non-self using the generated detector set (detection phase). In this paper, we propose a novel
approach to combining NSAs and PSAs that employ binary representation and r-chunk matching rule. The new
algorithm achieves smaller detector storage complexity and potentially better detection time in comparison with
single NSAs or PSAs.
2015 Published by VNU Journal of Science.
Manuscript communication: received 17 February 2014, revised 01 March 2015, accepted 25 March 2015
Corresponding author: Van Truong Nguyen, nvtruongtn@gmail.com
Keywords:
Artificial immune systems, Negative selection, Positive selection, Intrusion detection, Detector
1. Introduction
infected
pathogens.
The
learning
process
of
the biological immune system does not require
The biological immune system is a mature
defense system which has evolved over millions
of years. As a defense system, it is
incredibly robust, adaptive, and inherently
distributed. The immune system posses powerful
pattern recognition, learning, and memory
capabilities. It has evolved complex methods
for combating infections caused by viruses
and other pathogens, without apparently any
central coordination or control. Its ability
to distinguish between pathogens and non-
pathogens has inspired a number of artificial
immune systems on computers [1]. The
representative immune cell is the T cell,
which has a self-recognition component and an
antigen receptor for locating and eliminating
negative examples and acquired knowledge is
represented in an explicit form: T cells are
generated randomly and in a large number, in the
hope that every pathogen that might infect the
host could be detected by at least some of these
cells. However, the host must ensure that no cell
generated would turn against itself (autoimmune
reactions). Hence, newborn T cells undergo the
process of selection to ensure that they are able to
recognize non-self peptides. This process might
be conducted in two ways: positive selection and
negative selection. In negative selection, if a
T cell detects any self protein, it is discarded;
otherwise, it is kept. By contrast, in positive
selection, if a T cell fails to recognize any of the
self proteins, it is removed [2].
V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31
23
Negative
selection
algorithms
(NSA)
the
detector
generation
phase
(Figure.
1.a),
and
positive
selection
algorithm
(PSA)
are
the detector candidates are generated by some
computational models that have been inspired by
random
processes
and
censored
by
matching
negative and positive selection of the biological
them against given self samples taken from a
immune
system.
Among
the
two,
NSA
has
set
S
(representing
the
system
components).
been studied more extensively resulting in more
The
candidates
that
match
any
element
of
S
variants and applications [3].
However, all of
are eliminated and the rest are kept and stored
existing
NSAs
have
worst-case
exponential
in the detector set D.
In the detection phase
memory complexity for storing the detector set,
(Figure. 1.b), the collection of detectors are used
hence, limit their practical applicabilities [4]. In
to
distinguish
self
(system
components)
from
thispaper,weproposeanovelselectionalgorithm
non-self (outlier, anomaly, etc). If incoming data
that employs binary representation and r-chunk
instance matches any detector, it is claimed as
matching rule for detectors.
The new algorithm
non-self or anomaly.
combines
negative
and
positive
selection
to
reduce
both
detector
storage
complexity
and
Begin
Begin
detection time, while maintaining the same
detection coverage as that of NSAs (PSAs).
Generate Random Candidates
Input new samples
The rest of the paper is organized as follows.
In the next section, we present the background
on PSAs, NSAs and r-chunk matching rule for
detectors. Section 3 briefly describes the work
Yes
Match self samples?
No
Accept as new detector
Match any detector?
Yes
No
in the literature that are most related to our new
approach. Section 4 details our new selection
Enough detectors?
No
Nonself
Self
algorithm with theoretically proven results on
detector storage optimization and preliminary
Yes
End
End
experimental results on detection time.
Section
(a) Generation of detector set
(b) Detection of new instances
5 concludes the paper and discuss some possible
future work.
Fig. 1: Outline of a typical negative selection algorithm.
Themaincontributionsofthispaper,compared
to our previous work [5], are three-folds: a more
general proof of detector storage complexity, an
extension of related works, and an experiment of
our algorithm on real network intrusion dataset.
Since its introduction, NSA has had
many applications such as in computer
virus detection [8][9], monitoring UNIX
processes [10], anomaly detection in
time series [11], intrusion detection [2],
2. Background
scheduling [12],
diagnosis [13].
fault
detection
and
In this section we first briefly describe NSAs
and PSAs. Then, the r-chunk matching rule is
2.2. Positive Selection Algorithms
defined and discussed.
Contrary
to
NSAs,
PSAs
have
been
less
2.1. Negative Selection Algorithms
studied in the literature. They are mainly
developedandappliedinintrusiondetection[14],
NSAs
are
among
the
most
popular
and
spam
detection
[15],
and
classification
[16].
extensively
studied
techniques
in
artificial
Stibor et al. [1] argued that positive selection
immune
systems
that
simulate
the
negative
might
have
better
detection
performance
than
selection
process
of
the
biological
immune
negative
selection.
However,
for
problems
system. A typical NSA comprises of two phases:
and applications that the number of detectors
detector
generation
and
detection
[6,
7].
In
generated
by
negative
selection
algorithms
is
i=1
i=1
24
V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31
much
less
than
the
number
of
self
samples,
which helps AIS to achieve better results on data
negativeselectionisobviouslyabetterchoice[3].
whereadjacentregionsoftheinputdatasequence
Similar to NSA, a PSA contains two phases:
are not necessarily semantically correlated, such
detector generation and detection. In the detector
as in network data packets [18].
generation
phase
(Figure.
2.a),
the
detector
We denote 
=
f0;1g as the alphabet for
candidates
are
generated
by
some
random
detectors and data. Let s 2 be a binary string,
processes
and
matched
against
the
given
self
=
jsj is the length of s and s[i;:::; j] is the
sample set S. The candidates that do not match
substring of s that starts at position i with length
any element in S
are eliminated and the rest
j i+1. Positive and negative r-chunk detectors
are kept and stored in the detector set D.
In
could be defined as follows:
the detection phase (Fig. 2.b), the collection of
detectors are used to distinguish self from non-
self. If incoming data instance matches any
detector, it is claimed as self.
Definition 1 (Positive r-chunk detectors). Given
a self set S, a tuple (d;i) of a string d 2 r, r  ,
and an integer i 2 f1;:::;‘ r + 1g is called a
positive r-chunk detector if there exists a s 2 S
Begin
Begin
such that d matches s[i;:::;i+r 1].
Generate Random Candidates
No
Input new samples
Definition2 (Negative r-chunkdetectors). Given
a self set S, a tuple (d;i) of a string d 2 r, r  ,
Match self samples?
and an integer i 2 f1;:::;‘ r + 1g is called a
Yes
Match any detector?
negative r-chunk detector if d does not match any
Accept as new detector
No
Yes
s[i;:::;i+r 1], s 2 S.
Enough detectors?
No
Nonself
Self
We also use the following notations:
Yes
End
End
 Dpi = f(d;i), (d;i) is a positive r-chunk
detectorg is set of all positive r-chunk
(a) Generation of detector set
(b) Detection of new instances
detectors at position i, i = 1;:::;‘ r +1.
Fig. 2: Outline of a typical positive selection algorithm.
 Dni
=
f(d;i), (d;i) is a negative r-chunk
detectorg
is
set
of
all
negative
r-chunk
2.3. Positive and Negative r-chunk Detectors
detectors at position i, i = 1;:::;‘ r +1.
In PSAs and NSAs, an essential component
is the matching rule which determines the
 Dp = [ r+1Dpi is set of all positive r-
chunk detectors.
similarity between detectors and self samples (in
the detector generation phase) and coming data
instances (in the detection phase). Obviously,
 Dn = [ r+1Dni is set of all negative r-
chunk detectors.
the matching rule is dependent on detector
representation. In this paper, we assume binary
representation for all detectors and data. Binary
representation is the most simple and popular
 Foragivendetectorset X,SX and NX arethe
sets of self and non-self strings detected by
X, respectively.
representation for detectors and data in AIS, and
Example 1. Let = 5, matching threshold r =
other representations (such as real valued) could
3.
Suppose that we have the set of six self
be reduced to binary [17, 3].
For binary based
strings S
= fs1
=
00000;s2
=
00010;s3
=
AIS,
the
r-chunk
and
r-contiguous
detectors
10110;s4
=
10111;s5
=
11000;s6
=
11010g.
are among the most common matching rules.
Dp1 = f(000,1); (101,1); (110,1)g (Dp1 is set of
An
r-chunk
matching
rule
can
be
seen
as
a
all leftmost substring of length of s, s 2 S), Dn1
generalisation of the r-contiguous matching rule,
= f(001,1); (010,1); (011,1); (100,1); (111,1)g,
HƯỚNG DẪN DOWNLOAD TÀI LIỆU

Bước 1:Tại trang tài liệu slideshare.vn bạn muốn tải, click vào nút Download màu xanh lá cây ở phía trên.
Bước 2: Tại liên kết tải về, bạn chọn liên kết để tải File về máy tính. Tại đây sẽ có lựa chọn tải File được lưu trên slideshare.vn
Bước 3: Một thông báo xuất hiện ở phía cuối trình duyệt, hỏi bạn muốn lưu . - Nếu click vào Save, file sẽ được lưu về máy (Quá trình tải file nhanh hay chậm phụ thuộc vào đường truyền internet, dung lượng file bạn muốn tải)
Có nhiều phần mềm hỗ trợ việc download file về máy tính với tốc độ tải file nhanh như: Internet Download Manager (IDM), Free Download Manager, ... Tùy vào sở thích của từng người mà người dùng chọn lựa phần mềm hỗ trợ download cho máy tính của mình  
15 lần xem

A novel combination of negative and positive selection in Artificial Immune System. In this paper, we propose a novel approach to combining NSAs and PSAs that employ binary representation and r-chunk matching rule. The new algorithm achieves smaller detector storage complexity and potentially better detection time in comparison with single NSAs or PSAs..

Nội dung

VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31 A Novel Combination of Negative and Positive Selection in Artificial Immune Systems Van Truong Nguyen1, Xuan Hoai Nguyen2, Chi Mai Luong3 1Thai Nguyen University of Education, Thai Nguyen, Vietnam 2Hanoi University, Hanoi, Vietnam 3Vietnamese Academy of Science and Technology, Hanoi, Vietnam Abstract Artificial Immune System (AIS) is a multidisciplinary research area that combines the principles of immunology andcomputation. NegativeSelectionAlgorithms(NSA)isoneofthemostpopularmodelsofAISmainlydesigned forone-classlearningproblemssuchasanomalydetection. PositiveSelectionAlgorithms(PSA)isthetwinbrother of NSA with quite similar performance for AIS. Both NSAs and PSAs comprise of two phases: generating a set D of detectors from a given set S of selves (detector generation phase); and then detecting if a given cell (new data instance) is self or non-self using the generated detector set (detection phase). In this paper, we propose a novel approach to combining NSAs and PSAs that employ binary representation and r-chunk matching rule. The new algorithm achieves smaller detector storage complexity and potentially better detection time in comparison with single NSAs or PSAs. 2015 Published by VNU Journal of Science. Manuscript communication: received 17 February 2014, revised 01 March 2015, accepted 25 March 2015 Corresponding author: Van Truong Nguyen, nvtruongtn@gmail.com Keywords: Artificial immune systems, Negative selection, Positive selection, Intrusion detection, Detector 1. Introduction infected pathogens. The learning process of The biological immune system is a mature defense system which has evolved over millions of years. As a defense system, it is incredibly robust, adaptive, and inherently distributed. The immune system posses powerful pattern recognition, learning, and memory capabilities. It has evolved complex methods for combating infections caused by viruses and other pathogens, without apparently any central coordination or control. Its ability to distinguish between pathogens and non-pathogens has inspired a number of artificial immune systems on computers [1]. The representative immune cell is the T cell, which has a self-recognition component and an antigen receptor for locating and eliminating the biological immune system does not require negative examples and acquired knowledge is represented in an explicit form: T cells are generated randomly and in a large number, in the hope that every pathogen that might infect the host could be detected by at least some of these cells. However, the host must ensure that no cell generated would turn against itself (autoimmune reactions). Hence, newborn T cells undergo the process of selection to ensure that they are able to recognize non-self peptides. This process might be conducted in two ways: positive selection and negative selection. In negative selection, if a T cell detects any self protein, it is discarded; otherwise, it is kept. By contrast, in positive selection, if a T cell fails to recognize any of the self proteins, it is removed [2]. V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31 23 Negative selection algorithms (NSA) the detector generation phase (Figure. 1.a), and positive selection algorithm (PSA) are the detector candidates are generated by some computational models that have been inspired by negative and positive selection of the biological random processes and censored by matching them against given self samples taken from a immune system. Among the two, NSA has set S (representing the system components). been studied more extensively resulting in more The candidates that match any element of S variants and applications [3]. However, all of are eliminated and the rest are kept and stored existing NSAs have worst-case exponential in the detector set D. In the detection phase memory complexity for storing the detector set, hence, limit their practical applicabilities [4]. In thispaper,weproposeanovelselectionalgorithm that employs binary representation and r-chunk matching rule for detectors. The new algorithm (Figure. 1.b), the collection of detectors are used to distinguish self (system components) from non-self (outlier, anomaly, etc). If incoming data instance matches any detector, it is claimed as non-self or anomaly. combines negative and positive selection to reduce both detector storage complexity and detection time, while maintaining the same detection coverage as that of NSAs (PSAs). Begin Generate Random Candidates Begin Input new samples The rest of the paper is organized as follows. In the next section, we present the background on PSAs, NSAs and r-chunk matching rule for detectors. Section 3 briefly describes the work in the literature that are most related to our new approach. Section 4 details our new selection algorithm with theoretically proven results on detector storage optimization and preliminary Yes Match self samples? No Accept as new detector No Enough detectors? Yes End Match any detector? No Yes Nonself Self End experimental results on detection time. Section (a) Generation of detector set (b) Detection of new instances 5 concludes the paper and discuss some possible future work. Themaincontributionsofthispaper,compared to our previous work [5], are three-folds: a more general proof of detector storage complexity, an extension of related works, and an experiment of our algorithm on real network intrusion dataset. Fig. 1: Outline of a typical negative selection algorithm. Since its introduction, NSA has had many applications such as in computer virus detection [8][9], monitoring UNIX processes [10], anomaly detection in time series [11], intrusion detection [2], 2. Background scheduling [12], fault detection and diagnosis [13]. In this section we first briefly describe NSAs and PSAs. Then, the r-chunk matching rule is 2.2. Positive Selection Algorithms defined and discussed. Contrary to NSAs, PSAs have been less studied in the literature. They are mainly 2.1. Negative Selection Algorithms developedandappliedinintrusiondetection[14], NSAs are among the most popular and spam detection [15], and classification [16]. extensively studied techniques in artificial Stibor et al. [1] argued that positive selection immune systems that simulate the negative might have better detection performance than selection process of the biological immune negative selection. However, for problems system. A typical NSA comprises of two phases: and applications that the number of detectors detector generation and detection [6, 7]. In generated by negative selection algorithms is 24 V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31 much less than the number of self samples, negativeselectionisobviouslyabetterchoice[3]. Similar to NSA, a PSA contains two phases: detector generation and detection. In the detector which helps AIS to achieve better results on data whereadjacentregionsoftheinputdatasequence are not necessarily semantically correlated, such as in network data packets [18]. generation candidates phase (Figure. are generated 2.a), the by some detector random We denote = f0;1g as the alphabet for detectors and data. Let s 2 ‘ be a binary string, processes and matched against the given self sample set S. The candidates that do not match ‘ = jsj is the length of s and s[i;:::; j] is the substring of s that starts at position i with length any element in S are eliminated and the rest j i+1. Positive and negative r-chunk detectors are kept and stored in the detector set D. In could be defined as follows: the detection phase (Fig. 2.b), the collection of detectors are used to distinguish self from non-self. If incoming data instance matches any detector, it is claimed as self. Definition 1 (Positive r-chunk detectors). Given a self set S, a tuple (d;i) of a string d 2 r, r ‘, and an integer i 2 f1;:::;‘ r + 1g is called a positive r-chunk detector if there exists a s 2 S Begin Generate Random Candidates No Match self samples? Yes Accept as new detector No Enough detectors? Begin Input new samples Match any detector? Yes No Nonself Self such that d matches s[i;:::;i+r 1]. Definition2 (Negative r-chunkdetectors). Given a self set S, a tuple (d;i) of a string d 2 r, r ‘, and an integer i 2 f1;:::;‘ r + 1g is called a negative r-chunk detector if d does not match any s[i;:::;i+r 1], s 2 S. We also use the following notations: Yes End (a) Generation of detector set End (b) Detection of new instances Dpi = f(d;i), (d;i) is a positive r-chunk detectorg is set of all positive r-chunk detectors at position i, i = 1;:::;‘ r +1. Fig. 2: Outline of a typical positive selection algorithm. 2.3. Positive and Negative r-chunk Detectors In PSAs and NSAs, an essential component is the matching rule which determines the similarity between detectors and self samples (in the detector generation phase) and coming data instances (in the detection phase). Obviously, the matching rule is dependent on detector representation. In this paper, we assume binary representation for all detectors and data. Binary representation is the most simple and popular representation for detectors and data in AIS, and Dni = f(d;i), (d;i) is a negative r-chunk detectorg is set of all negative r-chunk detectors at position i, i = 1;:::;‘ r +1. Dp = [‘ r+1Dpi is set of all positive r-chunk detectors. Dn = [‘ r+1Dni is set of all negative r-chunk detectors. Foragivendetectorset X,SX and NX arethe sets of self and non-self strings detected by X, respectively. Example 1. Let ‘ = 5, matching threshold r = other representations (such as real valued) could 3. Suppose that we have the set of six self be reduced to binary [17, 3]. For binary based AIS, the r-chunk and r-contiguous detectors are among the most common matching rules. An r-chunk matching rule can be seen as a generalisation of the r-contiguous matching rule, strings S = fs1 = 00000;s2 = 00010;s3 = 10110;s4 = 10111;s5 = 11000;s6 = 11010g. Dp1 = f(000,1); (101,1); (110,1)g (Dp1 is set of all leftmost substring of length ‘ of s, s 2 S), Dn1 = f(001,1); (010,1); (011,1); (100,1); (111,1)g, V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31 25 Dp2 = f(000,2); (001,2); (011,2); (100,2); (101,2)g, Dn2 = f(010,2); (110,2); (111,2)g, Dp3 = f(000,3); (010,3); (110,3); (111,3)g, Dn3 = f(001,3); (011,3); (100,3); (101,3)g (note that Dpi [ Dni = 3, i = 1, 2, 3). The self space covered by the set of all positive 3-chunk detectorsisSDp =f00000;00001;00010;00011; 00110; 00111; 01000; 01001; 01010; 01011; 01110; 01111; 10000; 10001; 10010; 10011; 10100; 10101; 10110; 10111; 11000; 11001; 11010; 11011; 11110; 11111g. The non-self strings detected by the set of all negative 3-chunk detectorsisNDn =f00001;00011;00100;00101; 00110; 00111; 01000; 01001; 01010; 01011; 01100; 01101; 01110; 01111; 10000; 10001; 10010; 10011; 10100; 10101; 11001; 11011; 11100; 11101; 11110; 11111g. It could be seen from Example 1 that NDp = 5nSDp = f00100; 00101; 01100; 01101; 11100; 11101g , NDn, so the detection coverage of Dn is not the same as that of Dp. This is undesirableforthecombinationofPSAandNSA. Hence,tocombinepositiveandnegativeselection algorithms in an unified framework, we have to change the semantics of positive selection in the detection phase as follows. Definition 3 (Detection in positive selection). If new instance matches ‘ r + 1 positive r-chunk detectors (dij;i), i = 1;:::;‘ r +1, it is claimed as self, otherwise it is claimed as non-self. With the new detection semantics, the following proposition on the equivalence of detection coverage of r-chunk type PSA and NSA could be stated. Proposition 1 (Detection Coverage). The detection coverage of positive and negative selection algorithms coincide. otherwise it is claimed as self. Obviously, it is dual to the detection of new data instances in positive selection as given in Definition 3. This proposition lays the foundation for our novel Positive-Negative Selection Algorithm (PNSA) proposed in Section 4. 3. Related Works Both PSA and NSA achieve quite similar performance for detecting novelty in data patterns [19]. Dasgupta D. et al. [20] conducted one of the earliest experiments on combining positive with negative selection. The combined process is embedded in a genetic algorithm using a fitness function that assigns a weight to each bit based on the domain knowledge. Their method is neither aimed to reduce detector storage complexity nor detection time. Esponda et al. [21] proposed a generic NSA for anomaly detection problems. Their model of normal behavior is constructed from an observed sample of normally occurring patterns. Such a model could represent either the set of allowed patterns (positive detection) or the set of anomalous patterns (negative detection/selection). However, their NSA is not concerned with the combination of positive and negative selection in detection phase as in PNSA. Stibor et al. [1] argued that positive selection might have better detection performance than negative selection. However, the choice between positive selection algorithms and negative ones obviously depends on representation of the AIS-based applications. An example in Section 4 shows that some positive trees are more compact than the others and vice versa. To the best of our knowledge, there has not beenanypublishedattemptincombiningr-chunk type PSA and NSA for the purpose of reducing NDp = NDn SDp = SDn (1) detector storage complexity and real/average detection time complexity. (2) Proof. From the description of NSAs (see 4. New Positive-Negative Selection Algorithm Fig. 1), if a new data instance matches a negative It can be seen from Section 2 that the positive r-chunk detector, then it is claimed as non-self, and negative selection are dual. This motivates 26 V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31 our approach to combining the two mechanisms. In this section, a new r-chunk type NSA is proposed that is the combination of positive and negative selection. 0 1 0 1 0 0 1 0 1 0 1 0 1 0 1 0 1 In our proposed approach, binary trees are (a) (b) used as data structure for storing the detector set to reduce memory complexity, and therefore the time complexity of the detection phase. To build and store the detection set, our algorithm first 0 1 0 1 0 1 0 1 1 0 1 1 0 1 0 constructs ‘ r + 1 binary trees (called positive trees) corresponding to ‘ r +1 positive r-chunk detector set Dpi, i = 1;:::;‘ r + 1. Then, all complete subtrees of these trees are removed to achieve a compact representation of the positive r-chunk detector set while maintaining the detection coverage. Finally, for every ith (c) (d) 0 1 0 1 0 1 1 0 1 0 0 0 0 1 1 1 (e) (f) positive trees, we decide whether or not it should be converted to the negative tree, which covers the negative r-chunk detector set Dni. The decision depends on which tree is more compact. When this process is done, we have ‘ r + 1 compact binary trees that some of them represent positive r-chunk detectors and the others represent negative ones. The r-chunk matching rule on binary trees is implemented as follows: a given sample s matches the positive (negative) tree ith if s[i;:::;i + k] is a path from the root to a leaf, i = 1;:::;‘ r + 1, k < r. The detection phase can be conducted by traveling the compact binary trees iteratively one by one: a sample s is claimed as non-self if it matches a negative tree or it does not match all positive trees, otherwise it is considered as self. Example 2. For the set of self strings S from Example 1, where ‘ = 5 and r = 3, the six binary trees (the left and right child are labeled 0 and 1 respectively) represent the detector set of six 3-chunk detectors (Dpi and Dni, i = 1;2;3) as depicted in Figure 3. In the Figure, dashed arrows in some positive trees mark the complete subtrees that will be removed to achieve compact tree representation. ThenumberofnodesofthetreesinFigures3.a - 3.f (after deleting complete subtrees) are 9, Fig. 3: Binary tree representation of the detector set generated from S defined in Example 1. The positive trees for Dp1, Dp2 and Dp3 are in (a), (c) and (e), respectively; The negative trees for Dn1, Dn2 and Dn3 are in (b), (d) and (f), respectively. 10, 7, 6, 8 and 8, respectively. Therefore, the chosen final trees are those in Figures 3.a (9 nodes), 3.d (6 nodes) and 3.e or 3.f (8 nodes). In real implementation, it is unnecessary to generate both positive and negative trees. Since each Dpi could dually be represented either by a positive or a negative tree, we only need to generate (compact) positive trees. If a compact positive tree T has more number of leaves than the number of internal nodes that have single child, the corresponding negative tree NT should have less number of nodes than T. Therefore, NT shouldbeusedinsteadofT torepresent Dni more compactly. It is noted that NT could be obtained from T by taking the dual links (paths) in T. The following example illustrates this observation. Example 3. Consider again the set of self strings S from Example 1. The compact positive tree for the positive 3-chunk detector set Dp2 = f(000,2); (001,2); (011,2); (100,2); (101,2)g is shown in Fig. 4.a. This tree has three leaves and two nodes that have only one child (in dotted circles) so it should be converted to the corresponding negative tree as illustrated in Fig. 4.b. V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31 27 0 1 0 1 0 1 0 1 1 could potentially lead to better detection time complexity in real and average cases. To see this, first, let the following theorem be stated: 1 0 Theorem 1 (PNSA detector storage complexity). (a) (b) Fig. 4: Conversion of a positive tree to a negative one. Algorithm 38 summarizes the overall PNSA. In the algorithm, the process of generating compact binary (positive and negative) trees representing the complete r-chunk detector set is conductedintheouter“for”loop. First,allbinary positive tree Ti are constructed by the first inner loop. Then, the compactification of all Ti are conducted by the second one, i = 1;:::;‘ r+1. The conversion of a positive tree to negative one takes place in “if” statement after the second inner “for” loop. The procedure for recognizing a given cell string s as self or non-self, is carried out by the last “while ::: do” and “if ::: then ...else” statements. The detection phase of PNSA could be illustrated by the following example. Example 4. Given S, r as in Example 1, and s = 10100 as the inputs of the algorithm, three binary trees are constructed as the detector set in Figures 3.a, 3.d. and 3.e. The output of the algorithm is “s is non-self” because all the paths of tree T2 do not contain substring s[2,...,4] = 010 of s. From the description of PNSA, it is trivial Given a self set S and an integer ‘, the PNSA produces the detector (binary) tree set that have at most total (‘ r+1):2r 2 less number of nodes in comparison to the detector tree set created by a PSA or NSA only, where r 2 f2;:::;‘ r +1g. Proof. We only prove the theorem for the PSA case, the NSA case can be proven in a similar way. Because there are (‘ r + 1) positive trees can be build from the self set S, so the theorem is proved if it can reduce at most 2r 2 nodes from a positive tree. The theorem is proved by induction on r (also the height of binary trees). It is noted that when converting a positive tree to a negative tree as in PNSA, the reduction in number of nodes is exactly as the result of the subtraction of number of leaf nodes from the numberofinternalnodesthathaveonlyonechild. When r = 2, there are 16 trees of possible positive trees are of height 2. By examining all 16 cases, we have found that the maximum reduction in number of nodes is 1. One example of these cases is the positive tree that has 2 leaf nodes after compactification as in Fig. 5.a. Since it has two leaf nodes and one one-child internal node, after being converted to the corresponding negative tree, the number of nodes is reduced by 2 - 1 = 1. to show that it takes jSj(‘ r + 1):r steps to 0 1 1 generate all necessary trees (detector generation time complexity) and (‘ r + 1):r steps to verify 0 1 0 1 a cell string as self or non-self in the worst case (worse-case detection time complexity). These (a) (b) time complexities are similar to the state-of-the-art NSAs (PSAs) such as the one proposed in [4]. However, by using compact positive and negative binary trees for storing the detector set, PNSA could reduce the storage complexity of the detector set in comparison with the other r-chunk type single NSAs or PSAs that store detectors as binary strings. This storage complexity reduction Fig. 5: One node is reduced in a tree: a compact positive tree has 4 nodes (a) and its conversion (a negative tree) has 3 node (b). Suppose that the theorem’s conclusion is right for all r < k. We shall prove that it is also right for k. This is done by an observation that in all positive trees that are of height k, there is at least 28 V.T. Nguyen et al. / VNU Journal of Science: Comp. Science & Com. Eng. Vol. 31, No. 1 (2015) 22–31 Algorithm 1: PNSA (Positive-Negative Selection Algorithm) Data: a self set S, an integer r 2 f1;:::;‘ r +1g a cell string s to be detected. Result: detection of s as self or non-self. 1 for i = 1;:::;‘ r +1 do 2 initialize an empty binary self tree Ti 3 for each s 2 S do 4 insert s[i;:::;r i+1] into Ti 5 end 6 for every internal node n 2 Ti do

Tài liệu liên quan