VNU Journal of Science: Comp. Science & Com. Eng., Vol. 33, No. 2 (2017) 59-65
Ant Colony Optimization based
Founder Sequence Reconstruction
Anh Vu Thi Ngoc1, Dinh Phuc Thai2,
Hoang Duc Nguyen2, Thanh Hai Dang2,, Dong Do Duc2
1The Hanoi college of Industrial Economics
2Faculty of Information Technology, VNU University of Engineering and Technology
Abstract
Reconstruction of a set of genetic sequences (founders) that can combine together to form given genetic
sequences (e.g. DNA) of individuals of a population is an important problem in evolutionary biology. Such
reconstruction can be modeled as a combinatorial optimization problem, in which we have to find a set of
founders
upon
that
genetic
sequences
of
the
population
can
be
generated
using
a
smallest
number
of
recombinations. In this paper we propose an ant colony optimization algorithm (ACO) based method, equipped
with some important improvements, for the founder DNA sequence reconstruction problem. The proposed
method yields excellent performance when validating on 108 test sets from three benchmark datasets. Comparing
with the best by far corresponding method, our proposed method performs better in 45 test sets, equally well in
44 and worse only in 19 sets. These experimental results demonstrate the efficacy and perspective of our
proposed method.
Received 11 Sep 2017; Revised 31 Dec 2017; Accepted 31 Dec 2017
Keywords: Founder sequence reconstruction (FSR), Ancestor genes, Ant colony optimization (ACO).
*1. Introduction
To
this
end,
the
main
challenge
is
at
the
Today we have been observing a huge
amount of biological sequences
(e.g. DNA/genes, proteins) steadily being
generated thanks to the unprecedentedly fast
development of bio-technologies. Having
genetic sequences of a population, researchers
are often interested in the evolution history of
the population, which can be traced back by
re-constructing such given sequences from a
small number of not-yet identified ancestors
(namely founder sequences) using some genetic
operators. Many biological studies have
demonstrated the efficacy of this approach [1].
problem of determining the plausible number of
founder (ancestor) sequences and of finding
themselves for a given finite offspring
sequences. It is well known as the founder
sequence reconstruction problem.
Various methods have been recently
proposed for reconstructing founder sequences,
such as those based on dynamic programming
[2], tree search [3], neighboring search [4] and
metaheuristics [5]. In this paper we propose a
ant colony optimization (ACO) based method
for the founder sequence reconstruction
problem. The manuscript is structured
as follows:
________
* Corresponding author. E-mail.: hai.dang@vnu.edu.vn
https://doi.org/10.25073/2588-1086/vnucsce.170
• Section 2 first formulates the problem of
founder sequence reconstruction and Section 3
then presents related works that have been
59
1 2
i
r r r
i
r
n
60
A.V.T. Ngoc et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 33, No. 2 (2017) 59-65
successfully applied to the problem with good
results reported.
• Our proposed algorithm, experimental
results and comparisons with previously
proposed state-of-the-art related methods are
described in Section 4.
length m defined over a finite set S , i.e.,
Ci = Ci1,Ci2, with Cij S (which can be A,
C, G, T if recombinants of interest are DNA
sequences), we need to find a set of k founder
sequences F = (F ,F , , each of length m
• Section 5 gives some conclusions for the
proposed method. It also suggests some potential
follow-ups to improve the method further.
defined over the set S . A set F is considered
valid if the set of recombinants C can be
reconstructed from F . This means that, each
2. Problem statement
Founder Sequences Reconstruction Problem
(FSRP) is defined as follows:
Given a set of n recombinants
C = (C1,C2,,Cn ), each Ci is a sequence of
recombinant Ci can be decomposed into pi
components (1 p m) F ,F ,,F so
i1 i2 ip
that each piece F ( j =1,2,, p ) appears at
ij
least once at the same position as in Ci .
K
L
Figure 1. Haloptye sequences as recombinants, which are supposed to be originated from a set of 3
predefined founder sequences using a decomposition with 8 breakpoints.
A
valid
decomposition
is
considered
which
are
part
of
the
founder
sequences,
is
reducible
if
two
consecutive
pieces
do
not
shown on the right-hand side. Breakpoints are
appear in the same founder sequence. Among
marked with vertical bars.
such reducible ones the FSRP aims to find out
The FSRP was first introduced by Ukkonen
the optimal decompositions with a minimum
[2]
and
has
been
proven
NP-Hard
[6]
with
number of required breakpoints. The number of
breakpoints for a solution F can be calculated
using the formula: i=1pi m.
k > 2.
3. Related work
In this paper we consider a common
biological application in that each recombinant
is a haplotype sequence, i.e. S ={0,1}, where
This section introduces two state-of-the-art
algorithms proposed for the FSR problem,
namely Recblock [3] and LNS [4], which have
0 and 1 are the two possible common alleles.
On the left side of Figure 1 is an example of
achieved
datasets.
excellent
results
on
benchmark
a set C of 6 haplotype sequences, which is
presented in form of a matrix. In the middle part
is a valid founder sequences (a, b and c)
assuming that the number of founder sequences
is set to 3. The optimal decomposition with 8
breakpoints on the recombinants into sections,
3.1. RecBlock algorithm
RecBlock [3] is a FSR algorithm based on
tree search. Given k founder sequences each of
length m , the algorithm encodes them as a
matrix with k rows and m columns. RecBlock
V
V
V
V V
l l
l
V
l
A.V.T. Ngoc et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 33, No. 2 (2017) 59-65
61
reviews the columns of the matrix from left to
right. Vertex l at the depth l of the search tree
is part of a solution for the prefix part of the
founders till the column l. Each vertex l is
labeled with a number of breakpoints BP( l ) in
solution found in the current episode is used to
learn (tune ) and go for the next turn.
Our proposed method for FSR has input and
output as follows:
Input: binary matrix C of size n*m
representing a recombinant set and k is the
the process of reconstructing recombinants byfar.
number of the founder sequences to be found.
Recblock uses some strategies to speed up
the reconstruction:
• Only consider the founder sequences in
the alphabet order to avoid revisiting
permutations.
• A vertex is not extended further if its
breakpoint number greater than that of the best
solution so far.
Output: binary matrix F of size k*m
string representing the founder sequences so
that BP(C,F) is minimal. Here, BP(C,F) is
the number of breakpoints required to obtain C
from F .
In general, our ACO based method for FSR
works as depicted in Algorithm 1:
Given two vertices and at the depth
1 2
of
l1
and
l2 ,
respectively,
if
BP(V1 )BP(Vl2 ) n
(where
n
is
the
number of recombinants), we may ignore
1
for downstream analysis.
3.2. Large neighborhood search algorithm
LNS-1c is empirically considered the best
algorithm proposed by far for solving the FSR
problem [4]. This algorithm uses the nearest-
neighbor search strategy over a large
neighborhood of constructed solutions.
During searching the neighborhood, the
algorithm picks out a set Ffree F beforehand,
then uses the algorithm Recblock to search for
alternative founder sequences in FFfree .
Whenever a better solution is found out, LNS-
1c performs local search over neighborhood
from scratch.
4.2. Structure graph for the FSR problem
For the sake of visualization, we simulate
the FSR problem as the problem of finding
paths on a corresponding structure graph (see
Figure 2).
This structure graph includes a start, an end
node and m columns. Each column has 2k
vertices, of which each corresponds to a state of
the corresponding column in the matrix F of
founder sequences. In particularly, each state is
4 Proposed method
4.1. Ant colony optimization based FSR
a binary string of length k .
Each vertex has edges connecting to all
ones in the next column. We can see all paths
Ant colony optimization [7] (ACO) is a
metaheuristic method simulating how ants in
nature find paths from their nest to food
sources, which turn out to be a reinforcement
learning method. ACO solves optimization
starting from the start to the end node has to go
through every column once, at which one state
is chosen. Each journey of ants travelling from
the start to the end node therefore corresponds
to a complete matrix of founder sequences.
problems throughout many episodes, in each of
which every ant travels to find solutions based
on heuristic information and pheromone matrix
containing information learned. The best
4.3. How ants travel on the structure graph
When travelling on the structure graph, ants
chose a next vertex to visit at random. The

Ant Colony Optimization based Founder Sequence Reconstruction

Đăng ngày | Thể loại: | Lần tải: 0 | Lần xem: 0 | Page: 7 | FileSize: M | File type: PDF
0 lần xem

Ant Colony Optimization based Founder Sequence Reconstruction. In this paper we propose an ant colony optimization algorithm (ACO) based method, equipped with some important improvements, for the founder DNA sequence reconstruction problem. The proposed method yields excellent performance when validating on 108 test sets from three benchmark datasets. Comparing with the best by far corresponding method, our proposed method performs better in 45 test sets, equally well in 44 and worse only in 19 sets. These experimental results demonstrate the efficacy and perspective of our proposed method.. Giống những giáo án bài giảng khác được bạn đọc giới thiệu hoặc do sưu tầm lại và chia sẽ lại cho các bạn với mục đích học tập , chúng tôi không thu phí từ người dùng ,nếu phát hiện nội dung phi phạm bản quyền hoặc vi phạm pháp luật xin thông báo cho website ,Ngoài tài liệu này, bạn có thể tải tiểu luận miễn phí phục vụ học tập Một số tài liệu download mất font không hiển thị đúng, có thể máy tính bạn không hỗ trợ font củ, bạn download các font .vntime củ về cài sẽ xem được.

Nội dung


1134943

Tài liệu liên quan