An efficient tree based frequent temporal inter object pattern mining approach in time series databases

Đăng ngày 4/25/2019 3:40:54 AM | Thể loại: | Lần tải: 0 | Lần xem: 5 | Page: 20 | FileSize: 0.29 M | File type: PDF
An efficient tree based frequent temporal inter object pattern mining approach in time series databases. In order to make the most of time series present in many various application domains such as finance, medicine, geology, meteorology, etc., mining time series is performed for useful information and hidden knowledge. Discovered knowledge is very significant to help users such as data analysts and managers get fascinating insights into important temporal relationships of objects/phenomena along time.
VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21
An Efficient Tree-based Frequent Temporal Inter-object
Pattern Mining Approach in Time Series Databases
Nguyen Thanh Vu, Vo Thi Ngoc Chau*
Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Abstract
In order to make the most of time series present in many various application domains such as finance, medicine,
geology, meteorology, etc., mining time series is performed for useful information and hidden knowledge.
Discovered knowledge is very significant to help users such as data analysts and managers get fascinating
insights into important temporal relationships of objects/phenomena along time. Unfortunately, two main
challenges exist with frequent pattern mining in time series databases. The first challenge is the combinatorial
explosion of too many possible combinations for frequent patterns with their detailed descriptions, and the
second one is to determine frequent patterns truly meaningful and relevant to the users. In this paper, we propose
a tree-based frequent temporal inter-object pattern mining algorithm to cope with these two challenges in a level-
wise bottom-up approach. In comparison with the existing works, our proposed algorithm is more effective and
efficient for frequent temporal inter-object patterns which are more informative with explicit and exact temporal
information automatically discovered from a time series database. As shown in the experiments on real financial
time series, our work has reduced many invalid combinations for frequent patterns and also avoided many
irrelevant frequent patterns returned to the users.
© 2015 Published by VNU Journal of Science.
Manuscript communication: received 15 December 2013, revised 06 December 2014, accepted 19 January 2015
Corresponding author: Vo Thi Ngoc Chau, chauvtn@cse.hcmut.edu.vn
Keywords: Frequent Temporal Inter-Object Pattern, Temporal Pattern Tree, Temporal Pattern Mining, Support
Count, Time Series Mining, Time Series Rule Mining.
third
challenging
problem,
one
of
the
ten
challenging problems in data mining research
1. Introduction
pointed out in [30]. In addition, [10] has shown
An increasing popularity of time series
nowadays exists in many domains such as
finance, medicine, geology, meteorology, etc.
The resulting time series databases possess
knowledge that might be useful and valuable
for users to get more understanding about
behavioral activities and changes of the objects
and phenomena of interest. Thus, time series
mining is an important task. Indeed, it is the
this research area has been very active so far.
Among time series mining tasks, rule mining is
a meaningful but tough mining task shown in
[25]. This task is performed with a process
mainly including two main phases: mining
frequent temporal patterns and deriving
temporal rules representing temporal
associations between those patterns. In this
paper, our work focuses on the first phase for
frequent temporal patterns.
N.T. Vu et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21
2
At present, we are aware of many existing
by
a
minimum
support
count
value.
These
works related to the frequent temporal pattern
semantics-based
temporal
patterns
are
mining task on time series. Some that can be
semantically
abstracted
from
one
or
many
listed are [3, 4, 5, 9, 14, 15, 16, 18, 19, 20, 26,
different time series, each of which corresponds
27, 29]. Firstly in an overall view about these
to a time-ordered sequence of some repeating
related works, it is realized that patterns are
behavioral
activities
of
some
objects
or
often
different
from
work
to
work
and
phenomena of interest whose characteristic has
discovered
from
many
various
time
series
been observed and recorded over the time in its
datasets. In a few works, the sizes and shapes of
respective time series. It is also necessary to
patterns are fixed, and time gaps in patterns are
distinguish
our
so-called
frequent
temporal
pre-specified by users. In contrast, our work
patterns
from
motifs
which
are
repeating
would like to discover patterns of interest that
continuous subsequences in an individual time
can be of any shapes with any sizes and with
series. In contrast, a frequent temporal pattern
any time gaps able to be automatically derived
being
considered
might
contain
various
from time series. Secondly, there is neither data
repeating meaningful continuous subsequences
benchmarking nor standardized definition of the
with
many
different
temporal
relationships
frequent temporal pattern mining problem on
automatically
discovered
from
one
or
many
time series. Indeed, whenever we get a mention
different time series in the time series database.
of
frequent
pattern
mining,
market
basket
As
for
the
second
extension,
we
have
analysis appears to be a marvelous example of
reconsidered
our
tree-based
algorithm
the traditional association rule mining problem.
employing appropriate data structures such as
Such an example is not available in the time
tree and hash table. The modified version of
series
mining
research
area
for
frequent
this algorithm is defined with a keen sense of
temporal patterns. Thirdly, two main challenges
reducing the number of invalid combinations
that need to be resolved for frequent pattern
generated and checked for frequent temporal
mining
in
time
series
databases
include
the
patterns. It is also capable of removing many
problem
of
combinatorial
explosion
of
too
irrelevant frequent patterns for the users.
many possible combinations for frequent
patterns with their detailed descriptions and the
problem of discovering frequent patterns truly
meaningful and relevant to the users.
As shown in the experiments on real
financial time series, our proposed algorithm is
more efficient to deal with the combinatorial
explosion problem. In comparison with the
Based on the aforementioned motivations,
existing works, our work is useful for frequent
we
propose
a
tree-based
frequent
temporal
temporal inter-object patterns more informative
inter-object pattern mining algorithm in a level-
with
explicit and
exact
temporal
information
wise
bottom-up
approach
as
an
extended
which is automatically discovered from a time
version of the tree-based algorithm in [20]. The
series database.
first extension is a generalized frequent
temporal pattern mining process on time series
databases with an adapted frequent temporal
pattern template. As a result, a frequent
temporal pattern in our work is semantics-based
temporal pattern that occurs as often as or more
often than expectation from users determined
The rest of our paper is structured as
follows. Section II provides an overall view of
the related works to point out the differences
between those works and ours. In section III,
we introduce a generalized frequent temporal
pattern mining process on time series databases
where our proposed algorithm is included. In
3
N.T. Vu et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21
section IV, we propose an efficient tree-based
length of a period of time comprising the trend.
frequent
temporal
inter-object
pattern
mining
Differently,
our
work
concentrates
on
algorithm
and
its
evaluation
with
many
discovering
relationships
among
primitive
experiments is presented and discussed in
section V. Finally, section VI concludes our
work and states several future works.
patterns. It is worth noting that our proposed
algorithms are not constrained by the number of
pattern types as well as the meanings and
shapes of primitive patterns. Moreover, [3] has
recently focused on discovering recent temporal
2. Related Works
patterns from interval-based
temporal abstractions with
sequences of
two temporal
In this section, some related works [3-7, 9,
relationships:
before
and
co-occur.
Mining
14-22, 24, 26-29] are examined in comparison
with our work. Among these related works, [3-
5, 7, 9, 14-16, 18-20, 22, 26, 27] are proposed
for frequent temporal pattern mining in time
series, [21, 24, 29] for frequent sequential
pattern mining in sequential databases, and [6,
17, 28] for frequent temporal pattern mining in
temporal databases.
recent temporal patterns in [3] is one step in
learning a classification model for event
detection problems. Different from [3], our
work belongs to the time series rule mining
task. Indeed, we would like to discover more
complex frequent temporal patterns in many
different time series with more temporal
relationships. For more applications, such
patterns can be used in other time series mining
In
the
most
basic
form,
motifs
can
be
tasks
such
as
clustering,
classification,
and
considered as primitive patterns in time series
prediction in time series. Based on the temporal
mining. There exist many approaches to find
concepts of duration, coincidence, and partial
motifs in time series named a few as [9, 15, 16,
order
in
interval
time
series,
[18]
defined
19, 26, 27]. Our work is different from those
pattern types from multivariate time series as
because the scope of our algorithms does not
Tone, Chord, and Phrase. Tones representing
include the phase of finding primitive patterns
durations are labeled time intervals, which are
that might be concerned with a motif discovery
basic
primitives.
Chords
representing
algorithm.
We
suppose
that
those
primitive
coincidence
are
formed
by
simultaneously
patterns
are
available
to
our
proposed
occurring Tones. Phrases are formed by several
algorithm. As for more complex patterns, [4]
Chords connected with a partial order which is
has
introduced
a
notion
of
perception-based
actually the temporal relationship “before” in
pattern in time series mining with a so-called
Allen’s terms. Support is used as a measure to
methodology
of
computing
with
words
and
evaluate discovered patterns. As compared to
perceptions.
[4]
reviewed
in
details
such
[18],
our
work
supports
more
temporal
descriptions using sign of derivatives, scaling of
relationships with time information able to be
trends and shapes, linguistic interpretation of
automatically
discovered
along
with
frequent
patterns from clustering, a pattern generation
temporal
inter-object
patterns.
Not
directly
grammar, and temporal relationships between
proposed for frequent temporal patterns in time
patterns.
Also
towards
perception-based
time
series,
[22]
made
use
of
Allen’s
temporal
series mining, [14] presented a duration-based
relationships
(before,
equal,
meets,
overlaps,
linguistic
trend
summarization
of time
series
during, starts, finishes, etc.) in their so-called
using a few features such as the slope of the
temporal abstractions. A temporal abstraction is
line, the fairness of the approximation of the
simply a description of a (set of) time series
original data points by line segments and the
through
sequences
of
temporal
intervals
HƯỚNG DẪN DOWNLOAD TÀI LIỆU

Bước 1:Tại trang tài liệu slideshare.vn bạn muốn tải, click vào nút Download màu xanh lá cây ở phía trên.
Bước 2: Tại liên kết tải về, bạn chọn liên kết để tải File về máy tính. Tại đây sẽ có lựa chọn tải File được lưu trên slideshare.vn
Bước 3: Một thông báo xuất hiện ở phía cuối trình duyệt, hỏi bạn muốn lưu . - Nếu click vào Save, file sẽ được lưu về máy (Quá trình tải file nhanh hay chậm phụ thuộc vào đường truyền internet, dung lượng file bạn muốn tải)
Có nhiều phần mềm hỗ trợ việc download file về máy tính với tốc độ tải file nhanh như: Internet Download Manager (IDM), Free Download Manager, ... Tùy vào sở thích của từng người mà người dùng chọn lựa phần mềm hỗ trợ download cho máy tính của mình  
5 lần xem

An efficient tree based frequent temporal inter object pattern mining approach in time series databases. In order to make the most of time series present in many various application domains such as finance, medicine, geology, meteorology, etc., mining time series is performed for useful information and hidden knowledge. Discovered knowledge is very significant to help users such as data analysts and managers get fascinating insights into important temporal relationships of objects/phenomena along time..

Nội dung

VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21 An Efficient Tree-based Frequent Temporal Inter-object Pattern Mining Approach in Time Series Databases Nguyen Thanh Vu, Vo Thi Ngoc Chau* Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam Abstract In order to make the most of time series present in many various application domains such as finance, medicine, geology, meteorology, etc., mining time series is performed for useful information and hidden knowledge. Discovered knowledge is very significant to help users such as data analysts and managers get fascinating insights into important temporal relationships of objects/phenomena along time. Unfortunately, two main challenges exist with frequent pattern mining in time series databases. The first challenge is the combinatorial explosion of too many possible combinations for frequent patterns with their detailed descriptions, and the second one is to determine frequent patterns truly meaningful and relevant to the users. In this paper, we propose a tree-based frequent temporal inter-object pattern mining algorithm to cope with these two challenges in a level-wise bottom-up approach. In comparison with the existing works, our proposed algorithm is more effective and efficient for frequent temporal inter-object patterns which are more informative with explicit and exact temporal information automatically discovered from a time series database. As shown in the experiments on real financial time series, our work has reduced many invalid combinations for frequent patterns and also avoided many irrelevant frequent patterns returned to the users. © 2015 Published by VNU Journal of Science. Manuscript communication: received 15 December 2013, revised 06 December 2014, accepted 19 January 2015 Corresponding author: Vo Thi Ngoc Chau, chauvtn@cse.hcmut.edu.vn Keywords: Frequent Temporal Inter-Object Pattern, Temporal Pattern Tree, Temporal Pattern Mining, Support Count, Time Series Mining, Time Series Rule Mining. third challenging problem, one of the ten 1. Introduction An increasing popularity of time series nowadays exists in many domains such as finance, medicine, geology, meteorology, etc. The resulting time series databases possess knowledge that might be useful and valuable for users to get more understanding about behavioral activities and changes of the objects and phenomena of interest. Thus, time series mining is an important task. Indeed, it is the challenging problems in data mining research pointed out in [30]. In addition, [10] has shown this research area has been very active so far. Among time series mining tasks, rule mining is a meaningful but tough mining task shown in [25]. This task is performed with a process mainly including two main phases: mining frequent temporal patterns and deriving temporal rules representing temporal associations between those patterns. In this paper, our work focuses on the first phase for frequent temporal patterns. N.T. Vu et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21 2 At present, we are aware of many existing by a minimum support count value. These works related to the frequent temporal pattern semantics-based temporal patterns are mining task on time series. Some that can be semantically abstracted from one or many listed are [3, 4, 5, 9, 14, 15, 16, 18, 19, 20, 26, 27, 29]. Firstly in an overall view about these different time series, each of which corresponds to a time-ordered sequence of some repeating related works, it is realized that patterns are behavioral activities of some objects or often different from work to work and phenomena of interest whose characteristic has discovered from many various time series been observed and recorded over the time in its datasets. In a few works, the sizes and shapes of respective time series. It is also necessary to patterns are fixed, and time gaps in patterns are distinguish our so-called frequent temporal pre-specified by users. In contrast, our work patterns from motifs which are repeating would like to discover patterns of interest that can be of any shapes with any sizes and with continuous subsequences in an individual time series. In contrast, a frequent temporal pattern any time gaps able to be automatically derived being considered might contain various from time series. Secondly, there is neither data repeating meaningful continuous subsequences benchmarking nor standardized definition of the with many different temporal relationships frequent temporal pattern mining problem on time series. Indeed, whenever we get a mention automatically discovered from one or many different time series in the time series database. of frequent pattern mining, market basket As for the second extension, we have analysis appears to be a marvelous example of reconsidered our tree-based algorithm the traditional association rule mining problem. Such an example is not available in the time employing appropriate data structures such as tree and hash table. The modified version of series mining research area for frequent this algorithm is defined with a keen sense of temporal patterns. Thirdly, two main challenges that need to be resolved for frequent pattern mining in time series databases include the reducing the number of invalid combinations generated and checked for frequent temporal patterns. It is also capable of removing many problem of combinatorial explosion of too irrelevant frequent patterns for the users. many possible combinations for frequent patterns with their detailed descriptions and the problem of discovering frequent patterns truly meaningful and relevant to the users. Based on the aforementioned motivations, As shown in the experiments on real financial time series, our proposed algorithm is more efficient to deal with the combinatorial explosion problem. In comparison with the existing works, our work is useful for frequent we propose a tree-based frequent temporal temporal inter-object patterns more informative inter-object pattern mining algorithm in a level- with explicit and exact temporal information wise bottom-up approach as an extended which is automatically discovered from a time version of the tree-based algorithm in [20]. The first extension is a generalized frequent temporal pattern mining process on time series databases with an adapted frequent temporal pattern template. As a result, a frequent temporal pattern in our work is semantics-based temporal pattern that occurs as often as or more often than expectation from users determined series database. The rest of our paper is structured as follows. Section II provides an overall view of the related works to point out the differences between those works and ours. In section III, we introduce a generalized frequent temporal pattern mining process on time series databases where our proposed algorithm is included. In 3 N.T. Vu et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21 section IV, we propose an efficient tree-based frequent temporal inter-object pattern mining algorithm and its evaluation with many experiments is presented and discussed in section V. Finally, section VI concludes our work and states several future works. 2. Related Works In this section, some related works [3-7, 9, 14-22, 24, 26-29] are examined in comparison with our work. Among these related works, [3-5, 7, 9, 14-16, 18-20, 22, 26, 27] are proposed for frequent temporal pattern mining in time series, [21, 24, 29] for frequent sequential pattern mining in sequential databases, and [6, 17, 28] for frequent temporal pattern mining in temporal databases. In the most basic form, motifs can be considered as primitive patterns in time series mining. There exist many approaches to find motifs in time series named a few as [9, 15, 16, 19, 26, 27]. Our work is different from those because the scope of our algorithms does not include the phase of finding primitive patterns that might be concerned with a motif discovery algorithm. We suppose that those primitive patterns are available to our proposed algorithm. As for more complex patterns, [4] has introduced a notion of perception-based pattern in time series mining with a so-called methodology of computing with words and perceptions. [4] reviewed in details such descriptions using sign of derivatives, scaling of trends and shapes, linguistic interpretation of patterns from clustering, a pattern generation grammar, and temporal relationships between patterns. Also towards perception-based time series mining, [14] presented a duration-based linguistic trend summarization of time series using a few features such as the slope of the line, the fairness of the approximation of the original data points by line segments and the length of a period of time comprising the trend. Differently, our work concentrates on discovering relationships among primitive patterns. It is worth noting that our proposed algorithms are not constrained by the number of pattern types as well as the meanings and shapes of primitive patterns. Moreover, [3] has recently focused on discovering recent temporal patterns from interval-based sequences of temporal abstractions with two temporal relationships: before and co-occur. Mining recent temporal patterns in [3] is one step in learning a classification model for event detection problems. Different from [3], our work belongs to the time series rule mining task. Indeed, we would like to discover more complex frequent temporal patterns in many different time series with more temporal relationships. For more applications, such patterns can be used in other time series mining tasks such as clustering, classification, and prediction in time series. Based on the temporal concepts of duration, coincidence, and partial order in interval time series, [18] defined pattern types from multivariate time series as Tone, Chord, and Phrase. Tones representing durations are labeled time intervals, which are basic primitives. Chords representing coincidence are formed by simultaneously occurring Tones. Phrases are formed by several Chords connected with a partial order which is actually the temporal relationship “before” in Allen’s terms. Support is used as a measure to evaluate discovered patterns. As compared to [18], our work supports more temporal relationships with time information able to be automatically discovered along with frequent temporal inter-object patterns. Not directly proposed for frequent temporal patterns in time series, [22] made use of Allen’s temporal relationships (before, equal, meets, overlaps, during, starts, finishes, etc.) in their so-called temporal abstractions. A temporal abstraction is simply a description of a (set of) time series through sequences of temporal intervals N.T. Vu et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21 4 corresponding to relevant patterns (i.e. In sequential database mining, [21, 24, 29] behaviors or properties) detected in their time courses. These temporal abstractions can be combined together to form more complex temporal abstractions also using Allen’s temporal relationships BEFORE, MEETS, OVERLAPS, FINISHED BY, EQUALS, and STARTS. It is realized that temporal abstractions discovered from [22] are temporal patterns rather similar to our frequent temporal inter-object patterns. However, our work supports richer trend-based patterns and also provides a new efficient pattern mining algorithm as compared to [22]. For another form of patterns, [7] aimed to capture the similarities among stock market time series such that their sequence-subsequence relationships are preserved. In particular, [7] identified patterns representing collections of contiguous subsequences which shared the same shape for specific time intervals. Their patterns show pairwise similarities among sequences, called timing patterns using temporal relationships such as begin earlier, end later, and are longer. [7] also defined Support Count and Confidence measures for a relationship but these measures were not employed in any algorithms of their work. As compared to [7], our work supports more temporal relationships with explicit time. More recently, [5] has paid attention to linguistic association rules in time series which are based on fuzzy itemsets stemming from continuous subsequences in time series. Each frequent itemset in [5] can be considered as a frequent pattern discovered in time series. However, there is no consideration for temporal knowledge in their frequent fuzzy itemsets. As are among many existing works on frequent sequential pattern mining. [24] introduced GSP algorithm to discover generalized sequential patterns in a sequential database using Apriori antimonotonic constraint. Later, [21] proposed PrefixSpan algorithm to avoid the weakness of [24] in scanning the database many times unnecessarily. Indeed, [21] can find frequent sequential patterns without generating any candidate for them. For a comparison, those frequent sequential patterns are not as rich as ours in temporal aspects hidden in time series which include interval-based relationships and their associated time. As for [29], so-called inter-sequence patterns are discovered with two proposed algorithms which are M-Apriori and EISP-Miner. The first algorithm is Apriori-like and not as efficient as the second one which is based on a tree data structure, named ISP-tree. Nevertheless, the capability of both algorithms is limited to a user-specified parameter which is maximum span, called maxspan. It is believed that it is not easy for users to provide a suitable value for this parameter as soon as their sequential database is mined. This might lead to many trial-and-error experiments for maxspan. In temporal database mining, [6, 17, 28] worked for inter-transaction/inter-object patterns/rules which involved one or many different transactions/objects. Similarly, our discovered frequent temporal patterns are inter-object patterns. Differently, our patterns are mined in the context of time series mining where each component in our patterns is trend- based with more degrees in change than for [20], our work is based on their proposed “up/down” or “increasing/decreasing” and work with several extensions to the process and tree-based algorithm in order to discover frequent temporal inter-object patterns in a time series database more efficiently. temporal relationships automatically derived are interval-based with more time information than point-based relationships “co- occur/before/after”. 5 N.T. Vu et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21 To the best of our knowledge, the type of Depicted in Figure 1, the detail about the frequent temporal inter-object patterns defined pattern mining process will be mentioned in our work has not yet been taken into clearly as follows. Our process includes three consideration in the existing works. The phases mainly based on the well-accepted proposed temporal frequent inter-object pattern general knowledge discovery process [12]. mining algorithm on a set of various time series Phase 1 is responsible for preprocessing to is designed to be a more efficient version of the tree-based algorithm in [20]. 3. A Generalized Frequent Temporal Inter-object Pattern Mining Process In this section, a generalized frequent temporal inter-object pattern mining process on a time series database is figured out to elaborate our solution to discovering so-called frequent temporal inter-object patterns from a given set of different time series. This process is mainly based on the one in [20]. Each time series is considered an object of interest which can be some phenomena or some physical objects in our real life. We refer to a notion of temporal inter-object pattern as temporal relationship among objects being considered. This notion of “inter-object” is somewhat similar to “inter-transaction” in [17, 28] and “inter-sequence” in [29]. However, our work aims to capture more temporal aspects of their relationships so that discovered patterns can be more informative and applicable to decision making support. In addition, interestingness of discovered patterns is measured by means of the degree to which they are frequent in the lifespan of these objects in regard to a user-specified minimum threshold prepare for semantics-based time series, phase 2 for the first step to obtain a set of repeating trend-based subsequences, and phase 3 for the primary step to fully discover frequent temporal inter-object patterns. As compared to the process in [20], our generalized process is not specific for the input of the proposed algorithm by relaxing the use of trend-based time series. Instead, so-called semantics-based symbolic time series are used so that users can have more freedom to express the meaning of each component in a resulting frequent pattern via the semantic symbols used for time series transformation in phase 1. 3.1. Phase 1 for Semantics-based Symbolic Time Series The input of this phase is also the one of our work, which consists of a set of raw time series of the same length for simplicity. Formally, each time series TS is defined as TS = (v1, v2, …, vn). TS is a so-called univariate time series in an n-dimension space. The length of TS is n. v1, v2, …, and vn are time-ordered real numbers. Indices 1, 2, …, n correspond to points in time in our real world on a regular basis. Regarding semantics, time series is understood as the recording of a quantitative characteristic of an object or phenomenon of called min_sup. This is because we use Support interest observed regularly over the time. Count as an objective measure with the meaning intact in [12]. G N.T. Vu et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21 6 Figure 1. A generalized frequent temporal inter-object pattern mining process on a time series database. As previously mentioned, we have The input of phase 2 is exactly the output of generalized the pattern mining process phase 1 which consists of one or many introduced in [20] for more semantics in semantics-based symbolic time series. The resulting frequent patterns. Thus, in this paper, we do not restrict the meaning of individual components in discovered frequent patterns to main objective of phase 2 is to find repeating subsequences in the input symbolic time series. Such subsequences are indeed motifs hidden in behavioral changes of objects and the degree to these time series. Regarding semantics, motifs which they change. Instead, we enable so-called semantics-based symbolic time series by means of any transformation technique on time series. themselves are frequent parts in time series. As compared to discrete point-based events in [17, 28], motifs in our work are suitable for the For instance, each time series can be applications where the time spans of an event transformed into a trend-based time series using short-term and long-term moving averages in are significant to user’s problems. For example, it is more informative for us to know that a [31] or into a symbolic time series using SAX stock keeps strongly increasing three technique in [16]. consecutive days denoted by BBB from The output of this phase is a set of semantics-based time series each of which is formally defined as (s1, s2, …, sn) where si∈Σ for i = 1..n where Σ is a discrete set of semantic symbols derived by a corresponding transformation technique. For the technique in [31], Σ = {A, B, C, D, E, F} where A represents the time series in a weak increasing trend; B in a strong increasing trend; C starting a strong increasing trend; D starting a weak increasing trend; E in a strong decreasing trend; and F in a weak decreasing trend. For the technique in [16], Σ is the word book. If two breakpoints are used, Σ = {a, b, c} where a represents subsequences with high values, b with average values, and c with low values. 3.2. Phase 2 for Repeating Subsequences Monday to Wednesday in comparison with a simple fact such that a stock increases. As of this moment, there are different approaches to the motif discovery task on time series as proposed in [9, 15, 16, 19, 26, 27]. This task is out of the scope of our work. In our work, we implemented a simple brute force algorithm to extract repeating subsequences which are motifs along with their counts, each of which is the number of occurrences of the subsequence in its corresponding symbolic time series. Because of our interest in frequent patterns, we consider repeating subsequences with at least two occurrences. In short, the output of this phase is a set of repeating subsequences with at least two occurrences that might stem from different objects. 7 N.T. Vu et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21 3.3. Phase 3 for Frequent Temporal Inter-object Patterns Similar to phase 2, phase 3 has the input which is the output of the previous phase, a set of repeating subsequences. In addition, phase 3 also needs a minimum support count threshold, min_sup, from users to evaluate the output returned to users. As compared to [29], min_sup is a single parameter whose value is provided by the users along with the input set of time series in our process. Using min_sup and the input, phase 3 first obtains a set of primitive patterns, named L1, which includes only repeating subsequences with the counts equal or greater than min_sup. All elements in L1 are called frequent temporal inter-object patterns at level 1. At this level, there is just one object involved in each frequent pattern. Differently, level is used to refer to the number of components in a pattern which will be detailed below, not to the number of objects involved in a pattern. Secondly, phase 3 proceeds with a frequent temporal inter-object pattern mining algorithm to discover and return to users a full set of frequent temporal inter-object patterns in a set of various time series. The rest of this subsection will define a notion of frequent temporal inter-object pattern and in section 4, we will propose an extended version of the tree-based frequent temporal inter-object pattern mining algorithm that makes the frequent temporal inter-object pattern mining process more effective and efficient. In general, we formally define a frequent temporal inter-object pattern at level k for k>1 in the following form: m1-m1.ID m2-m2.ID….mk-1-mk-1.ID< operator typek-1 : delta timek-1> mk-mk.ID. In this form, m1, m2, …, mk-1, and mk are primitive patterns in L1 which might come from different objects whose identifiers are m1.ID, m2.ID, …, mk-1.ID, and mk.ID, respectively. Regarding relationships between the components of a pattern at level k, operator type1, …, operator typek-1 are Allen’s temporal operators. There are thirteen Allen’s temporal operators in [1] well-known to express interval-based relationships along the time, including precedes (p), meets (m), overlaps (o), Finished by (F), contains (D), starts (s), equals (e), Started (S), during (d), finishes (f), overlapped by (O), met by (M), preceded by (P). For their converse relationships, our work used seven Allen’s temporal operators (p, m, o, F, D, s, e) to capture temporal associations between subsequences from different objects in phase 3. That is, operator type1, …, operator typek-1 are in {p, m, o, F, D, s, e}. Moreover, we use delta time1, …, delta timek-1 to keep time information of the corresponding relationships. Regarding semantics, intuitively speaking, a frequent temporal inter-object pattern at level k for k>1 fully presents the relationships between the frequent parts of different objects of interest over the time. Hence, we believe that unlike some other related works [7, 11, 17, 18], our patterns are in a richer and more understandable form and in addition, our pattern mining algorithm is enabled to automatically discover all such frequent temporal inter-object patterns with no limitation on their relationship types and time information. Example 1: Let us consider a frequent temporal pattern on a single object NY using the transformation technique in [31]: AA-NYBBB-NY {0, 10, 20}. This pattern enables us to know that after in a two-day weak increasing trend, NY has a three-day strong increasing trend and this fact repeats three times at positions 0, 10, and 20 in the lifetime of NY. Its illustration is given in Figure 2. N.T. Vu et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21 8 Figure 2. Illustration of a frequent temporal pattern on a single object NY. Figure 3. Illustration of a frequent temporal inter-object pattern on two objects: NY and SH. Example 2: Let us consider a frequent temporal inter-object pattern on two objects NY and SH also using the transformation technique in [31]: AA-NYAA-SH {0, 10}. This pattern, whose illustration is presented in Figure 3, involves two objects NY and SH and presents their temporal relationship along the time. In particular, we can state about NY and SH that NY has a two-day weak increasing trend and in the same duration of time, SH does too. This fact occurs twice at positions 0 and 10 in their lifetime. It is also worth noting that we absolutely do not know whether or not NY influences SH or vice versa in real life unless their relationships are analyzed in some depth. algorithms were defined: brute-force and tree-based. The brute-force algorithm provides a baseline for correctness checking and the tree-based one helps speeding up the pattern mining process in the spirit of FP-Growth algorithm [13]. The two algorithms followed the level-wise bottom-up approach. Based on [20], we extend the tree-based algorithm to a new version that enables us to deal with the combinatorial explosion problem by using an additional hash table for a detection and elimination of irrelevant frequent patterns. In particular, the modified tree-based algorithm is capable of removing the instances of potential candidates pertaining to one single pattern with overlapping parts. In the following subsections, the tree-based algorithmis detailed. 4.1.A Temporal Pattern Tree In this paper, we remain a so-called temporal pattern tree in [20]. Nevertheless, for being self-contained, the description of a temporal pattern tree is presented as follows. Nonetheless, such patterns provide us with objective data-driven evidence on the relationships among objects of interest so that Figure 4. The structure of a node in the temporal pattern tree. we can make other further thorough A temporal pattern tree (TP-tree) is a tree investigations into these objects and their that has n nodes of the same structure as shown surrounding environment. 4. The Proposed Tree-based Frequent Temporal Inter-object Pattern Mining Algorithm on Time Series Databases As noted in [20], the type of knowledge we aim to discover from time series has not yet been considered. Hence, in [20], two mining in Figure 4. A node structure of a node being considered in TP-tree is composed of the following fields: - ParentNode: a pointer that points to a parent node of the current node. - OperatorType: an Allen’s temporal operator in the form of

, , , , , , or to let us know about the temporal relationship between the current node 9 N.T. Vu et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21 and its parent node where p stands for precedes, m for meets, e for equal, s for starts, F for Using the node structure defined above, a temporal pattern tree is built in a level-wise finished by, D for contains, and o for overlaps. approach from level 0 up to level k - DeltaTime: an exact time interval associated with the temporal relationship in OperatorType field. - Pat.Length: a length of the corresponding pattern counting up to the current node. - Info: information about the corresponding pattern that the current node represents. - ID: an object identifier of the object which the current node stems from. - k: a level of the current node. - List of Instances: a list of all instances corresponding to all positions of the pattern that the current node represents. - List of ChildNodes: a hash table that contains pointers pointing to all children nodes of the current node at level (k+1). Key information of an element in the hash table is: [OperatorType associated with a child node + DeltaTime + Info of a child node + ID of a child node]. Each node corresponds to a component of some frequent temporal inter-object pattern. In particular, the root of TP-tree is at level 0, all primitive patterns at level 1 are handled by all nodes at level 1 of TP-tree, the second components of all frequent patterns at level 2 are associated with all nodes at level 2 of TP- tree, and so on. All nodes at level k are created corresponding to the way we discover frequent patterns at level (k-1) first and then use them to discover frequent patterns at level k. It is realized that a pattern at level k is only generated from all nodes at level (k-1) which belong to the same parent node. This feature helps us much avoid traversing the entire tree built so far to discover and create frequent patterns at higher levels and expand the rest of the tree. A subprocess of building TP-tree is shown step by step. Step 1 - Initialize TP-tree: Create the root of TP-tree labeled 0 at level 0. Step 2 - Handle L1: From the input L1 which contains m motifs from different trend-based time series with a support count satisfying the minimum support count min_sup, create m nodes and insert them into TP-tree at level 1. Distances between these nodes to the root are 0 and Allen’s OperatorType of each of these nodes is empty. The resulting TP-tree after steps 1 and 2 is displayed in Figure 5 when L1 has 3 frequent patterns corresponding to nodes 1, 2, and 3. Step 3 - Handle L2 from L1: Generate all possible combinations between the nodes at level 1 as all nodes at level 1 belong to the same parent node which is the root. This step is and added into TP-tree from all possible valid performed with seven Allen’s temporal combinations of all nodes at level (k-1). This mechanism comes from the idea such that candidates for frequent patterns at level k are generated just from frequent patterns at level (k-1). In addition, only nodes associated with support counts satisfying the minimum support count are inserted into TP-tree. 4.2.Building a Temporal Pattern Tree operators as follows. Let m and n be two instances in L1. With no loss of generality, these two instances are considered for a valid combination if m.StartPosition ≤ n.StartPosition where m.StartPosition and n.StartPosition are starting points in time of m and n, respectively. A combination process to generate a candidate in C2 is conducted below. Should any combination N.T. Vu et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21 10 has a satisfied support count, it is a frequent pattern at level 2 and added into L2. It is noted that a combination in the tree-based algorithm is associated with nodes in TP-tree that help us to early detect if a pattern is frequent. Thus, if a combination corresponding to an instance of a node that is currently available in TP-tree, we simply update the Figure 5. The resulting TP-tree after steps 1 and 2. position of the instance in List of Instances field of that node and further ascertain that the combination is associated with a frequent pattern. If a combination corresponds to a new node not in TP-tree, using a hash table, we easily have the support count of its associated pattern to check if it satisfies min_sup. If yes, Figure 6. The resulting TP-tree after step 3. the new node is inserted into TP-tree by If m and n belong to the same object, m must precede n. A combination is in the form of: m-m.IDn-n.ID where p stands for Allen’s operator precedes, delta (delta > 0) for an interval of time between m and n, m.ID and connecting to its parent node. The resulting TP-tree after step 3 is given in Figure 6 where nodes {4, 5, 6, 7, 8} are nodes inserted into TP-tree at level 2 to represent 5 frequent patterns at level 2. n.ID are object identifiers corresponding to their time series. In this case, m.ID = n.ID. Example 3: Using the transformation technique in [31], consider m-m.ID = EEB-ACB starting at 0 and n-n.ID = ABB-ACB starting at 7. A valid combination of m and n is EEB-ACBABB-ACB starting at 0. If m and n come from two different objects, ie. m-m.ID ≠ n-n.ID, a combination might be generated for the additional six Allen’s operators: meets (m), overlaps (o), Finished by (F), contains (D), starts (s), and equal (e). Valid combinations of m and n for these operators are Figure 7. The resulting TP-tree after step 4. Step 4 - Handle L3 from L2: Using information available in TP-tree, we do not need to generate all possible combinations between patterns at level 2 as candidates for patterns at level 3. Instead, we simply traverse TP-tree to generate combinations from branches sharing the same prefix path one level right before the level we are considering. Thus, we formed below where d is a common time can reduce greatly the number of combinations. interval in m and n. For instance, consider all patterns at L2 in - Meets: m-m.IDn-n.ID - Overlaps: m-m.IDn-n.ID - Finished by: m-m.IDn-n.ID - Contains: m-m.IDn-n.ID - Starts: m-m.IDn-n.ID - Equal: m-m.IDn-n.ID Figure 6. In a brute-force approach, we need to check and generate combinations from all patterns corresponding to paths {0, 1, 4}, {0, 1, 5}, {0, 1, 6}, {0, 3, 7}, and {0, 3, 8}. In contrast, the tree-based algorithm only needs to check and generate combinations from the patterns corresponding to paths sharing the same prefix which are {{0, 1, 4}, {0, 1, 5}, {0, 11 N.T. Vu et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 31, No. 1 (2015) 1-21 1, 6}} and {{0, 3, 7}, {0, 3, 8}}. It is ensured that no combination is generated from patterns corresponding to paths not sharing the same prefix, for example: {0, 1, 4} and {0, 3, 7}, {0, 1, 4} and {0, 3, 8}, etc. Besides, the tree-based Lk. The routine keeps repeating till no more level is created for TP-tree. As compared to FP-tree [13], TP-tree in our work has no header table. Instead, we use a hash table at each level to keep track of the support count of each