Scalable indexing and exploration of big time series data
Publisher
University of the Peloponnese
Abstract
Time series are generated and stored at a vastly increasing rate in many industrial
and research applications, including the Web and the Internet of Things, public
utilities, finance, astronomy, biology, and many more. A significant portion concerns
geolocated time series, i.e., those generated at, or otherwise associated with, specific locations.
Although several works have focused on efficient time series similarity search,
there has been limited attention to the inherent challenge that geolocated time series
introduce for hybrid queries, i.e., queries that involve both spatial proximity and time
series similarity. Apart from traditional similarity search, we also consider the problem
of detecting locally similar pairs and groups, called bundles, over co-evolving
time series. These are pairs or groups of subsequences whose values do not differ
by more than a predefined threshold for a number of consecutive timestamps. They
could represent potentially valuable, concurrent common local patterns and trends
among the time series. Time series visualization, and visual analytics in general, is another
field that has drawn the attention of the scientific community. However, there
is a lack of efficient techniques for visual exploration and analysis of geolocated time
series. Finally, large-scale time series forecasting has attracted a significant amount
of interest, due to the highly complex nature of such data.
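As an illustration of what a hybrid query asks for, the following Python sketch answers one by brute force. It assumes that each geolocated time series is a single point (x, y) carrying an equal-length sequence of values, that spatial proximity means lying within a radius of the query location, and that time series similarity is Euclidean distance. The names `GeoSeries`, `euclidean` and `hybrid_knn` are hypothetical, and this is not the indexing method of the thesis.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class GeoSeries:
    """Hypothetical record type: a time series tied to a 2-D location."""
    ident: str
    x: float
    y: float
    values: Sequence[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def hybrid_knn(data: List[GeoSeries], qx: float, qy: float, radius: float,
               query: Sequence[float], k: int) -> List[Tuple[float, str]]:
    """Brute-force hybrid query (no index): among series located within
    `radius` of (qx, qy), return up to k (distance, ident) pairs,
    most similar to `query` first."""
    candidates = []
    for s in data:
        # Spatial predicate: keep only series inside the query circle.
        if math.hypot(s.x - qx, s.y - qy) <= radius:
            # Time series predicate: rank by distance to the query sequence.
            candidates.append((euclidean(s.values, query), s.ident))
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    data = [
        GeoSeries("a", 0.0, 0.0, [1, 2, 3]),
        GeoSeries("b", 1.0, 0.0, [1, 2, 4]),
        GeoSeries("c", 5.0, 5.0, [1, 2, 3]),  # identical values, but far away
    ]
    print(hybrid_knn(data, 0.0, 0.0, 2.0, [1, 2, 3], k=2))
    # -> [(0.0, 'a'), (1.0, 'b')]
```

Series "c" matches the query sequence exactly but is excluded by the spatial predicate, which is what distinguishes a hybrid query from plain similarity search. The sketch scans every series; the purpose of a hybrid index such as the BTSR-tree is to avoid that scan, and its pruning logic is not reproduced here.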
In this thesis, we propose a hybrid index, called BTSR-tree, to process hybrid queries
efficiently. Furthermore, we address the problem of hybrid similarity
joins over such geolocated time series. We introduce both centralized and
MapReduce-based algorithms for performing such join operations using spatial-only,
time series-only, and hybrid indices. Then, we tackle the problem of pair and bundle
discovery over co-evolving time series, via a filter-verification technique that only
examines candidate matches at judiciously chosen checkpoints across time. In the
same line of work, we consider hybrid queries for retrieving geolocated time series
based on filters that combine spatial distance and time series local similarity. To
efficiently support such queries, we introduce the SBTSR-tree index, an extension
of BTSR-tree that further optimizes local similarity search. Additionally, we present
two approaches that rely on hybrid indices, allowing efficient map-based visual exploration
and summarization of geolocated time series data. In particular, we use
the BTSR-tree index and we introduce a new variant of the standard iSAX index,
called geo-iSAX. We define the structure of the new index and show how both hybrid
indices can be directly exploited to produce map-based visualizations of geolocated
time series at different levels of granularity. Finally, towards large-scale time series
forecasting, we introduce FML-kNN, a novel distributed processing framework for big
data that performs probabilistic classification and regression. The framework’s core
consists of a k-nearest neighbor join algorithm which, unlike similar approaches,
is executed in a single distributed session and scales to very large volumes
of data of variable granularity and dimensionality.
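The notion of locally similar pairs, and the idea of examining candidates only at checkpoints, can be sketched in Python under simplifying assumptions. Here all series share the same timestamps, a pair is reported for every maximal run of at least `delta` consecutive timestamps where the values differ by at most `eps`, and checkpoints are placed at a fixed spacing of `delta` (timestamps delta-1, 2*delta-1, ...). The thesis describes its checkpoints as judiciously chosen, so this fixed spacing is an illustrative assumption; bundles (groups of more than two series) are not handled, and all function names are hypothetical. Under this spacing the filter loses nothing: every window of `delta` consecutive timestamps contains exactly one checkpoint, so a qualifying run must satisfy the threshold at some checkpoint, where it is then verified by expanding in both directions.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[int, int]
Runs = Dict[Pair, Set[Tuple[int, int]]]


def maximal_run(a: Sequence[float], b: Sequence[float], t: int,
                eps: float) -> Tuple[int, int]:
    """Expand left and right from timestamp t while |a - b| <= eps and
    return the inclusive (start, end) of the run that contains t."""
    start, end = t, t
    while start > 0 and abs(a[start - 1] - b[start - 1]) <= eps:
        start -= 1
    while end < len(a) - 1 and abs(a[end + 1] - b[end + 1]) <= eps:
        end += 1
    return start, end


def pairs_with_checkpoints(series: List[Sequence[float]], eps: float,
                           delta: int) -> Runs:
    """Filter-verification over checkpoints (hypothetical helper)."""
    if delta < 1:
        raise ValueError("delta must be at least 1")
    n = len(series[0])
    # Assumption: fixed spacing, not the thesis's checkpoint selection.
    # Any delta consecutive timestamps contain exactly one checkpoint.
    checkpoints = range(delta - 1, n, delta)
    found: Runs = {}
    for i, j in combinations(range(len(series)), 2):
        a, b = series[i], series[j]
        for c in checkpoints:
            # Filter: a qualifying run must also satisfy the threshold at c.
            if abs(a[c] - b[c]) > eps:
                continue
            # Verification: measure the full run around the checkpoint.
            start, end = maximal_run(a, b, c, eps)
            if end - start + 1 >= delta:
                found.setdefault((i, j), set()).add((start, end))
    return found


def pairs_brute_force(series: List[Sequence[float]], eps: float,
                      delta: int) -> Runs:
    """Reference answer: scan every timestamp of every pair."""
    n = len(series[0])
    found: Runs = {}
    for i, j in combinations(range(len(series)), 2):
        a, b = series[i], series[j]
        t = 0
        while t < n:
            if abs(a[t] - b[t]) > eps:
                t += 1
                continue
            start = t
            while t < n and abs(a[t] - b[t]) <= eps:
                t += 1
            if t - start >= delta:
                found.setdefault((i, j), set()).add((start, t - 1))
    return found


if __name__ == "__main__":
    series = [
        [1, 2, 3, 4, 5, 6],
        [1, 2, 3, 9, 9, 9],
        [0, 2, 3, 4, 5, 0],
    ]
    print(pairs_with_checkpoints(series, eps=0.5, delta=3))
    # -> {(0, 1): {(0, 2)}, (0, 2): {(1, 4)}}
```

The brute-force function is included only as a reference answer; on the sample input both return the same result. With fixed spacing, the filter inspects about n/delta timestamps per pair before verification instead of all n.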
Throughout this thesis, we experimentally and empirically evaluate our work using
synthetic and real-world datasets from diverse domains, against baseline and
state-of-the-art methods, demonstrating the efficiency and superiority of our
approaches.
Description
Doctoral Dissertation (Δ.Δ.) 15
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Greece (CC BY-NC-ND 3.0 GR)

