Distributed Information Systems Laboratory LSIR

Query processing for massive sensor time series in the cloud

Project Details

Laboratory: LSIR | Semester / Master | Completed



As sensors of various kinds penetrate our daily life, the planet is witnessing a vast deployment of sensors embedded in different devices that monitor phenomena for many applications of interest, e.g., air and electrosmog pollution, radiation, early earthquake detection, and soil monitoring. Traditional sensor data management systems built on top of relational database systems treat the raw discrete observations as first-class citizens. However, managing raw sensor data directly brings many inconveniences for both the research and the application communities. To this end, various model-based sensor data management techniques have been proposed. Models capture the inherent correlations (e.g., temporal and spatial) in a sensor data stream by splitting the sensor data into disjoint segments and approximating each segment with a model of a suitable type (e.g., regression or probabilistic).
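To make the idea of segment models concrete, here is a minimal sketch of one simple variant: a greedy piecewise-constant approximation with a maximum error bound. It is only an illustration and not necessarily the model type used in this project; the names `Segment`, `segment_constant`, and `eps` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """One disjoint segment approximated by a constant model (illustrative)."""
    t_start: int   # timestamp of the first observation covered
    t_end: int     # timestamp of the last observation covered (inclusive)
    value: float   # constant model value: midrange of the covered observations


def segment_constant(series: Sequence[Tuple[int, float]], eps: float) -> List[Segment]:
    """Greedily split (timestamp, value) pairs, assumed sorted by timestamp,
    into disjoint segments. Within a segment, the midrange model deviates
    from every covered observation by at most eps."""
    segments: List[Segment] = []
    if not series:
        return segments

    t_start, first = series[0]
    t_end, lo, hi = t_start, first, first
    for t, v in series[1:]:
        new_lo, new_hi = min(lo, v), max(hi, v)
        if (new_hi - new_lo) / 2.0 <= eps:
            # The observation still fits: extend the current segment.
            lo, hi, t_end = new_lo, new_hi, t
        else:
            # Error bound would be violated: close the segment, start a new one.
            segments.append(Segment(t_start, t_end, (lo + hi) / 2.0))
            t_start, t_end, lo, hi = t, t, v, v
    segments.append(Segment(t_start, t_end, (lo + hi) / 2.0))
    return segments
```

Each resulting segment replaces a run of raw observations with a time interval and a single model parameter, which is the kind of compact representation that a model-view store would hold instead of the raw readings.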

In this project, we aim to explore how to manage large-scale sensor time series in the form of segment models in a cloud environment. Concretely, we focus on using a NoSQL cloud store (e.g., HBase) and MapReduce distributed computing to support efficient range and join queries over massive model-view sensor data. Previous efforts to optimize join and range query processing with MapReduce concentrate either on scheduling data partitions based on prior cardinality information or on integrating various kinds of indices into the data files so that Mappers can filter out unqualified data. However, these approaches either still need to access the whole dataset or incur additional storage overhead. Therefore, we will exploit the KVM-index, which is specialized for model-view sensor data in cloud stores, to achieve join and range queries over multiple model-view sensor time series that are efficient in both computation and storage.
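As a rough illustration of why an ordered index helps range queries avoid a full scan, the sketch below answers a time-range query over disjoint segments with a binary search on their start times. It is an in-memory stand-in for a key-ordered store, not the KVM-index and not HBase code; `SegmentStore` and `time_range_query` are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Tuple

# (t_start, t_end, model_value); an assumed, simplified segment layout.
Segment = Tuple[int, int, float]


class SegmentStore:
    """Disjoint segments kept sorted by start time, mimicking a store
    whose keys are ordered by segment start (an assumption of this sketch)."""

    def __init__(self, segments: List[Segment]) -> None:
        self.segments = sorted(segments, key=lambda s: s[0])
        self.starts = [s[0] for s in self.segments]

    def time_range_query(self, q_start: int, q_end: int) -> List[Segment]:
        """Return segments whose [t_start, t_end] overlaps [q_start, q_end]."""
        if q_start > q_end:
            return []
        # The last segment starting at or before q_start may still cover it.
        i = bisect_right(self.starts, q_start) - 1
        if i < 0 or self.segments[i][1] < q_start:
            i += 1
        result: List[Segment] = []
        # Scan forward only while segments can still overlap the query range.
        while i < len(self.segments) and self.segments[i][0] <= q_end:
            result.append(self.segments[i])
            i += 1
        return result
```

Because the segments are disjoint and sorted, the query touches only the segments that overlap the range plus one boundary check, rather than the whole dataset. A value-range or join query would need additional structure, which this sketch does not attempt.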

  • Motivation to engage in a research-oriented project.
  • Familiarity with basic query processing and optimization techniques in the database area.
  • Java programming skills and experience with MapReduce and HBase are a plus.


If you have any questions, please send us an email or come by our offices:

Contact: Tian Guo