Distributed Information Systems Laboratory (LSIR)

Querying Imprecise Data using Probabilistic Methods

Project Details


Laboratory: LSIR | Master | Completed

This project deals with data management and query processing on imprecise data. Here, imprecise data mainly means data obtained from sensor networks, which is imprecise because it contains system noise and observation noise. A promising approach for processing such data uses probabilistic models to characterize the imprecision. In model-based probabilistic data management, we build a probabilistic model on the data and then answer user queries directly from the model rather than by processing the raw imprecise data. This approach has the following advantages:


  • Probabilistic queries can be answered, since the DBMS inherently stores probabilistic information alongside the data.
  • More than one type of model can be built on the same underlying data, which supports a different and richer set of queries.
  • Because queries can be answered using only the model, in certain situations the original imprecise data can be discarded (or moved to slow disks), saving space.
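
As a rough illustration of the model-based idea described above, the sketch below fits a single Gaussian to a handful of noisy readings and answers a probabilistic range query from the fitted model alone. The Gaussian noise assumption, the sample temperature values, and the helper names build_model and range_probability are all hypothetical choices made for this example; the project itself does not prescribe a particular model.

```python
from statistics import NormalDist


def build_model(readings):
    """Fit a Gaussian to raw readings.

    Assumption for this sketch: the sensor noise is roughly normal, so the
    sample mean and standard deviation summarize the imprecise data.
    """
    return NormalDist.from_samples(readings)


def range_probability(model, low, high):
    """Return P(low <= X <= high) under the fitted model."""
    return model.cdf(high) - model.cdf(low)


# Hypothetical noisy temperature readings from one sensor (degrees Celsius).
readings = [20.1, 19.8, 20.4, 20.0, 19.7, 20.3, 20.2, 19.9]

model = build_model(readings)

# The query is answered from the model only; the raw readings are no longer needed.
p = range_probability(model, 19.5, 20.5)
print(f"mean={model.mean:.2f} stdev={model.stdev:.2f} P(19.5<=T<=20.5)={p:.3f}")
```

Once the model is built, only its two parameters need to be kept to answer such queries, which is the space saving mentioned in the last point. A different model fitted to the same readings (for example, one that captures change over time) would support other kinds of queries.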

Scope of the Project:
Along with these advantages, probabilistic data management poses many challenges. Some of the most important are:


  • How can probability information be efficiently stored and processed in a DBMS?
  • Are there algorithms that can efficiently and effectively process queries over, and summarize, probabilistic data?
  • Are there sufficiently accurate methods for characterizing imprecision in sensor data? Moreover, are there methods for deriving probability information from the raw imprecise data?


Depending on their interests, the student can focus on one or more of the challenges above. Specifically, the focus can be on implementation-related issues or on the theoretical aspects of addressing these challenges.

Prerequisites: Basic knowledge of algorithms, probability, and statistics; programming experience in a language such as Java or C++. A course on databases (or advanced databases) is a plus.

Contact: Saket Sathe