Justin Meyer

Institution: 
Santa Barbara City College
Year: 
2009

Indexing Uncertainties for Probalistic Databases

Efficient management of uncertain data is becoming an active area of research for computer scientists with applications in fields such as biology. For instance, techniques used to retrieve bio-images, such as cell marking, are imperfect causing the images to contain “noise” or imperfections, making retrieval of exact data difficult. Probabilistic databases are databases that allow users to store and query over uncertain data. However, query processing in these databases is computationally hard and more time consuming than in traditional database systems. Our objective is to create an index structure that appropriately handles uncertain data and supports faster data retrieval for range and selection queries. To accomplish this, we have developed a B-tree capable of handling uncertain data by indexing on the values and maintaining an inverted index at each node to provide access to the tuples containing that value. This structure allows for O(log n) access time while maintaining the ability to perform range queries as well. Without an index structure the retrieval time grows linearly with the data size, and thus does not scale well for large datasets. We plan to validate our method experimentally for real and synthetic datasets as well as compare it to alternative approaches for indexing uncertain data.

UC Santa Barbara Center for Science and Engineering Partnerships UCSB California NanoSystems Institute