Certain Answers From Uncertain Data (Posted by Parag Agrawal)
Allowing the user to compute most likely answers is a common way to provide a "simple to use" result. The result may be restricted to these high-probability results using a confidence threshold, or a top-k by confidence query. This paper is a part of the large body of work that addresses this problem. For the example above, a user might only want to get a travel warning when the chance of rain in a city of interest exceeded a threshold (say .5). This can be posed as confidence threshold query with a predicate restricting the search to only cities of interest for the user. Queries like this just "clean" up the result to remove some of the uncertainty, allowing the user to "zoom" into the interesting information in the result. I am interested in exploring other ways of cleaning uncertainty that may be useful to some applications.
While the techniques above return more certain answers, they don't resolve any uncertainty. However, can throwing more data at the problem improve results by actually reconciling uncertainty? Consider weather forecast information from multiple sources -- each could be uncertain, they could be mutually inconsistent or mutually reinforcing. Can careful resolution of these data sources yield better, more certain results? I am betting that the answer is "yes" -- This paper provides the foundation for such resolution in a principled manner.
Labels: agrawal, data integration, integration, parag, paraga, probabilistic, threshold, top-k, topk, uncertain data, uncertainty
leave a response