ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Recovering Information from Summary Data.

Christos Faloutsos, H. V. Jagadish, Nikolaos Sidiropoulos: Recovering Information from Summary Data. VLDB 1997: 36-45
@inproceedings{DBLP:conf/vldb/FaloutsosJS97,
  author    = {Christos Faloutsos and
               H. V. Jagadish and
               Nikolaos Sidiropoulos},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {Recovering Information from Summary Data},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {36-45},
  ee        = {db/conf/vldb/FaloutsosJS97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Data is often stored in summarized form, as a histogram of aggregates (COUNTs, SUMs, or AVeraGes) over specified ranges. We study how to estimate the original detail data from the stored summary. We formulate this task as an inverse problem, specifying a well-defined cost function that has to be optimized under constraints. We show that our formulation includes the uniformity and independence assumptions as a special case, and that it can achieve better reconstruction results if we maximize the smoothness as opposed to the uniformity.
In our experiments on real and synthetic datasets, the proposed method almost consistently outperforms its competitor, improving the root-mean-square error by up to 20 per cent for stock price data, and up to 90 per cent for smoother data sets. Finally, we show how to apply this theory to a variety of database problems that involve partial information, such as OLAP, data warehousing and histograms in query optimization.

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Electronic Edition

From CS Dept., University Trier (Germany)

References

[1]
...
[2]
...
[3]
Chung-Min Chen, Nick Roussopoulos: Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994: 161-172 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[4]
Stavros Christodoulakis: Implications of Certain Assumptions in Database Performance Evaluation. ACM Trans. Database Syst. 9(2): 163-186(1984) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[5]
Divesh Srivastava, Shaul Dar, H. V. Jagadish, Alon Y. Levy: Answering Queries with Aggregation Using Views. VLDB 1996: 318-329 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[6]
...
[7]
...
[8]
...
[9]
Georg Gottlob, Roberto Zicari: Closed World Databases Opened Through Null Values. VLDB 1988: 50-61 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[10]
Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE 1996: 152-159 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[11]
Ashish Gupta, Venky Harinarayan, Dallan Quass: Aggregate-Query Processing in Data Warehousing Environments. VLDB 1995: 358-369 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[12]
Ashish Gupta, Inderpal Singh Mumick, V. S. Subrahmanian: Maintaining Views Incrementally. SIGMOD Conference 1993: 157-166 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[13]
Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman: Implementing Data Cubes Efficiently. SIGMOD Conference 1996: 205-216 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[14]
Tomasz Imielinski, Witold Lipski Jr.: Incomplete Information in Relational Databases. J. ACM 31(4): 761-791(1984) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[15]
Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[16]
H. V. Jagadish: The INCINERATE Data Model. ACM Trans. Database Syst. 20(1): 71-110(1995) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[17]
H. V. Jagadish, Inderpal Singh Mumick, Abraham Silberschatz: View Maintenance Issues for the Chronicle Data Model. PODS 1995: 113-124 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[18]
David G. Luenberger: Introduction to Linear and Nonlinear Programming. Addison-Wesley 1973
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[19]
Francesco M. Malvestuto: A Universal-Scheme Approach to Statistical Databases Containing Homogeneous Summary Tables. ACM Trans. Database Syst. 18(4): 678-708(1993) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[20]
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[21]
Wee Keong Ng, Chinya V. Ravishankar: Information Synthesis in Statistical Databases. CIKM 1995: 355-361 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[22]
William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery: Numerical Recipes in C, 2nd Edition. Cambridge University Press 1992
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[23]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[24]
Chung-Dak Shum, Richard R. Muntz: An Information-Theoretic Study on Aggregate Responses. VLDB 1988: 479-490 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[25]
Yannis Theodoridis, Timos K. Sellis: A Model for the Prediction of R-tree Performance. PODS 1996: 161-171 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[26]
...
[27]
Andreas S. Weigend, Neil A. Gerschenfeld: Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley 1994, ISBN 0-201-62601-2
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[28]
Jennifer Widom: Research Problems in Data Warehousing. CIKM 1995: 25-30 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Referenced by

  1. Francesco Buccafurri, Filippo Furfaro, Domenico Saccà: Estimating Range Queries Using Aggregate Data with Integrity Constraints: A Probabilistic Approach. ICDT 2001: 390-404
  2. Themistoklis Palpanas: Knowledge Discovery in Data Warehouses. SIGMOD Record 29(3): 88-100(2000)
  3. Rakesh Agrawal, Ramakrishnan Srikant: Privacy-Preserving Data Mining. SIGMOD Conference 2000: 439-450
  4. Mehmet M. Dalkilic, Edward L. Robertson: Information Dependencies. PODS 2000: 245-253
  5. Jeffrey Scott Vitter, Min Wang: Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets. SIGMOD Conference 1999: 193-204
  6. Carlos A. Hurtado, Alberto O. Mendelzon, Alejandro A. Vaisman: Maintaining Data Cubes under Dimension Updates. ICDE 1999: 346-355
  7. Zuotao Li, Xiaoyang Sean Wang, Menas Kafatos, Ruixin Yang: A Pyramid Data Model for Supporting Content-Based Browsing and Knowledge Discovery. SSDBM 1998: 170-179
  8. Phillip B. Gibbons, Yossi Matias: New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998: 331-342

Copyright © Mon Nov 2 21:19:07 2009 by Michael Ley (ley@uni-trier.de)