ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Accurate Estimation of the Number of Tuples Satisfying a Condition.

Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276
@inproceedings{DBLP:conf/sigmod/Piatetsky-ShapiroC84,
  author    = {Gregory Piatetsky-Shapiro and
               Charles Connell},
  editor    = {Beatrice Yormark},
  title     = {Accurate Estimation of the Number of Tuples Satisfying a Condition},
  booktitle = {SIGMOD'84, Proceedings of Annual Meeting, Boston, Massachusetts,
               June 18-21, 1984},
  publisher = {ACM Press},
  year      = {1984},
  pages     = {256-276},
  ee        = {http://doi.acm.org/10.1145/602259.602294, db/conf/sigmod/Piatetsky-ShapiroC84.html},
  crossref  = {DBLP:conf/sigmod/84},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

We present a new method for estimating the number of tuples satisfying a condition of the type attribute rel constant, where rel is one of "=", ">", "<", ">=", "<=". Our method gives highly accurate, yet easy to compute, estimates. We store information about attribute values as a list of distribution steps (histograms where buckets, instead of having equal width, have equal height). These distribution steps provide an upper bound on the error when estimating the number of tuples satisfying a condition. The estimation error can be arbitrarily reduced by increasing the number of steps. We analyze desirable conditions that such estimates should satisfy. Based on the distribution steps, we derive a set of estimation formulas which minimize the worst-case error. We also present another set of formulas which reduce the average-case error. Finally, we show how to use sampling to compute a close approximation of the distribution steps very quickly. The major applications of our method are in query optimization and in answering statistical queries.

Copyright © 1984 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

Online Version (ACM WWW Account required): Full Text in PDF Format

CDROM Version: Load the CDROM "Volume 1 Issue 2, SIGMOD '75-'92" and ...

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Beatrice Yormark (Ed.): SIGMOD'84, Proceedings of Annual Meeting, Boston, Massachusetts, June 18-21, 1984. ACM Press 1984 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML, SIGMOD Record 14(2)
Contents

Online Edition: ACM Digital Library


References

[Blasgen 77]
Mike W. Blasgen, Kapali P. Eswaran: Storage and Access in Relational Data Bases. IBM Systems Journal 16(4): 362-377(1977) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Christodoulakis 81]
...
[Christodoulakis 83]
Stavros Christodoulakis: Estimating Block Transfers and Join Sizes. SIGMOD Conference 1983: 40-54 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Demolombe 80]
Robert Demolombe: Estimation of the Number of Tuples Satisfying a Query Expressed in Predicate Calculus Language. VLDB 1980: 55-63 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Dixon 79]
...
[Luk 83]
W. S. Luk: On Estimating Block Accesses in Database Organizations. Commun. ACM 26(11): 945-947(1983) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Piatetsky 84]
...
[Rowe 83]
Neil C. Rowe: Top-Down Statistical Estimation on a Database. SIGMOD Conference 1983: 135-145 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Samson 83]
W. B. Samson, A. Bendell: Rank Order Distributions and Secondary Key Indexing. Comput. J. 28(3): 309-312(1985) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Selinger 79]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Whang 83]
Kyu-Young Whang, Gio Wiederhold, Daniel Sagalowicz: Estimating Block Accesses in Database Organizations: A Closed Noniterative Formula. Commun. ACM 26(11): 940-944(1983) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Yao 77]
S. Bing Yao: Approximating the Number of Accesses in Database Organizations. Commun. ACM 20(4): 260-261(1977) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Yao 79]
S. Bing Yao: Optimization of Query Evaluation Algorithms. ACM Trans. Database Syst. 4(2): 133-155(1979) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Yu 78]
Clement T. Yu, W. S. Luk, M. K. Siu: On the Estimation of the Number of Desired Records with Respect to a Given Query. ACM Trans. Database Syst. 3(1): 41-56(1978) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Zipf 49]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Referenced by

  1. Yossi Matias, Jeffrey Scott Vitter, Min Wang: Dynamic Maintenance of Wavelet-Based Histograms. VLDB 2000: 101-110
  2. Viswanath Poosala, Venkatesh Ganti, Yannis E. Ioannidis: Approximate Query Answering using Histograms. IEEE Data Eng. Bull. 22(4): 5-14(1999)
  3. Surajit Chaudhuri, Rajeev Motwani: On Sampling and Relational Operators. IEEE Data Eng. Bull. 22(4): 41-46(1999)
  4. Jason McHugh, Jennifer Widom: Query Optimization for XML. VLDB 1999: 315-326
  5. Yannis E. Ioannidis, Viswanath Poosala: Histogram-Based Approximation of Set-Valued Query-Answers. VLDB 1999: 174-185
  6. Viswanath Poosala, Venkatesh Ganti: Fast Approximate Answers to Aggregate Queries on a Data Cube. SSDBM 1999: 24-33
  7. Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya: On Random Sampling over Joins. SIGMOD Conference 1999: 263-274
  8. Björn Blohsfeld, Dieter Korus, Bernhard Seeger: A Comparison of Selectivity Estimators for Range Queries on Metric Attributes. SIGMOD Conference 1999: 239-250
  9. Swarup Acharya, Viswanath Poosala, Sridhar Ramaswamy: Selectivity Estimation in Spatial Databases. SIGMOD Conference 1999: 13-24
  10. S. Muthukrishnan, Viswanath Poosala, Torsten Suel: On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity, and Applications. ICDT 1999: 236-256
  11. H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Viswanath Poosala, Kenneth C. Sevcik, Torsten Suel: Optimal Histograms with Quality Guarantees. VLDB 1998: 275-286
  12. Michael J. Carey, Donald Kossmann: Reducing the Braking Distance of an SQL Query Engine. VLDB 1998: 158-169
  13. Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay: Approximate Medians and other Quantiles in One Pass and with Limited Memory. SIGMOD Conference 1998: 426-435
  14. Yossi Matias, Jeffrey Scott Vitter, Min Wang: Wavelet-Based Histograms for Selectivity Estimation. SIGMOD Conference 1998: 448-459
  15. Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya: Random Sampling for Histogram Construction: How much is enough? SIGMOD Conference 1998: 436-447
  16. Surajit Chaudhuri: An Overview of Query Optimization in Relational Systems. PODS 1998: 34-43
  17. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
  18. Viswanath Poosala, Yannis E. Ioannidis: Selectivity Estimation Without the Attribute Value Independence Assumption. VLDB 1997: 486-495
  19. Khaled Alsabti, Sanjay Ranka, Vineet Singh: A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data. VLDB 1997: 346-355
  20. Min Wang, Jeffrey Scott Vitter, Balakrishna R. Iyer: Selectivity Estimation in the Presence of Alphanumeric Correlations. ICDE 1997: 169-180
  21. Banchong Harangsri, John Shepherd, Anne H. H. Ngu: Query Size Estimation Using Machine Learning. DASFAA 1997: 97-106
  22. Yannis E. Ioannidis: Query Optimization. ACM Comput. Surv. 28(1): 121-123(1996)
  23. Viswanath Poosala, Yannis E. Ioannidis: Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing. VLDB 1996: 448-459
  24. Nabil I. Hachem, Chenye Bao, Steve Taylor: Approximate Query Answering In Numerical Databases. SSDBM 1996: 63-73
  25. Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305
  26. Nick Roussopoulos, Chung-Min Chen, Stephen Kelley, Alex Delis, Yannis Papakonstantinou: The ADMS Project: View R Us. IEEE Data Eng. Bull. 18(2): 19-28(1995)
  27. Yannis E. Ioannidis, Viswanath Poosala: Histogram-Based Solutions to Diverse Database Estimation Problems. IEEE Data Eng. Bull. 18(3): 10-18(1995)
  28. Georges Gardarin, Jean-Robert Gruser, Zhao-Hui Tang: A Cost Model for Clustered Object-Oriented Databases. VLDB 1995: 323-334
  29. Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244
  30. Kyu-Young Whang, Sang-Wook Kim, Gio Wiederhold: Dynamic Maintenance of Data Distribution for Selectivity Estimation. VLDB J. 3(1): 29-51(1994)
  31. Chung-Min Chen, Nick Roussopoulos: Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994: 161-172
  32. Qiang Zhu, Per-Åke Larson: A Query Sampling Method of Estimating Local Cost Parameters in a Multidatabase System. ICDE 1994: 144-153
  33. Arun N. Swami, K. Bernhard Schiefer: On the Estimation of Join Result Sizes. EDBT 1994: 287-300
  34. Yannis E. Ioannidis, Stavros Christodoulakis: Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results. ACM Trans. Database Syst. 18(4): 709-748(1993)
  35. Wen-Chi Hou, Gultekin Özsoyoglu: Processing Time-Constrained Aggregate Queries in CASE-DB. ACM Trans. Database Syst. 18(2): 224-261(1993)
  36. Alfons Kemper, Guido Moerkotte, Klaus Peithner: A Blackboard Architecture for Query Optimization in Object Bases. VLDB 1993: 543-554
  37. Yannis E. Ioannidis: Universality of Serial Histograms. VLDB 1993: 256-267
  38. Allen Van Gelder: Multiple Join Size Estimation by Virtual Domains. PODS 1993: 180-189
  39. Priti Mishra, Margaret H. Eich: Join Processing in Relational Databases. ACM Comput. Surv. 24(1): 63-113(1992)
  40. Alfons Kemper, Guido Moerkotte, Michael Steinbrunn: Optimizing Boolean Expressions in Object-Bases. VLDB 1992: 79-90
  41. Gennady Antoshenkov: Random Sampling from Pseudo-Ranked B+ Trees. VLDB 1992: 375-382
  42. HweeHwa Pang, Hongjun Lu, Beng Chin Ooi: Query Processing in OODB. DASFAA 1991: 1-10
  43. Frédéric Andrès, Michel Couprie, Yann Viémont: A Multi-Environment Cost Evaluator for Parallel Database Systems. DASFAA 1991: 126-135
  44. Himawan Gunadhi, Arie Segev: A Framework for Query Optimization in Temporal Databases. SSDBM 1990: 131-147
  45. Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11
  46. Richard J. Lipton, Jeffrey F. Naughton: Query Size Estimation by Adaptive Sampling. PODS 1990: 40-46
  47. Meng Chang Chen, Lawrence McNamee, Norman S. Matloff: Selectivity Estimation Using Homogeneity Measurement. ICDE 1990: 304-310
  48. Richard J. Lipton, Jeffrey F. Naughton: Estimating the Size of Generalized Transitive Closures. VLDB 1989: 165-171
  49. Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja: Processing Aggregate Relational Queries with Hard Time Constraints. SIGMOD Conference 1989: 68-77
  50. Michael V. Mannino, Paicheng Chu, Thomas Sager: Statistical Profile Estimation in Database Systems. ACM Comput. Surv. 20(3): 191-221(1988)
  51. Clifford A. Lynch: Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distribution of Column Values. VLDB 1988: 240-251
  52. M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36
  53. Ming-Chien Shan: Optimal Plan Search in a Rule-Based Query Optimizer. EDBT 1988: 92-112
  54. Leonard D. Shapiro: Join Processing in Database Systems with Large Main Memories. ACM Trans. Database Syst. 11(3): 239-264(1986)
  55. Brad T. Vander Zanden, Howard M. Taylor, Dina Bitton: Estimating Block Accessses when Attributes are Correlated. VLDB 1986: 119-127
  56. Michael Stonebraker: Inclusion of New Types in Relational Data Base Systems. ICDE 1986: 262-269
  57. Kazuhiro Satoh, Masashi Tsuchida, Fumio Nakamura, Kazuhiko Oomachi: Local and Global Query Optimization Mechanisms for Relational Databases. VLDB 1985: 405-417
  58. Nabil Kamel, Roger King: A Model of Data Distribution Based on Texture Analysis. SIGMOD Conference 1985: 319-325

Copyright © Mon Nov 2 21:11:16 2009 by Michael Ley (ley@uni-trier.de)