ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Practical Selectivity Estimation through Adaptive Sampling.

Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11
@inproceedings{DBLP:conf/sigmod/LiptonNS90,
  author    = {Richard J. Lipton and
               Jeffrey F. Naughton and
               Donovan A. Schneider},
  editor    = {Hector Garcia-Molina and
               H. V. Jagadish},
  title     = {Practical Selectivity Estimation through Adaptive Sampling},
  booktitle = {Proceedings of the 1990 ACM SIGMOD International Conference on
               Management of Data, Atlantic City, NJ, May 23-25, 1990},
  publisher = {ACM Press},
  year      = {1990},
  pages     = {1-11},
  ee        = {http://doi.acm.org/10.1145/93597.93611, db/conf/sigmod/LiptonNS90.html},
  crossref  = {DBLP:conf/sigmod/90},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Recently we have proposed an adaptive, random sampling algorithm for general query size estimation. In earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm, in this paper we investigate its practicality as applied to selects and joins. First, we extend our previous analysis to provide significantly improved bounds on the amount of sampling necessary for a given level of accuracy. Next, we provide "sanity bounds" to deal with queries for which the underlying data is extremely skewed or the query result is very small. Finally, we report on the performance of the estimation algorithm as implemented in a host language on a commercial relational system. The results are encouraging, even with this loose coupling between the estimation algorithm and the DBMS.

Copyright © 1990 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

Online Version (ACM WWW Account required): Full Text in PDF Format

CDROM Version: Load the CDROM "Volume 1 Issue 2, SIGMOD '75-'92" and ...

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Hector Garcia-Molina, H. V. Jagadish (Eds.): Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, May 23-25, 1990. ACM Press 1990 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML, SIGMOD Record 19(2), June 1990
Contents

Online Edition: ACM Digital Library


References

[BDT83]
Dina Bitton, David J. DeWitt, Carolyn Turbyfill: Benchmarking Database Systems A Systematic Approach. VLDB 1983: 8-19 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Chr83a]
Stavros Christodoulakis: Estimating Block Transfers and Join Sizes. SIGMOD Conference 1983: 40-54 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Chr83b]
Stavros Christodoulakis: Estimating Block Selectivities. Inf. Syst. 9(1): 69-79,(1984) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Dem80]
Robert Demolombe: Estimation of the Number of Tuples Satisfying a Query Expressed in Predicate Calculus Language. VLDB 1980: 55-63 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Fed84]
Jane Fedorowicz: Database Evaluation Using Multiple Regression Techniques. SIGMOD Conference 1984: 70-76 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Fel68]
...
[HOT88]
Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja: Statistical Estimators for Relational Algebra Expressions. PODS 1988: 276-287 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HOT89]
Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja: Processing Aggregate Relational Queries with Hard Time Constraints. SIGMOD Conference 1989: 68-77 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HTY82]
Larry Kerschberg, Peter D. Ting, S. Bing Yao: Query Optimization in Star Computer Networks. ACM Trans. Database Syst. 7(4): 678-711(1982) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[KK85]
Nabil Kamel, Roger King: A Model of Data Distribution Based on Texture Analysis. SIGMOD Conference 1985: 319-325 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Koo80]
Robert Kooi: The Optimization of Queries in Relational Databases. Ph.D. thesis, Case Western Reserve University 1980
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[LN89]
Richard J. Lipton, Jeffrey F. Naughton: Estimating the Size of Generalized Transitive Closures. VLDB 1989: 165-171 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[LN90]
Richard J. Lipton, Jeffrey F. Naughton: Query Size Estimation by Adaptive Sampling. PODS 1990: 40-46 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[LNS90]
...
[Lyn88]
Clifford A. Lynch: Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distribution of Column Values. VLDB 1988: 240-251 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MCS88]
Michael V. Mannino, Paicheng Chu, Thomas Sager: Statistical Profile Estimation in Database Systems. ACM Comput. Surv. 20(3): 191-221(1988) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MD88]
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MDL83]
Anthony Y. Montgomery, Daryl J. D'Souza, S. B. Lee: The Cost of Relational Algebraic Operations on Skewed Data: Estimates and Experiments. IFIP Congress 1983: 235-241 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MK85]
...
[OR86]
Frank Olken, Doron Rotem: Simple Random Sampling from Relational Databases. VLDB 1986: 160-169 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[OR89]
Frank Olken, Doron Rotem: Random Sampling from B+ Trees. VLDB 1989: 269-277 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PSC84]
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[SAC+79]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Referenced by

  1. Kaushik Chakrabarti, Minos N. Garofalakis, Rajeev Rastogi, Kyuseok Shim: Approximate Query Processing Using Wavelets. VLDB 2000: 111-122
  2. Dimitrios Gunopulos, George Kollios, Vassilis J. Tsotras, Carlotta Domeniconi: Approximating Multi-Dimensional Aggregate Range Queries over Real Attributes. SIGMOD Conference 2000: 463-474
  3. Yannis E. Ioannidis, Viswanath Poosala: Histogram-Based Approximation of Set-Valued Query-Answers. VLDB 1999: 174-185
  4. Swarup Acharya, Viswanath Poosala, Sridhar Ramaswamy: Selectivity Estimation in Spatial Databases. SIGMOD Conference 1999: 13-24
  5. Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy: Join Synopses for Approximate Query Answering. SIGMOD Conference 1999: 275-286
  6. Ashraf Aboulnaga, Surajit Chaudhuri: Self-tuning Histograms: Building Histograms Without Looking at Data. SIGMOD Conference 1999: 181-192
  7. Noga Alon, Phillip B. Gibbons, Yossi Matias, Mario Szegedy: Tracking Join and Self-Join Sizes in Limited Storage. PODS 1999: 10-20
  8. H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Viswanath Poosala, Kenneth C. Sevcik, Torsten Suel: Optimal Histograms with Quality Guarantees. VLDB 1998: 275-286
  9. Michael J. Carey, Donald Kossmann: Reducing the Braking Distance of an SQL Query Engine. VLDB 1998: 158-169
  10. Yossi Matias, Jeffrey Scott Vitter, Min Wang: Wavelet-Based Histograms for Selectivity Estimation. SIGMOD Conference 1998: 448-459
  11. Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya: Random Sampling for Histogram Construction: How much is enough? SIGMOD Conference 1998: 436-447
  12. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
  13. Viswanath Poosala, Yannis E. Ioannidis: Selectivity Estimation Without the Attribute Value Independence Assumption. VLDB 1997: 486-495
  14. Banchong Harangsri, John Shepherd, Anne H. H. Ngu: Query Size Estimation Using Machine Learning. DASFAA 1997: 97-106
  15. Evan P. Harris, Kotagiri Ramamohanarao: Join Algorithm Costs Revisited. VLDB J. 5(1): 64-84(1996)
  16. Gennady Antoshenkov, Mohamed Ziauddin: Query Processing and Optimization in Oracle Rdb. VLDB J. 5(4): 229-237(1996)
  17. Laks V. S. Lakshmanan, Fereidoon Sadri, Iyer N. Subramanian: SchemaSQL - A Language for Interoperability in Relational Multi-Database Systems. VLDB 1996: 239-250
  18. Nabil I. Hachem, Chenye Bao, Steve Taylor: Approximate Query Answering In Numerical Databases. SSDBM 1996: 63-73
  19. Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305
  20. P. Krishnan, Jeffrey Scott Vitter, Balakrishna R. Iyer: Estimating Alphanumeric Selectivity in the Presence of Wildcards. SIGMOD Conference 1996: 282-293
  21. Sumit Ganguly, Phillip B. Gibbons, Yossi Matias, Abraham Silberschatz: Bifocal Sampling for Skew-Resistant Join Size Estimation. SIGMOD Conference 1996: 271-281
  22. Nick Roussopoulos, Chung-Min Chen, Stephen Kelley, Alex Delis, Yannis Papakonstantinou: The ADMS Project: View R Us. IEEE Data Eng. Bull. 18(2): 19-28(1995)
  23. Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244
  24. Yibei Ling, Wei Sun: An Evaluation of Sampling-Based Size Estimation Methods for Selections in Database Systems. ICDE 1995: 532-539
  25. Chengwen Liu, I-Ping Chu: Load Balancing in Distributed Query Processing. DASFAA 1995: 256-263
  26. Augustine C. Ikeji, Farshad Fotouhi: Computation of Partial Query Results Using An Adaptive Stratified Sampling Technique. CIKM 1995: 145-149
  27. Jason Tsong-Li Wang, Gung-Wei Chirn, Thomas G. Marr, Bruce A. Shapiro, Dennis Shasha, Kaizhong Zhang: Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results. SIGMOD Conference 1994: 115-125
  28. Chung-Min Chen, Nick Roussopoulos: Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994: 161-172
  29. Jyrki Kivinen, Heikki Mannila: The Power of Sampling in Knowledge Discovery. PODS 1994: 77-85
  30. Peter J. Haas, Jeffrey F. Naughton, Arun N. Swami: On the Relative Cost of Sampling for Join Selectivity Estimation. PODS 1994: 14-24
  31. Qiang Zhu, Per-Åke Larson: A Query Sampling Method of Estimating Local Cost Parameters in a Multidatabase System. ICDE 1994: 144-153
  32. Gennady Antoshenkov: Query Processing in DEC Rdb: Major Issues and Future Challenges. IEEE Data Eng. Bull. 16(4): 42-52(1993)
  33. Wei Sun, Yibei Ling, Naphtali Rishe, Yi Deng: An Instant and Accurate Estimation Method for Joins and Selection in a Retrieval-Intensive Environment. SIGMOD Conference 1993: 79-88
  34. Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Arun N. Swami: Fixed-Precision Estimation of Join Selectivity. PODS 1993: 190-201
  35. Allen Van Gelder: Multiple Join Size Estimation by Virtual Domains. PODS 1993: 180-189
  36. Priti Mishra, Margaret H. Eich: Join Processing in Relational Databases. ACM Comput. Surv. 24(1): 63-113(1992)
  37. Alfons Kemper, Guido Moerkotte, Michael Steinbrunn: Optimizing Boolean Expressions in Object-Bases. VLDB 1992: 79-90
  38. Gennady Antoshenkov: Random Sampling from Pseudo-Ranked B+ Trees. VLDB 1992: 375-382
  39. Peter J. Haas, Arun N. Swami: Sequential Sampling Procedures for Query Size Estimation. SIGMOD Conference 1992: 341-350
  40. Jyrki Kivinen, Heikki Mannila: Approximate Dependency Inference from Relations. ICDT 1992: 86-98
  41. Frank Olken, Doron Rotem: Maintenance of Materialized Views of Sampling Queries. ICDE 1992: 632-641
  42. S. Seshadri, Jeffrey F. Naughton: Sampling Issues in Parallel Database Systems. EDBT 1992: 328-343
  43. Wen-Chi Hou, Gultekin Özsoyoglu, Erdogan Dogdu: Error-Constraint COUNT Query Evaluation in Relational Databases. SIGMOD Conference 1991: 278-287
  44. Frédéric Andrès, Michel Couprie, Yann Viémont: A Multi-Environment Cost Evaluator for Parallel Database Systems. DASFAA 1991: 126-135
  45. Donovan A. Schneider, David J. DeWitt: Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines. VLDB 1990: 469-480
  46. Jeffrey F. Naughton, S. Seshadri: On Estimating the Size of Projections. ICDT 1990: 499-513

Copyright © Mon Nov 2 21:11:37 2009 by Michael Ley (ley@uni-trier.de)