Statistical Estimators for Aggregate Relational Algebra Queries.

Wen-Chi Hou, Gultekin Özsoyoglu: Statistical Estimators for Aggregate Relational Algebra Queries. ACM Trans. Database Syst. 16(4): 600-654(1991)
  author    = {Wen-Chi Hou and
               Gultekin {\"O}zsoyoglu},
  title     = {Statistical Estimators for Aggregate Relational Algebra Queries},
  journal   = {ACM Trans. Database Syst.},
  volume    = {16},
  number    = {4},
  year      = {1991},
  pages     = {600-654},
  ee        = {, db/journals/tods/HouO91.html},
  bibsource = {DBLP,}


This paper discusses the estimation of COUNT(E) queries by sampling, where E is an arbitrary relational algebra expression. Consistent and unbiased statistical estimators for COUNT(E) are proposed without any assumptions on the distributions of attribute values or the orderings of tuples in the operand relations of E.

We present a set of COUNT(E) estimator evaluation algorithms, all based on simple random sampling, and each suitable for a different type of relational algebra expression E. To improve the efficiency, we propose three enhancements, and revise the estimator evaluation algorithms to incorporate the enhancements. One of the enhancements is the use of cluster sampling.

Estimator evaluation algorithms with the enhancements have been incorporated into a prototype DBMS. We present the performance evaluation of the estimators using the implemented prototype DBMS.

Copyright © 1991 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.

Joint ACM SIGMOD / IEEE Computer Society Anthology

CDROM Version: Load the CDROM "Volume 3 Issue 2, TODS 1991-1995, TKDE 1989-1992" and ... DVD Version: Load ACM SIGMOD Anthology DVD 2" and ... BibTeX

Online Edition: ACM Digital Library

[Index Terms and Review]
[Full Text in PDF Format, 3013 KB]


William G. Cochran: Sampling Techniques, 3rd Edition. John Wiley 1977, ISBN 0-471-16240-X
Stavros Christodoulakis: Estimating record selectivities. Inf. Syst. 8(2): 105-115(1983) BibTeX
Sakti P. Ghosh: SIAM: statistics information access method. Inf. Syst. 13(4): 359-368(1988) BibTeX
Sakti P. Ghosh: Numerical Operations on a Relational Database. IEEE Trans. Software Eng. 15(5): 600-610(1989) BibTeX
Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja: Statistical Estimators for Relational Algebra Expressions. PODS 1988: 276-287 BibTeX
Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja: Processing Aggregate Relational Queries with Hard Time Constraints. SIGMOD Conference 1989: 68-77 BibTeX
Anthony C. Klug: Access Paths in the 'ABE' Statistical Query Facility. SIGMOD Conference 1982: 161-173 BibTeX
Michael V. Mannino, Paicheng Chu, Thomas Sager: Statistical Profile Estimation in Database Systems. ACM Comput. Surv. 20(3): 191-221(1988) BibTeX
Frank Olken, Doron Rotem: Simple Random Sampling from Relational Databases. VLDB 1986: 160-169 BibTeX
Neil C. Rowe: Antisampling for Estimation: An Overview. IEEE Trans. Software Eng. 11(10): 1081-1091(1985) BibTeX
Arie Shoshani, Frank Olken, Harry K. T. Wong: Characteristics of Scientific Databases. VLDB 1984: 147-160 BibTeX

Referenced by

  1. Christos Faloutsos, Yossi Matias, Abraham Silberschatz: Modeling Skewed Distribution Using Multifractals and the `80-20' Law. VLDB 1996: 307-317
  2. Gultekin Özsoyoglu, Sujatha Guru Swamy, Kaizheng Du, Wen-Chi Hou: Time-Constrained Query Processing in CASE-DB. IEEE Trans. Knowl. Data Eng. 7(6): 865-884(1995)
  3. Paolo Ciaccia, Dario Maio: Domains and Active Domains: What This Distinction Implies for the Estimation of Projection Sizes in Relational Databases. IEEE Trans. Knowl. Data Eng. 7(4): 641-655(1995)
  4. Daniel Stamate, Henri Luchian, Ben Paechter: A General Model for the Answer-Perturbation Techniques. SSDBM 1994: 90-96
  5. Jason Tsong-Li Wang, Gung-Wei Chirn, Thomas G. Marr, Bruce A. Shapiro, Dennis Shasha, Kaizhong Zhang: Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results. SIGMOD Conference 1994: 115-125
  6. Jyrki Kivinen, Heikki Mannila: The Power of Sampling in Knowledge Discovery. PODS 1994: 77-85
  7. Wen-Chi Hou, Gultekin Özsoyoglu: Processing Time-Constrained Aggregate Queries in CASE-DB. ACM Trans. Database Syst. 18(2): 224-261(1993)
  8. Richard T. Snodgrass, Santiago Gomez, L. Edwin McKenzie: Aggregates in the Temporal Query Language TQuel. IEEE Trans. Knowl. Data Eng. 5(5): 826-842(1993)
  9. Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2): 73-170(1993)
  10. Wei Sun, Yibei Ling, Naphtali Rishe, Yi Deng: An Instant and Accurate Estimation Method for Joins and Selection in a Retrieval-Intensive Environment. SIGMOD Conference 1993: 79-88
  11. Gultekin Özsoyoglu, Kaizheng Du, Sujatha Guru Swamy, Wen-Chi Hou: Processing Real-Time, Non-Aggregate Queries with Time-Constraints in CASE-DB. ICDE 1992: 410-417
  12. Wen-Chi Hou, Gultekin Özsoyoglu, Erdogan Dogdu: Error-Constraint COUNT Query Evaluation in Relational Databases. SIGMOD Conference 1991: 278-287
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
TODS, ACM SIGMOD Anthology: Copyright © by ACM (, Corrections:
DBLP: Copyright © by Michael Ley (, last change: Tue Jun 24 18:39:11 2008