ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

The Power of Sampling in Knowledge Discovery.

Jyrki Kivinen, Heikki Mannila: The Power of Sampling in Knowledge Discovery. PODS 1994: 77-85
@inproceedings{DBLP:conf/pods/KivinenM94,
  author    = {Jyrki Kivinen and
               Heikki Mannila},
  title     = {The Power of Sampling in Knowledge Discovery},
  booktitle = {Proceedings of the Thirteenth ACM SIGACT-SIGMOD-SIGART Symposium
               on Principles of Database Systems, May 24-26, 1994, Minneapolis,
               Minnesota},
  publisher = {ACM Press},
  year      = {1994},
  isbn      = {0-89791-642-5},
  pages     = {77-85},
  ee        = {http://doi.acm.org/10.1145/182591.182601, db/conf/pods/pods94-77.html},
  crossref  = {DBLP:conf/pods/94},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

We consider the problem of approximately verifying the truth of sentences of tuple relational calculus in a given relation M by considering only a random sample of M. We define two different measures for the error of a universal sentence in a relation. For a set of n universal sentences each with at most k universal quantifiers, we give upper and lower bounds for the sample sizes required for having a high probability that all the sentences with error at least epsilon can be detected as false by considering the sample. The sample sizes are O((log n)/epsilon) or O((|M|(1-1/k) log n)/epsilon), depending on the error measure used. We also consider universal-existential sentences.

Copyright © 1994 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Load The ACM SIGMOD Anthology, CDROM Edition, Volume 1-3, PODS '82-'98. and ... Load The ACM SIGMOD Anthology, Silver Edition, DVD 1, Proceedings. and ...

Printed Edition

Proceedings of the Thirteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 24-26, 1994, Minneapolis, Minnesota. ACM Press 1994, ISBN 0-89791-642-5
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Online Edition: ACM Digital Library

[Abstract and Index Terms]
[Full Text in PDF Format, 939 KB]

References

[Ch93]
William W. Cohen: Pac-Learning a Restricted Class of Recursive Logic Programs. AAAI 1993: 86-92 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Fag82]
Ronald Fagin: Horn clauses and database dependencies. J. ACM 29(4): 952-985(1982) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[GHK92]
Adam J. Grove, Joseph Y. Halpern, Daphne Koller: Asymptotic Conditional Probabilities for First-Order Logic. STOC 1992: 294-305 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HKM94]
...
[HO91]
Wen-Chi Hou, Gultekin Özsoyoglu: Statistical Estimators for Aggregate Relational Algebra Queries. ACM Trans. Database Syst. 16(4): 600-654(1991) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HS92]
Peter J. Haas, Arun N. Swami: Sequential Sampling Procedures for Query Size Estimation. SIGMOD Conference 1992: 341-350 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[KI91]
Ravi Krishnamurthy, Tomasz Imielinski: Research Directions in Knowledge Discovery. SIGMOD Record 20(3): 76-78(1991) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[KM92]
Jyrki Kivinen, Heikki Mannila: Approximate Dependency Inference from Relations. ICDT 1992: 86-98 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[LNS90]
Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Mug92]
...
[NS90]
Jeffrey F. Naughton, S. Seshadri: On Estimating the Size of Projections. ICDT 1990: 499-513 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Nii87]
...
[Odd86]
...
[OSW89]
...
[PS91]
Gregory Piatetsky-Shapiro: Discovery, Analysis, and Presentation of Strong Rules. Knowledge Discovery in Databases 1991: 229-248 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PSF91]
Gregory Piatetsky-Shapiro, William J. Frawley (Eds.): Knowledge Discovery in Databases. AAAI/MIT Press 1991, ISBN 0-262-62080-4
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Roy90]
Shaibal Roy: Semantic Complexity of Classes of Relational Queries and Query Independent Data Partitioning. PODS 1991: 259-267 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Ull88]
Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume I. Computer Science Press 1988, ISBN 0-7167-8158-1
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Val84]
Leslie G. Valiant: A Theory of the Learnable. Commun. ACM 27(11): 1134-1142(1984) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[ZZ93]
...

Referenced by

  1. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
  2. Heikki Mannila: Methods and Problems in Data Mining. ICDT 1997: 41-55
  3. Hannu Toivonen: Sampling Large Databases for Association Rules. VLDB 1996: 134-145

Copyright © Mon Nov 2 21:04:24 2009 by Michael Ley (ley@uni-trier.de)