ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Sampling Large Databases for Association Rules.

Hannu Toivonen: Sampling Large Databases for Association Rules. VLDB 1996: 134-145
@inproceedings{DBLP:conf/vldb/Toivonen96,
  author    = {Hannu Toivonen},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {Sampling Large Databases for Association Rules},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {134-145},
  ee        = {db/conf/vldb/Toivonen96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Discovery of association rules is an important database mining problem. Current algorithms for finding association rules require several passes over the analyzed database, and obviously the role of I/O overhead is very significant for very large databases. We present new algorithms that reduce the database activity considerably. The idea is to pick a random sample, to find using this sample all association rules that probably hold in the whole database, and then to verify the results with the rest of the database. The algorithms thus produce exact association rules, not approximations based on a sample. The approach is, however, probabilistic, and in those rare cases where our sampling method does not produce all association rules, the missing rules can be found in a second pass. Our experiments show that the proposed algorithms can find association rules very efficiently in only one database pass.

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Electronic Edition

References

[AIS93]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[AMS+96]
Rakesh Agrawal, Heikki Mannila, Ramakrishnan Srikant, Hannu Toivonen, A. Inkeri Verkamo: Fast Discovery of Association Rules. Advances in Knowledge Discovery and Data Mining 1996: 307-328 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[AS92]
Noga Alon, Joel Spencer: The Probabilistic Method. John Wiley 1992, ISBN 0-471-53588-5
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[AS94]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[FPSSU96]
Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, Ramasamy Uthurusamy (Eds.): Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press 1996, ISBN 0-262-56097-6
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HF95]
Jiawei Han, Yongjian Fu: Discovery of Multiple-Level Association Rules from Large Databases. VLDB 1995: 420-431 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HKMT95]
Marcel Holsheimer, Martin L. Kersten, Heikki Mannila, Hannu Toivonen: A Perspective on Databases and Data Mining. KDD 1995: 150-155 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HS92]
Peter J. Haas, Arun N. Swami: Sequential Sampling Procedures for Query Size Estimation. SIGMOD Conference 1992: 341-350 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HS93]
...
[KM94]
Jyrki Kivinen, Heikki Mannila: The Power of Sampling in Knowledge Discovery. PODS 1994: 77-85 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[KMR+94]
Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, A. Inkeri Verkamo: Finding Interesting Rules from Large Sets of Discovered Association Rules. CIKM 1994: 401-407 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[LSL95]
Hongjun Lu, Rudy Setiono, Huan Liu: NeuroRule: A Connectionist Approach to Data Mining. VLDB 1995: 478-489 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MT96]
...
[MTV94]
Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo: Efficient Algorithms for Discovering Association Rules. KDD Workshop 1994: 181-192 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[OR89]
Frank Olken, Doron Rotem: Random Sampling from B+ Trees. VLDB 1989: 269-277 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PCY95]
Jong Soo Park, Ming-Syan Chen, Philip S. Yu: An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995: 175-186 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PSF91]
Gregory Piatetsky-Shapiro, William J. Frawley (Eds.): Knowledge Discovery in Databases. AAAI/MIT Press 1991, ISBN 0-262-62080-4
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[SA95]
Ramakrishnan Srikant, Rakesh Agrawal: Mining Generalized Association Rules. VLDB 1995: 407-419 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[SON95]
Ashok Savasere, Edward Omiecinski, Shamkant B. Navathe: An Efficient Algorithm for Mining Association Rules in Large Databases. VLDB 1995: 432-444 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[TKR+95]
...

Referenced by

  1. David Gibson, Jon M. Kleinberg, Prabhakar Raghavan: Clustering Categorical Data: An Approach Based on Dynamical Systems. VLDB J. 8(3-4): 222-236(2000)
  2. Themistoklis Palpanas: Knowledge Discovery in Data Warehouses. SIGMOD Record 29(3): 88-100(2000)
  3. Minos N. Garofalakis, Rajeev Rastogi, S. Seshadri, Kyuseok Shim: Data Mining and the Web: Past, Present and Future. Workshop on Web Information and Data Management 1999: 43-47
  4. Laks V. S. Lakshmanan, Raymond T. Ng, Jiawei Han, Alex Pang: Optimization of Constrained Frequent Set Queries with 2-variable Constraints. SIGMOD Conference 1999: 157-168
  5. Christian Hidber: Online Association Rule Mining. SIGMOD Conference 1999: 145-156
  6. Nicolas Pasquier, Yves Bastide, Rafik Taouil, Lotfi Lakhal: Discovering Frequent Closed Itemsets for Association Rules. ICDT 1999: 398-416
  7. Holger Günzel, Jens Albrecht, Wolfgang Lehner: Data Mining in a Multidimensional Environment. ADBIS 1999: 191-204
  8. Charu C. Aggarwal, Philip S. Yu: Mining Large Itemsets for Association Rules. IEEE Data Eng. Bull. 21(1): 23-31(1998)
  9. Sridhar Ramaswamy, Sameer Mahajan, Abraham Silberschatz: On the Discovery of Interesting Patterns in Association Rules. VLDB 1998: 368-379
  10. David Gibson, Jon M. Kleinberg, Prabhakar Raghavan: Clustering Categorical Data: An Approach Based on Dynamical Systems. VLDB 1998: 311-322
  11. Sunita Sarawagi, Shiby Thomas, Rakesh Agrawal: Integrating Mining with Relational Database Systems: Alternatives and Implications. SIGMOD Conference 1998: 343-354
  12. Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han, Alex Pang: Exploratory Mining and Pruning Optimizations of Constrained Association Rules. SIGMOD Conference 1998: 13-24
  13. Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. SIGMOD Conference 1998: 94-105
  14. Banu Özden, Sridhar Ramaswamy, Abraham Silberschatz: Cyclic Association Rules. ICDE 1998: 412-421
  15. Rosa Meo, Giuseppe Psaila, Stefano Ceri: A Tightly-Coupled Architecture for Data Mining. ICDE 1998: 316-323
  16. Jun-Lin Lin, Margaret H. Dunham: Mining Association Rules: Anti-Skew Algorithms. ICDE 1998: 486-493
  17. Charu C. Aggarwal, Philip S. Yu: Online Generation of Association Rules. ICDE 1998: 402-411
  18. Dao-I Lin, Zvi M. Kedem: Pincer Search: A New Algorithm for Discovering the Maximum Frequent Set. EDBT 1998: 105-119
  19. Marek Wojciechowski, Maciej Zakrzewicz: Itemset Materializing for Fast Mining of Association Rules. ADBIS 1998: 284-295
  20. Renée J. Miller, Yuping Yang: Association Rules over Interval Data. SIGMOD Conference 1997: 452-461
  21. Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur: Dynamic Itemset Counting and Implication Rules for Market Basket Data. SIGMOD Conference 1997: 255-264
  22. Sergey Brin, Rajeev Motwani, Craig Silverstein: Beyond Market Baskets: Generalizing Association Rules to Correlations. SIGMOD Conference 1997: 265-276
  23. Heikki Mannila: Methods and Problems in Data Mining. ICDT 1997: 41-55
  24. Tadeusz Morzy, Maciej Zakrzewicz: SQL-Like Language for Database Mining. ADBIS 1997: 311-317

Copyright © Mon Nov 2 21:19:02 2009 by Michael Ley (ley@uni-trier.de)