ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Mining Optimized Association Rules for Numeric Attributes.

Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Mining Optimized Association Rules for Numeric Attributes. PODS 1996: 182-191
@inproceedings{DBLP:conf/pods/FukudaMMT96,
  author    = {Takeshi Fukuda and
               Yasuhiko Morimoto and
               Shinichi Morishita and
               Takeshi Tokuyama},
  title     = {Mining Optimized Association Rules for Numeric Attributes},
  booktitle = {Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium
               on Principles of Database Systems, June 3-5, 1996, Montreal,
               Canada},
  publisher = {ACM Press},
  year      = {1996},
  isbn      = {0-89791-781-2},
  pages     = {182-191},
  ee        = {http://doi.acm.org/10.1145/237661.237708, db/conf/pods/FukudaMMT96.html},
  crossref  = {DBLP:conf/pods/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Given a huge database, we address the problem of finding association rules for numeric attributes, such as

(Balance in I) => (CardLoan = yes),

which implies that bank customers whose balances fall in a range I are likely to use card loan with a probability greater than p. The above rule is interesting only if the range I has some special feature with respect to the interrelation between Balance and CardLoan. It is required that the number of customers whose balances are contained in I (called the support of I) is sufficient, and also that the probability p (called the confidence ratio) should be much higher than the average probability of the condition being met.

Our goal is to realize a system that finds such appropriate ranges automatically. We mainly focus on computing two optimized ranges: one that maximizes the support on the condition that the confidence ratio is at least a given threshold value, and another that maximizes the confidence ratio on the condition that the support is at leat a given threshold number.

Using techniques from computational geometry, we present novel algorithms that compute the optimized ranges in linear time if the data are sorted. Since sorting data with respect to each numeric attribute is expensive in the case of huge databases that occupy much more space than the main memory, we instead apply randomized bucketting as the preprocessing method, an thus obtain an efficient rule-finding system.

Tests show that our implementation is fast not only in theory but also in practice. The efficiency of our algorithm enables us to compute rules for all combinations of hundreds of numeric and Boolean attributes in a reasonable time.

Copyright © 1996 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Load The ACM SIGMOD Anthology, CDROM Edition, Volume 1-3, PODS '82-'98. and ... Load The ACM SIGMOD Anthology, Silver Edition, DVD 1, Proceedings. and ...

Printed Edition

Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 3-5, 1996, Montreal, Canada. ACM Press 1996, ISBN 0-89791-781-2
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Online Edition: ACM Digital Library

[Index Terms]
[Full Text in PDF Format, 853 KB]

References

[AGI+92]
Rakesh Agrawal, Sakti P. Ghosh, Tomasz Imielinski, Balakrishna R. Iyer, Arun N. Swami: An Interval Classifier for Database Mining Applications. VLDB 1992: 560-573 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[AIS93a]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Database Mining: A Performance Perspective. IEEE Trans. Knowl. Data Eng. 5(6): 914-925(1993) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[AIS93b]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[AS94]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Ben84]
Jon Louis Bentley: Algorithm Design Techniques. Commun. ACM 27(9): 865-871(1984) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[BFOS84]
Leo Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone: Classification and Regression Trees. Wadsworth 1984, ISBN 0-534-98053-8
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[FMMT96]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Data Mining Using Two-Dimensional Optimized Accociation Rules: Scheme, Algorithms, and Visualization. SIGMOD Conference 1996: 13-23 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HCC92]
Jiawei Han, Yandong Cai, Nick Cercone: Knowledge Discovery in Databases: An Attribute-Oriented Approach. VLDB 1992: 547-559 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MAR96]
Manish Mehta, Rakesh Agrawal, Jorma Rissanen: SLIQ: A Fast Scalable Classifier for Data Mining. EDBT 1996: 18-32 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MR95]
Rajeev Motwani, Prabhakar Raghavan: Randomized Algorithms. Cambridge University Press 1995, ISBN 0-521-47465-5
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[NH94]
Raymond T. Ng, Jiawei Han: Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994: 144-155 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PCY95]
Jong Soo Park, Ming-Syan Chen, Philip S. Yu: An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995: 175-186 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PS91]
Gregory Piatetsky-Shapiro: Discovery, Analysis, and Presentation of Strong Rules. Knowledge Discovery in Databases 1991: 229-248 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PSF91]
Gregory Piatetsky-Shapiro, William J. Frawley (Eds.): Knowledge Discovery in Databases. AAAI/MIT Press 1991, ISBN 0-262-62080-4
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Qui93]
J. Ross Quinlan: C4.5: Programs for Machine Learning. Morgan Kaufmann 1993, ISBN 1-55860-238-0
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[SA96]
Ramakrishnan Srikant, Rakesh Agrawal: Mining Quantitative Association Rules in Large Relational Tables. SIGMOD Conference 1996: 1-12 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[SAD+96]
Michael Stonebraker, Rakesh Agrawal, Umeshwar Dayal, Erich J. Neuhold, Andreas Reuter: DBMS Research at a Crossroads: The Vienna Update. VLDB 1993: 688-692 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Referenced by

  1. Shinichi Morishita, Jun Sese: Traversing Itemset Lattice with Statistical Metric Pruning. PODS 2000: 226-236
  2. Johannes Gehrke, Venkatesh Ganti, Raghu Ramakrishnan, Wei-Yin Loh: BOAT-Optimistic Decision Tree Construction. SIGMOD Conference 1999: 169-180
  3. Rajeev Rastogi, Kyuseok Shim: Mining Optimized Support Rules for Numeric Attributes. ICDE 1999: 206-215
  4. Yasuhiko Morimoto, Takeshi Fukuda, Hirofumi Matsuzawa, Takeshi Tokuyama, Kunikazu Yoda: Algorithms for Mining Association Rules for Binary Segmentations of Huge Categorical Databases. VLDB 1998: 380-391
  5. Rajeev Rastogi, Kyuseok Shim: Mining Optimized Association Rules with Categorical and Numeric Attributes. ICDE 1998: 503-512
  6. Takeshi Fukuda, Hirofumi Matsuzawa: Parallel Processing of Multiple Aggregate Queries on Shared-Nothing Multiprocessors. EDBT 1998: 278-292
  7. Yasuhiko Morimoto, Hiromu Ishii, Shinichi Morishita: Efficient Construction of Regression Trees with Range and Region Splitting. VLDB 1997: 166-175
  8. Sergey Brin, Rajeev Motwani, Craig Silverstein: Beyond Market Baskets: Generalizing Association Rules to Correlations. SIGMOD Conference 1997: 265-276
  9. Heikki Mannila: Methods and Problems in Data Mining. ICDT 1997: 41-55
  10. Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules. VLDB 1996: 146-155
  11. Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: SONAR: System for Optimized Numeric AssociationRules. SIGMOD Conference 1996: 553
  12. Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Data Mining Using Two-Dimensional Optimized Accociation Rules: Scheme, Algorithms, and Visualization. SIGMOD Conference 1996: 13-23

Copyright © Mon Nov 2 21:04:32 2009 by Michael Ley (ley@uni-trier.de)