ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

A Cost Model for Similarity Queries in Metric Spaces.

Paolo Ciaccia, Marco Patella, Pavel Zezula: A Cost Model for Similarity Queries in Metric Spaces. PODS 1998: 59-68
@inproceedings{DBLP:conf/pods/CiacciaPZ98,
  author    = {Paolo Ciaccia and
               Marco Patella and
               Pavel Zezula},
  title     = {A Cost Model for Similarity Queries in Metric Spaces},
  booktitle = {Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium
               on Principles of Database Systems, June 1-3, 1998, Seattle, Washington},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-996-3},
  pages     = {59-68},
  ee        = {http://doi.acm.org/10.1145/275487.275495, db/conf/pods/CiacciaPZ98.html},
  crossref  = {DBLP:conf/pods/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

We consider the problem of estimating CPU (distance computations) and I/O costs for processing range and k-nearest neighbors queries over metric spaces. Unlike the specific case of vector spaces, where information on data distribution has been exploited to derive cost models for predicting the performance of multi-dimensional access methods, in a generic metric space there is no such a possibility, which makes the problem quite different and requires a novel approach. We insist that the distance distribution of objects can be profitably used to solve the problem, and consequently develop a concrete cost model for the M-tree access method. Our results rely on the assumption that the indexed dataset comes from a metric space which is "homogeneous" enough (in a probabilistic sense) to allow reliable cost estimations even if the distance distribution with respect to a specific query object is unknown. We experimentally validate the model over both real and synthetic datasets, and show how the model can be used to tune the M-tree in order to minimize a combination of CPU and I/O costs. Finally, we sketch how the same approach can be applied to derive a cost model for the vp-tree index structure.

Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Load The ACM SIGMOD Anthology, CDROM Edition, Volume 1-3, PODS '82-'98. and ... Load The ACM SIGMOD Anthology, Silver Edition, DVD 1, Proceedings. and ...

Printed Edition

Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 1-3, 1998, Seattle, Washington. ACM Press 1998, ISBN 0-89791-996-3
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Online Edition: ACM Digital Library

[Index Terms]
[Full Text in PDF Format, 1094 KB]

References

[1]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[2]
Alberto Belussi, Christos Faloutsos: Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. VLDB 1995: 299-310 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[3]
Stefan Berchtold, Christian Böhm, Daniel A. Keim, Hans-Peter Kriegel: A Cost Model For Nearest Neighbor Search in High-Dimensional Data Space. PODS 1997: 78-86 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[4]
Tolga Bozkaya, Z. Meral Özsoyoglu: Distance-Based Indexing for High-Dimensional Metric Spaces. SIGMOD Conference 1997: 357-368 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[5]
...
[6]
Sergey Brin: Near Neighbor Search in Large Metric Spaces. VLDB 1995: 574-584 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[7]
...
[8]
Tzi-cker Chiueh: Content-Based Image Indexing. VLDB 1994: 582-593 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[9]
...
[10]
Paolo Ciaccia, Marco Patella, Pavel Zezula: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997: 426-435 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[11]
Paolo Ciaccia, Marco Patella, Pavel Zezula: Processing Complex Similarity Queries with Distance-Based Access Methods. EDBT 1998: 9-23 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[12]
Christos Faloutsos, Ibrahim Kamel: Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension. PODS 1994: 4-13 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[13]
Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[14]
Joseph M. Hellerstein, Jeffrey F. Naughton, Avi Pfeffer: Generalized Search Trees for Database Systems. VLDB 1995: 562-573 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[15]
...
[16]
Ibrahim Kamel, Christos Faloutsos: On Packing R-trees. CIKM 1993: 490-499 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[17]
...
[18]
...
[19]
Apostolos Papadopoulos, Yannis Manolopoulos: Performance of Nearest Neighbor Queries in R-Trees. ICDT 1997: 394-408 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[20]
Yannis Theodoridis, Timos K. Sellis: A Model for the Prediction of R-tree Performance. PODS 1996: 161-171 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Referenced by

  1. Stefan Berchtold, Christian Böhm, Daniel A. Keim, Florian Krebs, Hans-Peter Kriegel: On Optimizing Nearest Neighbor Queries in High-Dimensional Data Spaces. ICDT 2001: 435-449
  2. Roger Weber, Klemens Böhm: Trading Quality for Time with Nearest Neighbor Search. EDBT 2000: 21-35
  3. Tolga Bozkaya, Z. Meral Özsoyoglu: Indexing Large Metric Spaces for Similarity Search Queries. ACM Trans. Database Syst. 24(3): 361-404(1999)
  4. Aristides Gionis, Piotr Indyk, Rajeev Motwani: Similarity Search in High Dimensions via Hashing. VLDB 1999: 518-529
  5. Pavel Zezula, Pasquale Savino, Giuseppe Amato, Fausto Rabitti: Approximate Similarity Retrieval with M-Trees. VLDB J. 7(4): 275-293(1998)

Copyright © Mon Nov 2 21:04:37 2009 by Michael Ley (ley@uni-trier.de)