FEBS Short-Term Fellowships
Fast Similarity Retrieval in Mass Spectra Databases
2012 - 2012
Principal investigator : Jiri Novak
GAČR P202/11/0968
Large-scale Nonmetric Similarity Search in Complex Domains
2011 - 2014

Similarity search is popular in various areas of computing, including multimedia databases, data mining, and bioinformatics. For a long time, database approaches to similarity search assumed that the similarity is a metric distance. Thanks to the metric properties, a database can be indexed so that it can be queried efficiently. However, with the increasing complexity of data across various domains, many similarities that are not metrics (i.e., nonmetrics) have appeared in recent years. Database research, however, is still largely unaware of the large potential market for nonmetric similarity search and recognizes only the metric space model.

This project aims to propose formal models and then design access methods for efficient nonmetric similarity search, that is, search in databases where the similarity is not restricted by the metric postulates. Achieving this goal would give domain experts an efficient database solution for large-scale content-based retrieval tasks in complex databases, such as multimedia retrieval, similarity-based data mining, complex pattern matching, and classification and prediction in bioinformatics.
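
To illustrate why the metric postulates matter for indexing, the following minimal sketch shows pivot-based filtering for a range query. It is not taken from the project itself: the function names, the single-pivot setup, and the toy data are assumptions chosen for illustration. The triangle inequality gives the lower bound |d(q,p) - d(o,p)| <= d(q,o), so any object whose bound exceeds the query radius can be discarded without computing its distance to the query. A nonmetric similarity offers no such guarantee, which is the difficulty the project addresses.

```python
import math


def euclidean(a, b):
    """Euclidean distance, a metric (satisfies the triangle inequality)."""
    return math.dist(a, b)


def range_query(query, radius, database, pivot, pivot_dists, dist=euclidean):
    """Return (matches, number of distance computations performed).

    pivot_dists[i] must hold the precomputed distance dist(database[i], pivot).
    """
    d_qp = dist(query, pivot)
    computations = 1  # the query-to-pivot distance
    matches = []
    for obj, d_op in zip(database, pivot_dists):
        # Lower bound from the triangle inequality:
        # |d(q,p) - d(o,p)| <= d(q,o). If it exceeds the radius,
        # the object cannot be in the result, so skip it.
        if abs(d_qp - d_op) > radius:
            continue
        computations += 1
        if dist(query, obj) <= radius:
            matches.append(obj)
    return matches, computations


# Hypothetical toy database and pivot (illustrative values only).
database = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (9.0, 9.0), (10.0, 10.0)]
pivot = (0.0, 0.0)
pivot_dists = [euclidean(o, pivot) for o in database]  # the "index"

matches, computations = range_query((9.5, 9.5), 1.0, database, pivot, pivot_dists)
print(matches, computations)
```

In this toy case three of the five objects are filtered out by the lower bound alone, so the query needs fewer distance computations than a sequential scan while returning the same result.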

Principal investigator : Tomas Skopal
Team members : David Hoksza, Jakub Lokoc, Jiri Novak, Juraj Mosko, Tomas Bartos
GAUK 430711
Application of Metric and Non-metric Indexing Methods in Computational Proteomics
2011 - 2012

The volume of unstructured data grows rapidly, whereas annotating it is problematic. The similarity search concept, based on a similarity function defined for each pair of database objects, is more suitable for this kind of data. The similarity is usually modelled by a distance function satisfying the metric axioms, which allows efficient indexing. However, the metric axioms can be very restrictive for domain experts, who may prefer non-metric functions. Hence, database experts have to solve this problem either by converting non-metric functions to metric ones or by developing new types of non-metric indexing methods.
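
As a small illustration of converting a non-metric function to a metric one, the sketch below counts triangle-inequality violations for the squared Euclidean distance (which is not a metric) before and after applying a concave modifier, here the square root. The choice of distance, modifier, helper names, and sample points are assumptions made for this example and are not a description of the project's actual methods.

```python
import itertools
import math


def squared_l2(a, b):
    """Squared Euclidean distance: symmetric and non-negative, but not a metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triangle_violations(points, dist, eps=1e-12):
    """Count ordered triples (a, b, c) with dist(a, c) > dist(a, b) + dist(b, c)."""
    count = 0
    for a, b, c in itertools.permutations(points, 3):
        if dist(a, c) > dist(a, b) + dist(b, c) + eps:
            count += 1
    return count


def modified(dist, f):
    """Wrap a distance with a modifier f, giving (a, b) -> f(dist(a, b))."""
    return lambda a, b: f(dist(a, b))


# Hypothetical one-dimensional sample points (illustrative values only).
points = [(0.0,), (1.0,), (2.0,), (4.0,)]

before = triangle_violations(points, squared_l2)
after = triangle_violations(points, modified(squared_l2, math.sqrt))
print(before, after)
```

For this sample, the squared distance violates the triangle inequality on several triples, while the square-root-modified distance (the ordinary Euclidean distance) violates it on none, so metric indexing becomes applicable. The ordering of objects by distance is preserved because the modifier is increasing.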

One of the areas where metric and non-metric similarity searching is used is computational proteomics. When determining the biological function of an "unknown" protein, retrieving "known" proteins with similar structures (and thus probably with similar functions) is very useful. Moreover, fast and cheap determination of protein structures is itself an open problem. From a database point of view, databases of known protein structures can be used to address this problem. In this approach, sequence-structure similarity functions are used to retrieve structures that may be similar to the sought structure of the protein.

Our goal is to develop high-quality structure and sequence-structure similarity functions and methods for indexing them.

Principal investigator : Jakub Galgonek
Team members : Tomas Skopal, Jiri Novak, Jakub Lokoc