SIRET Research Group
Malostranské nám. 25,
118 00 Prague
Phone: +420 22191 4227
The general paradigm for content-based retrieval is the similarity search model, which consists of three key components. First, given a database of complex data objects (e.g., multimedia documents), a set of feature descriptors must be extracted from the actual data objects. Second, a distance function must be defined on the descriptors that mimics the similarity between the respective objects. Third, a query is given using the query-by-example concept: distances are evaluated between a query (example) descriptor and all the descriptors in the database, and those sufficiently close (similar) to the example are returned to the user as the result.
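The query-by-example step described above can be sketched as a sequential scan. This is only a minimal illustration: the descriptors are assumed to be plain numeric tuples, the Euclidean distance is an arbitrary stand-in for whatever distance an application would define, and the names `euclidean`, `range_query` and `knn_query` are hypothetical.

```python
import math

def euclidean(a, b):
    """Stand-in distance between two descriptors (assumed numeric tuples)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(database, query, radius, distance=euclidean):
    """Return (index, distance) for every descriptor within `radius` of the query."""
    results = []
    for i, descriptor in enumerate(database):
        d = distance(query, descriptor)  # one distance computation per object
        if d <= radius:
            results.append((i, d))
    return results

def knn_query(database, query, k, distance=euclidean):
    """Return (index, distance) for the k descriptors closest to the query."""
    scored = sorted((distance(query, descriptor), i)
                    for i, descriptor in enumerate(database))
    return [(i, d) for d, i in scored[:k]]

# Toy database of 2-D descriptors (illustrative values only).
database = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 2.0)]
in_range = range_query(database, (0.0, 0.0), radius=2.0)
nearest = knn_query(database, (0.0, 0.0), k=2)
```

Both functions compute the distance to every descriptor in the database, so the cost of a query grows linearly with the database size.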
If we accepted the query process outlined above in its naive implementation (a sequential search of the entire database), there would be no problem and no constraints on the distance function. However, the distance function is often computationally expensive, and the databases are too large to be searched both sequentially and efficiently. Hence, various models for indexing similarity have been developed, most of which follow the metric space model, which assumes a metric distance function. The metric postulates make it possible to partition the descriptor space so that query processing visits only the promising partitions, making the search efficient. However, the restriction to metric distances is quite serious, because real-world applications often require non-metric distances or even dynamic distances that change with evolving user preferences. The SRG aims to investigate general techniques for indexing metric, non-metric and dynamic distance functions at large scale.
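To illustrate how the metric postulates help, the sketch below uses pivot-based filtering, one generic metric-indexing technique, and not necessarily the method used by the SRG. By the triangle inequality, |d(q, p) - d(o, p)| is a lower bound on d(q, o) for any pivot p, so an object whose lower bound exceeds the query radius can be discarded without computing its distance to the query. The class `PivotIndex`, the choice of pivots and the Euclidean distance are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Stand-in metric distance between two descriptors (assumed numeric tuples)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class PivotIndex:
    """Hypothetical pivot table: precomputed distances from each object to each pivot."""

    def __init__(self, database, pivots, distance=euclidean):
        self.database = list(database)
        self.pivots = list(pivots)
        self.distance = distance
        # Built once at indexing time; reused by every query.
        self.table = [[distance(obj, p) for p in self.pivots]
                      for obj in self.database]

    def range_query(self, query, radius):
        """Return (results, computed): matches and the number of object distances evaluated."""
        query_to_pivots = [self.distance(query, p) for p in self.pivots]
        results, computed = [], 0
        for i, obj in enumerate(self.database):
            # Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o) for every pivot p.
            lower_bound = max(
                (abs(qp - op) for qp, op in zip(query_to_pivots, self.table[i])),
                default=0.0,
            )
            if lower_bound > radius:
                continue  # safely pruned, valid only if the distance is a metric
            computed += 1
            d = self.distance(query, obj)
            if d <= radius:
                results.append((i, d))
        return results, computed

database = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (0.0, 2.0), (9.0, 9.0)]
index = PivotIndex(database, pivots=[(0.0, 0.0)])
results, computed = index.range_query((0.5, 0.0), radius=1.0)
```

In this toy run only two of the five objects need an actual distance computation. The pruning step is correct only when the distance satisfies the triangle inequality, which is why non-metric and dynamic distances call for different techniques.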
Multimedia data (images, audio, video) have already confirmed their dominant role within the flood of data available over the Internet. With the exponential growth of multimedia data volumes, multimedia retrieval can no longer rely solely on conventional keyword-search technology, which requires an annotation given by a set of keywords. Not only is annotation mostly unavailable for multimedia data at such a large scale, but even the available annotations usually suffer from subjectivity and incompleteness. Thus, content-based multimedia retrieval systems need to be designed that employ similarity search models and techniques considering the actual multimedia content rather than keywords. The SRG is involved in two research directions concerning indexing in content-based multimedia retrieval: multimedia exploration access methods and indexing adaptive similarity. The outcomes of this research will be incorporated into our web-based Smart image retrieval system (SIR).
Bioinformatics applies computer science and mathematical techniques to the field of molecular biology in order to solve complex biological problems. Most areas of bioinformatics rely heavily on similarity search. The SRG puts considerable effort into implementing efficient similarity search methods in the bioinformatics domain. We are especially interested in the following applications, and we have developed tools that help to solve the respective problems.