
Indexing Similarity

The general paradigm for content-based retrieval is the similarity search model, which consists of three key components. First, given a database of complex data objects (e.g., multimedia documents), a set of feature descriptors must be extracted from the actual data objects. Second, a distance function must be defined on the descriptors that mimics the similarity between the respective objects. Third, a query is given using the query-by-example concept: distances are evaluated between a query (example) descriptor and all the descriptors in the database, and those sufficiently close (similar) to the example are returned to the user as the result.
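The three components can be illustrated with a minimal sketch of a sequential query-by-example search. Everything here is an assumption made for illustration: the descriptors are plain numeric tuples, the Euclidean distance stands in for whatever distance function an application would actually use, and the names `range_query` and `knn_query` are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Stand-in distance function (assumption): Euclidean distance between descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(database, query, radius, distance=euclidean):
    """Return (object id, distance) for every descriptor within `radius` of the query."""
    result = []
    for obj_id, descriptor in database.items():
        d = distance(query, descriptor)  # one distance evaluation per database object
        if d <= radius:
            result.append((obj_id, d))
    return sorted(result, key=lambda pair: pair[1])


def knn_query(database, query, k, distance=euclidean):
    """Return the k descriptors closest to the query, nearest first."""
    scored = ((distance(query, desc), obj_id) for obj_id, desc in database.items())
    return [(obj_id, d) for d, obj_id in heapq.nsmallest(k, scored)]


# Toy database of already-extracted descriptors (hypothetical values).
descriptors = {"img_a": (0.0, 0.0), "img_b": (3.0, 4.0), "img_c": (1.0, 1.0)}
example = (0.0, 0.0)  # descriptor of the query example

close_enough = range_query(descriptors, example, radius=2.0)
two_nearest = knn_query(descriptors, example, k=2)
```

Both queries evaluate the distance once per database object, which is exactly the sequential scan that indexing tries to avoid.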

If we accepted a naive implementation of the query process outlined above (a sequential search of the entire database), there would be no problem and no constraints on the distance function. However, the distance function is often computationally expensive, and the databases are too large for a sequential search to be efficient. Hence, various models for indexing similarity have been developed, most of which follow the metric space model, which assumes a metric distance function. The metric postulates, in particular the triangle inequality, allow the descriptor space to be partitioned so that query processing visits only the promising partitions, making the search efficient. However, the restriction to metric distances is quite serious, because real-world applications often require non-metric distances or even dynamic distances that change with evolving user preferences. The SRG aims at investigating general techniques for indexing metric, non-metric and dynamic distance functions at large scale.
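To see how the metric postulates save distance computations, consider pivot-based filtering, one common metric indexing technique (chosen here only as an illustration, not necessarily the one used by the group). For a pivot p, the triangle inequality gives |d(q, p) - d(o, p)| <= d(q, o), so an object o can be discarded without computing d(q, o) whenever that lower bound exceeds the query radius. The sketch below assumes Euclidean descriptors; the class name `PivotTable` and the choice of pivots are hypothetical.

```python
import math


def euclidean(a, b):
    """Stand-in metric (assumption): Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotTable:
    """Precomputes object-to-pivot distances so that queries can prune by lower bound."""

    def __init__(self, objects, pivots, distance=euclidean):
        self.objects = list(objects)
        self.pivots = list(pivots)  # must be non-empty
        self.distance = distance
        # Index construction: one row of pivot distances per database object.
        self.table = [[distance(o, p) for p in self.pivots] for o in self.objects]

    def range_query(self, query, radius):
        """Return (matching objects, number of distance computations at query time)."""
        query_to_pivots = [self.distance(query, p) for p in self.pivots]
        computed = len(self.pivots)
        results = []
        for obj, row in zip(self.objects, self.table):
            # Triangle inequality: d(query, obj) >= |d(query, p) - d(obj, p)| for every pivot p.
            lower_bound = max(abs(qp - op) for qp, op in zip(query_to_pivots, row))
            if lower_bound > radius:
                continue  # safely discarded without evaluating the real distance
            computed += 1
            if self.distance(query, obj) <= radius:
                results.append(obj)
        return results, computed


# Toy data: a 10 x 10 grid of 2-D descriptors and two corner pivots (hypothetical choices).
grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
index = PivotTable(grid, pivots=[(0.0, 0.0), (9.0, 9.0)])
hits, distance_computations = index.range_query((2.0, 2.0), radius=1.5)
```

The filter never discards a true result, because the bound is guaranteed by the triangle inequality. With a non-metric distance that guarantee is lost, which is why non-metric and dynamic distances need different techniques.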

Multimedia

Multimedia data (images, audio, video) have already confirmed their dominant role in the flood of data available over the Internet. With the exponential growth of multimedia data volumes, multimedia retrieval can no longer rely solely on conventional keyword-search technology, which requires each object to be annotated with a set of keywords. Not only is annotation mostly unavailable for multimedia data at such a large scale, but even the available annotations usually suffer from subjectivity and incompleteness. Thus, content-based multimedia retrieval systems need to be designed that employ similarity search models and techniques considering the actual multimedia content rather than keywords. The SRG is involved in two research directions concerning indexing in content-based multimedia retrieval: multimedia exploration access methods and indexing adaptive similarity. The outcomes of this research will be incorporated into our web-based Smart image retrieval system (SIR).

 

While traditional content-based retrieval approaches provide query-driven access under the assumption that the users' needs are clearly specified, modern content-based exploration approaches support users in browsing and navigating through multimedia databases when the retrieval intent is imprecise or absent. By means of interactive graphical user interfaces, exploration approaches offer convenient and intuitive access to unknown multimedia databases, which becomes even more important with the arrival of powerful mobile devices, such as the iPad and iPhone.

When determining content-based similarity between two multimedia objects, the distance is evaluated on feature representations which aggregate the inherent properties of the multimedia objects. The conventional feature representations aggregate and store these properties in feature histograms, which can be compared by vectorial distances. Recent feature representations adaptively aggregate and store individual object properties in more flexible feature signatures, which can be compared by adaptive similarity measures, such as the quadratic form distance or the Earth mover's distance.
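As an illustration of an adaptive similarity measure, the following sketch computes a quadratic form distance between two feature signatures in the style of the signature quadratic form distance. The assumptions are ours: a signature is a list of (centroid, weight) pairs, the similarity between centroids is a Gaussian function of their squared Euclidean distance, and `alpha` is a hypothetical tuning parameter. The weights of the two signatures are concatenated as w = (w_q, -w_o), and the distance is the square root of w A w^T, where A holds the pairwise centroid similarities.

```python
import math


def gaussian_similarity(c1, c2, alpha=1.0):
    """Similarity of two centroids (assumption): Gaussian of the squared Euclidean distance."""
    squared = sum((x - y) ** 2 for x, y in zip(c1, c2))
    return math.exp(-alpha * squared)


def signature_quadratic_form_distance(sig_q, sig_o, similarity=gaussian_similarity):
    """Quadratic form distance between two signatures given as [(centroid, weight), ...]."""
    # Concatenate the centroids, and the weights with the second signature negated.
    centroids = [c for c, _ in sig_q] + [c for c, _ in sig_o]
    weights = [w for _, w in sig_q] + [-w for _, w in sig_o]
    # Evaluate w * A * w^T, where A[i][j] = similarity(centroid_i, centroid_j).
    total = 0.0
    for ci, wi in zip(centroids, weights):
        for cj, wj in zip(centroids, weights):
            total += wi * wj * similarity(ci, cj)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))


# Two toy signatures of different sizes (hypothetical values); the weights sum to 1.
signature_a = [((0.0, 0.0), 0.5), ((1.0, 1.0), 0.5)]
signature_b = [((0.0, 0.0), 1.0)]
distance_ab = signature_quadratic_form_distance(signature_a, signature_b)
```

Unlike a histogram comparison, the two signatures need not have the same number of components or share the same bins, because the similarity matrix is built from the centroids of the two signatures being compared.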

Bioinformatics & Cheminformatics

Bioinformatics is the application of computer science and mathematics to molecular biology in order to solve complex biological problems. We are interested mainly in the development of efficient algorithms and computational tools that are widely used by the bioinformatics community. We are mostly involved in the following fields.

 

In the visualization domain, we develop both algorithms for visualization of non-trivial data types and web-based tools for visualization of molecular structure information. Our aim is not only to come up with new visualization approaches but also to go the last mile and develop complete software solutions.

Macromolecules such as proteins, DNA or RNA perform many different biological functions and thus drive most of the vital processes in living organisms. A protein's function is determined by its interactions with other molecules. Detecting the regions where a protein interacts, and how the interaction is carried out, therefore finds application in many areas, including drug development. In this field, we mostly focus on the detection of active sites from protein 3D structures.