Large-scale Nonmetric Similarity Search in Complex Domains
Start year: 2011
End year: 2014
Similarity search is widely used in many areas of computing, including multimedia databases, data mining, and bioinformatics. For a long time, database approaches to similarity search assumed that similarity is modelled by a metric distance. The properties of a metric make it possible to index a database so that it can be queried efficiently. However, as data across various domains has grown more complex, many similarity measures have appeared in recent years that are not metrics (i.e., they are nonmetric).
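The difference between a metric and a nonmetric distance can be illustrated with a small check of the triangle inequality. This is only a sketch: the function names and the sample points are chosen here for illustration and do not come from the project. Squared Euclidean distance is used as a simple, well-known example of a nonmetric.

```python
import itertools
import math


def euclidean(a, b):
    """Euclidean distance, which satisfies the metric axioms."""
    return math.dist(a, b)


def squared_euclidean(a, b):
    """Squared Euclidean distance, which violates the triangle inequality."""
    return math.dist(a, b) ** 2


def violates_triangle_inequality(dist, points, tol=1e-12):
    """Return True if some triple x, y, z has dist(x, z) > dist(x, y) + dist(y, z)."""
    for x, y, z in itertools.permutations(points, 3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:
            return True
    return False


# Three collinear points (illustrative data only).
points = [(0.0,), (1.0,), (2.0,)]

print(violates_triangle_inequality(euclidean, points))          # False
print(violates_triangle_inequality(squared_euclidean, points))  # True
```

For the squared distance, going from 0 to 2 directly costs 4, while the detour through 1 costs only 1 + 1 = 2, so an index that relies on the triangle inequality could wrongly discard relevant objects.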
Application of Metric and Non-metric Indexing Methods in Computational Proteomics
Start year: 2011
End year: 2012
The volume of unstructured data is growing rapidly, while annotating it remains problematic. The similarity search concept, based on a similarity function defined for each pair of database objects, is better suited to this kind of data. The similarity is usually modelled by a distance function satisfying the metric axioms, which allows efficient indexing. However, the metric axioms can be very restrictive for domain experts, who may prefer non-metric functions.
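Why the metric axioms allow efficient indexing can be sketched with generic pivot-based filtering. This is a textbook technique, not necessarily the method used in the project, and all names below (`build_pivot_table`, `range_query`, the sample data) are hypothetical. For a metric, the triangle inequality gives the lower bound |d(q, p) - d(o, p)| <= d(q, o), so an object can be discarded without computing its distance to the query.

```python
import math


def build_pivot_table(objects, pivot, dist):
    """Precompute the distance from every database object to a single pivot."""
    return [dist(obj, pivot) for obj in objects]


def range_query(query, radius, objects, pivot, pivot_dists, dist):
    """Return objects within `radius` of `query` and the number of distance computations.

    Assumes `dist` is a metric; with a nonmetric the pruning step below
    may discard objects that actually belong to the result.
    """
    d_qp = dist(query, pivot)
    computed = 1  # the query-to-pivot distance
    results = []
    for obj, d_op in zip(objects, pivot_dists):
        # Lower bound on dist(query, obj) from the triangle inequality.
        if abs(d_qp - d_op) > radius:
            continue  # pruned without computing dist(query, obj)
        computed += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, computed


# Illustrative data only.
objects = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
pivot = (0.0, 0.0)
table = build_pivot_table(objects, pivot, math.dist)

hits, computed = range_query((1.0, 1.0), 1.5, objects, pivot, table, math.dist)
print(hits, computed)
```

In this toy run the two distant objects are pruned using only the precomputed pivot distances, which is the saving that a nonmetric function cannot guarantee.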
The SETTER web server uses the SETTER (SEcondary sTructure-based TERtiary Structure Similarity Algorithm) method for fast and accurate pairwise structural alignment. The server can compare a pair of RNA structures, or take one structure as a query and search it against a user-defined database of RNA structures. The efficiency of the algorithm comes from decomposing the RNA structure into a set of non-overlapping generalized secondary structure units (GSSUs). A GSSU usually resembles a hairpin motif, possibly containing bulges and/or internal loops in its stem part. Segmentation into GSSUs offers good scalability with respect to structure size: SETTER scales linearly with the size of the structure and quadratically with the size of a GSSU, but the number of residues in a GSSU generally does not grow as the RNA structure gets larger. The underlying SETTER algorithm is both accurate and very fast, and it does not impose limits on the size of the aligned RNA structures. SETTER can compare even the largest RNA structures pairwise in less than one minute.
The visual similarity of two images is evaluated on feature representations that capture content-based image properties. Conventional feature representations aggregate and store these properties in global feature histograms (e.g., MPEG-7 visual descriptors). Recent feature representations, however, adaptively aggregate local image features into more flexible feature signatures, which can be compared by adaptive similarity measures. The SIR engine, developed by the SIRET research group, combines traditional MPEG-7 visual descriptors with feature signatures, leading to improved similarity search in image collections.
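To make the idea of comparing feature signatures concrete, here is a minimal sketch of one adaptive measure known from the literature, the Signature Quadratic Form Distance (SQFD) with a Gaussian similarity function. The text above does not state which measure the SIR engine uses, so this choice, the `alpha` parameter, and the toy signatures are assumptions made purely for illustration. A signature is represented here as a list of (centroid, weight) pairs, and two signatures may have different lengths.

```python
import math


def gaussian_similarity(c1, c2, alpha=1.0):
    """Similarity of two centroids; equals 1.0 when they coincide."""
    return math.exp(-alpha * math.dist(c1, c2) ** 2)


def sqfd(sig_a, sig_b, alpha=1.0):
    """Signature Quadratic Form Distance between two feature signatures.

    Each signature is a list of (centroid, weight) pairs. The weights of
    the second signature are negated and the two lists are concatenated,
    then the quadratic form w * A * w^T is evaluated, where A holds the
    pairwise centroid similarities.
    """
    centroids = [c for c, _ in sig_a] + [c for c, _ in sig_b]
    weights = [w for _, w in sig_a] + [-w for _, w in sig_b]
    total = 0.0
    for ci, wi in zip(centroids, weights):
        for cj, wj in zip(centroids, weights):
            total += wi * wj * gaussian_similarity(ci, cj, alpha)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))


# Toy signatures with different numbers of centroids (illustrative data only).
sig_a = [((0.0, 0.0), 0.5), ((1.0, 1.0), 0.5)]
sig_b = [((0.0, 0.1), 0.4), ((1.0, 0.9), 0.3), ((3.0, 3.0), 0.3)]

print(sqfd(sig_a, sig_b))
```

Unlike a fixed-bin histogram comparison, the measure adapts to whatever centroids each image produced, which is why signatures of different sizes can be compared directly.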
Currently, the SIR engine operates in demo mode as a standalone image search engine. To manage large image collections in real time, the engine employs its own database indexing technology. The SIR engine also includes meta-search functionality that can augment, rerank, and explore results provided by other image search engines, such as Google Images. The current version of the online re-ranking and exploration tool employs a particle physics model that distributes images on the screen and, as a side effect, automatically groups visually similar images into clusters. To refer to this tool, please cite our publications: Image Exploration using Online Feature Extraction and Reranking (ICMR, 2012) and SIR: The Smart Image Retrieval Engine (SISAP, 2012).