FEBS Short-Term Fellowships
Similarity Retrieval in Protein Structure Databases
2012 - 2012

Similarity retrieval is widely used in bioinformatics. A typical case is similarity retrieval in databases of protein structures. However, this problem has not yet been solved satisfactorily, especially in comparison with similarity retrieval in databases of protein sequences, which is handled successfully (e.g., by BLAST).

Thus, the goal of our proposed research is to develop a tool for similarity retrieval in protein structure databases. The tool will be accessible online as a web application that accepts not only protein structures but also protein sequences as query input. When the query is a protein sequence, the system should select structures that are similar to the (unknown) structure of the query protein. In general, such behavior can be achieved by using so-called sequence-structure similarity measures.

The guest laboratory (AG Porto) has extensive experience with the design of protein structure similarity measures and of sequence-structure similarity measures. Our research group (SIRET) develops efficient and effective methods for similarity retrieval in huge databases. We also have experience with applying these methods to biological data. Hence, we hope that the joint research will result in a tool that solves the problem with higher efficiency and effectiveness.

Principal investigator : Jakub Galgonek
GAUK 430711
Application of Metric and Non-metric Indexing Methods in Computational Proteomics
2011 - 2012

The volume of unstructured data is growing rapidly, whereas annotating it is problematic. The similarity search concept, based on a similarity function defined for each pair of database objects, is therefore more suitable for this kind of data than annotation-based search. The similarity is usually modelled by a distance function satisfying the metric axioms (non-negativity, identity, symmetry, and the triangle inequality), which allows efficient indexing. However, the metric axioms can be very restrictive for domain experts, who may prefer non-metric functions. Hence, database experts have to solve this problem either by converting non-metric functions to metric ones or by developing new types of non-metric indexing methods.
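To illustrate why the metric axioms allow efficient indexing, the following is a minimal sketch of pivot-based filtering, one common idea behind metric indexes. It is not the method developed in this project. The class name `PivotIndex`, the single-pivot design, and the use of Euclidean distance on toy 2-D points are assumptions made purely for illustration; real protein similarity functions are far more expensive, which is exactly when skipping distance computations pays off.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """A simple metric distance, used here as a stand-in (assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class PivotIndex:
    """Hypothetical single-pivot index; distances to the pivot are precomputed."""

    def __init__(self, objects: Sequence[Point], pivot: Point,
                 dist: Callable[[Point, Point], float]) -> None:
        self.objects = list(objects)
        self.pivot = pivot
        self.dist = dist
        # One distance per object, computed once at indexing time.
        self.to_pivot = [dist(obj, pivot) for obj in self.objects]

    def range_query(self, query: Point, radius: float) -> Tuple[List[Point], int]:
        """Return objects within `radius` of `query` and the number of
        object-to-query distances that actually had to be computed."""
        q_to_pivot = self.dist(query, self.pivot)
        hits: List[Point] = []
        computed = 0
        for obj, d_op in zip(self.objects, self.to_pivot):
            # Triangle inequality gives d(q, o) >= |d(q, p) - d(o, p)|.
            # If this lower bound already exceeds the radius, skip the object.
            if abs(q_to_pivot - d_op) > radius:
                continue
            computed += 1
            if self.dist(query, obj) <= radius:
                hits.append(obj)
        return hits, computed


objects = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
index = PivotIndex(objects, pivot=(0.0, 0.0), dist=euclidean)
hits, computed = index.range_query((1.0, 1.0), radius=1.5)
print(hits, computed)
```

In this toy run, two of the four objects are discarded using only the precomputed pivot distances. The pruning step is correct only because the distance satisfies the triangle inequality; with a non-metric function the lower bound may not hold and relevant objects could be skipped, which is the motivation for the conversion and non-metric indexing approaches mentioned above.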

One of the areas where metric and non-metric similarity searching is used is computational proteomics. When determining the biological function of an "unknown" protein, retrieving "known" proteins with similar structures (and thus probably similar function) is very useful. Moreover, fast and cheap determination of protein structures is itself an open problem. From a database point of view, databases of known protein structures can be used to address it. In this approach, sequence-structure similarity functions are used to retrieve structures that are likely to be similar to the sought structure of the protein.

Our goal is to develop high-quality structure and sequence-structure similarity functions, together with methods for indexing them.

Principal investigator : Jakub Galgonek
Team members : Tomas Skopal, Jiri Novak, Jakub Lokoc