GAČR 15-00885S
Novel methods for computational prediction and visualization of secondary structures of ribosomal ribonucleic acids - an integrated solution
2015 - 2017
Ribosomal ribonucleic acid (rRNA) is essential for protein synthesis, one of the most basic biological processes. Understanding its mechanisms requires knowledge of the rRNA structure, because rRNA forms the structural core of the protein-synthesizing unit, the ribosome. Experimental identification of rRNA structure is technically very difficult. Secondary structure, a simplified structural model, can be predicted, but prediction for rRNAs is hindered by the extreme length of rRNA sequences. Thus, only a few eukaryotic rRNA structures are available so far. We will employ information about evolutionarily conserved segments of eukaryotic rRNA sequences for secondary structure prediction. An algorithmic workflow integrating the secondary structure prediction pipeline, a visualization algorithm and a database of the predicted structures will be developed. The predicted rRNA structures will be used for a novel bioinformatic identification of evolutionarily conserved structural motifs in eukaryotic rRNAs that may bring new insights into the role of rRNA in protein synthesis.
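To illustrate what a secondary structure is as a data structure, here is a minimal sketch that reads the widely used dot-bracket notation into a list of base pairs. The function name `parse_dot_bracket` and the toy input are assumptions made for illustration; the sketch is not part of the project's prediction pipeline and handles only nested structures written with round brackets.

```python
from typing import List, Tuple


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return the base pairs (i, j), 0-based, encoded by a dot-bracket string.

    '(' opens a pair, ')' closes the most recently opened pair,
    and '.' marks an unpaired nucleotide.
    """
    open_positions: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((open_positions.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # A toy hairpin: two stacked base pairs closing a two-nucleotide loop.
    print(parse_dot_bracket("((..))"))
```

A list of index pairs is the form that layout and visualization code typically consumes, which is why it is used here.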
Co-Investigator : David Hoksza
GAČR 14-29032P
Efficient chemical space exploration using multi-objective optimization
2014 - 2016
Recently, we have developed a method for the systematic generation of the chemical space lying between a given pair of small organic molecules. The intended use follows the similar property principle, which states that "similar compounds have similar properties". Thus, if a pair of molecules shows a similar function, molecules close to them in the chemical space should behave alike.
 
Our approach has two major drawbacks that we would like to tackle in the proposed project: 1) a path (subspace) between the input molecules is not guaranteed to be found, and 2) the exploration process is driven purely by structure and does not take into account the physicochemical and biological properties of the generated compounds. To solve the first problem, we propose an approach inspired by scaffold hopping. We will utilize multiple scaffold types retaining different levels of structural information to reduce the complexity of the chemical space. We propose that a path in a less complex chemical space is more likely to be identified. In the second part of the project, we will modify the exploration process so that it is performed not in the structural space but in the biologically more meaningful space of features such as ADME/Tox properties. This will be done by projecting the starting molecules into a multidimensional space of physicochemical and/or biological properties and using multi-objective optimization to drive the exploration towards the desired optima. Finally, we propose a way to combine scaffold hopping and bioactivity-based exploration into a novel chemical space exploration approach.
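The core idea of multi-objective optimization can be sketched as Pareto filtering: a candidate is kept only if no other candidate is at least as good in every objective and strictly better in one. The sketch below is illustrative only. The candidate names, the two objectives and their values are hypothetical, all objectives are assumed to be minimized, and the project's actual exploration procedure is not described in the text above.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in at least one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    # Hypothetical objective values per generated molecule,
    # e.g. (predicted toxicity, distance to the target molecule).
    generated = {
        "mol_A": (1.0, 5.0),
        "mol_B": (2.0, 2.0),
        "mol_C": (3.0, 3.0),  # worse than mol_B in both objectives
        "mol_D": (5.0, 1.0),
    }
    print(pareto_front(generated))
```

In an exploration loop, only the non-dominated molecules would be carried into the next round of generation, so the search moves towards the trade-off surface rather than towards a single property.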
Principal investigator : David Hoksza
GAČR P202/11/0968
Large-scale Nonmetric Similarity Search in Complex Domains
2011 - 2014

Similarity search is popular in various areas of computing, including multimedia databases, data mining and bioinformatics. For a long time, database approaches to similarity search assumed that similarity is a metric distance. Thanks to its properties, a metric similarity allows a database to be indexed so that it can be queried efficiently (quickly). However, with the increasing complexity of data across various domains, many similarity measures that are not metrics (i.e., nonmetrics) have appeared in recent years. Database research, however, is still not aware of the huge potential market for nonmetric similarity search and recognizes only the metric space model.
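The difference between a metric and a nonmetric can be made concrete with the triangle inequality, the postulate that metric indexes rely on to discard parts of a database without computing distances. The following minimal sketch lists the triples of objects for which a given distance function violates it. The helper name `triangle_violations` and the two toy distance functions are assumptions chosen for illustration.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple


def triangle_violations(
    objects: Sequence[float],
    dist: Callable[[float, float], float],
    tolerance: float = 1e-12,
) -> List[Tuple[float, float, float]]:
    """Return triples (a, b, c) where dist(a, c) > dist(a, b) + dist(b, c)."""
    return [
        (a, b, c)
        for a, b, c in permutations(objects, 3)
        if dist(a, c) > dist(a, b) + dist(b, c) + tolerance
    ]


def absolute_difference(x: float, y: float) -> float:
    """A metric on real numbers."""
    return abs(x - y)


def squared_difference(x: float, y: float) -> float:
    """Symmetric and non-negative, but not a metric: it breaks the triangle inequality."""
    return (x - y) ** 2


if __name__ == "__main__":
    points = [0.0, 1.0, 2.0]
    print(triangle_violations(points, absolute_difference))
    print(triangle_violations(points, squared_difference))
```

For the squared difference, the direct distance between 0 and 2 is 4, while the detour through 1 costs only 1 + 1 = 2. A pruning rule that assumes the triangle inequality could therefore wrongly discard relevant objects under such a measure.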

This project aims to propose formal models, followed by the design of access methods, for efficient nonmetric similarity search, that is, search in databases where the similarity is not restricted by the metric postulates. Achieving this goal would bring an efficient database solution to domain experts who need to perform large-scale content-based retrieval tasks in complex databases, such as multimedia retrieval, similarity-based data mining, complex pattern matching, and classification and prediction in bioinformatics.

Principal investigator : Tomas Skopal
Team members : David Hoksza, Jakub Lokoc, Jiri Novak, Juraj Mosko, Tomas Bartos
GAUK 57907
Similarity search in biological databases
2007 - 2008

In recent years, the volume of gene and protein banks (databases) has been growing rapidly. Huge volumes of gene and protein sequences are stored in one place not only so that the sequences themselves can be browsed but, above all, so that similarities among them can be searched for. Similar sequences indicate similar functionality, which helps in determining the functions of unknown genes.

Current techniques for finding similarity among data sequences go through the whole database of genes and proteins and examine the similarity between the query and every sequence in it. As the volume of the databases grows, the time needed to find similar sequences increases linearly.
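The sequential scan described above can be sketched in a few lines. Plain edit distance is used here only as a simple stand-in for a biological sequence similarity measure, and the function names and toy sequences are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(t) + 1))
    for i, char_s in enumerate(s, start=1):
        current = [i]
        for j, char_t in enumerate(t, start=1):
            current.append(
                min(
                    previous[j] + 1,                        # deletion
                    current[j - 1] + 1,                     # insertion
                    previous[j - 1] + (char_s != char_t),   # substitution or match
                )
            )
        previous = current
    return previous[-1]


def sequential_scan(query: str, database: Sequence[str], k: int = 1) -> List[Tuple[int, str]]:
    """Compare the query with every stored sequence and return the k closest.

    One distance computation is performed per stored sequence, so the
    work grows linearly with the size of the database.
    """
    scored = sorted((edit_distance(query, sequence), sequence) for sequence in database)
    return scored[:k]


if __name__ == "__main__":
    toy_database = ["ACGA", "TTTT", "ACGT"]
    print(sequential_scan("ACGT", toy_database, k=2))
```

An index aims to avoid most of these per-sequence comparisons by ruling out groups of sequences in advance.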

Hence, the goal of the project is to apply multimedia indexing methods to speed up searching in biological databases (primarily genome and protein databases). In the project, we will examine (primarily in the first year; the plan of future work is given in further sections) the ability of existing indexing methods, or their modifications, to index different types of biological data in a way that is optimal for such data.

Principal investigator : David Hoksza
Team member : Tomas Skopal