Bioinformatics & Cheminformatics

Molpher (Software tool for exploration of the chemical space)

Molpher aims to be a scalable and interactive software tool that aids exploration of chemical space, the vast universe containing all possible compounds. Many areas of chemical biology, such as drug discovery, rely heavily on chemical libraries offering compounds usable in industrial processes. Given a set of molecules with desired characteristics, Molpher explores their common neighbourhood based on structural similarity, since it represents a promising part of the chemical space in which to find new additions to those libraries. To decrease the chance of missing interesting parts of the space, Molpher lets the researcher observe and interactively alter the exploration process. The generated subspace is expected to be further tested for synthesizability and biological activity by other software tools.

Among its main features, Molpher offers an optimized parallel exploration algorithm, compound logging, dimension-reduced visualization of chemical space and an interactive widget-based GUI. The codebase is extensible with additional morphing operators, chemical fingerprints, similarity measures and visualization strategies to allow further experiments.
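
To illustrate what a similarity-based "common neighbourhood" can mean, here is a minimal sketch in Python. It is not Molpher's code: it assumes fingerprints are given as sets of on-bit indices, uses the Tanimoto coefficient as the similarity measure, and treats a candidate as part of the neighbourhood when it is similar enough to every reference molecule. The names `tanimoto` and `common_neighbourhood`, the threshold and the toy fingerprints are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def common_neighbourhood(candidates, references, threshold):
    """Names of candidates whose similarity to every reference is at least `threshold`.

    This is an assumed, simplified reading of "common neighbourhood",
    not the criterion Molpher itself applies.
    """
    return [
        name
        for name, fp in candidates.items()
        if all(tanimoto(fp, ref) >= threshold for ref in references)
    ]


# Toy data: fingerprints are invented for illustration only.
references = [{1, 2, 3, 4}, {2, 3, 4, 5}]
candidates = {
    "c1": {2, 3, 4},
    "c2": {7, 8, 9},
    "c3": {1, 2, 3, 4, 5},
}
selected = common_neighbourhood(candidates, references, threshold=0.7)
print(selected)
```

Swapping in another fingerprint or similarity measure only requires replacing `tanimoto`, which mirrors the kind of extensibility described above.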

David Hoksza, Petr Škoda
P2Rank (Protein-Ligand binding site prediction)
P2Rank is a machine-learning-based method for predicting ligand binding sites from protein structure. P2Rank uses a Random Forest classifier to infer the ligandability of local chemical neighborhoods near the protein surface. These neighborhoods are represented by specific near-surface points and described by aggregating physico-chemical features projected onto those points from neighboring protein atoms. The points with high predicted ligandability are clustered and ranked to obtain the resulting list of binding site predictions. P2Rank is freely available at
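
The final cluster-and-rank step described above can be sketched as follows. This is a simplified illustration, not P2Rank's implementation: it assumes the per-point ligandability scores have already been produced by a classifier, links points by a plain distance cutoff (single linkage), and ranks clusters by the sum of squared scores. The function name `predict_sites`, the parameters `min_score` and `link_dist`, and their default values are hypothetical.

```python
from math import dist


def predict_sites(points, scores, min_score=0.5, link_dist=3.0):
    """Cluster high-scoring points and rank the clusters.

    points: list of (x, y, z) coordinates of near-surface points
    scores: predicted ligandability of each point, in the same order
    Returns a list of clusters (lists of point indices), best first.
    """
    # 1. Keep only points with high predicted ligandability.
    kept = [i for i, s in enumerate(scores) if s >= min_score]

    # 2. Single-linkage clustering: points closer than link_dist share a cluster.
    unvisited = set(kept)
    clusters = []
    for start in kept:
        if start not in unvisited:
            continue
        unvisited.discard(start)
        cluster, stack = [], [start]
        while stack:
            i = stack.pop()
            cluster.append(i)
            near = [j for j in unvisited if dist(points[i], points[j]) <= link_dist]
            for j in near:
                unvisited.discard(j)
                stack.append(j)
        clusters.append(sorted(cluster))

    # 3. Rank clusters; sum of squared scores is an assumed scoring rule.
    clusters.sort(key=lambda c: sum(scores[i] ** 2 for i in c), reverse=True)
    return clusters


# Toy input: coordinates and scores are invented for illustration.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0), (20, 0, 0)]
scores = [0.9, 0.8, 0.7, 0.95, 0.2, 0.6]
sites = predict_sites(points, scores, min_score=0.5, link_dist=1.5)
print(sites)
```
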
P3S (Protein structure similarity search)

Proteins perform many different biological functions and thereby carry out most of the vital processes in living organisms. From a chemical point of view, proteins are linear polymer chains made of only twenty kinds of amino acids. However, they can fold into various 3D structures, resulting in huge functional diversity. The study of protein functions is one of the areas where similarity search in protein structure databases is widely used. This use rests on the hypothesis that proteins with similar 3D structures also share similar biological functions. Our tool P3S allows similarity search in protein structure databases. Currently it employs only the SProt similarity measure developed at the Siret research group. However, support for other protein similarity measures is planned, for example the SABERTOOTH measure developed by AG Porto at the Universität zu Köln.

rPredictor (Ribosomal RNA secondary structure prediction and analysis)
rPredictor is a bioinformatics infrastructure consisting of tools, a database and a web interface that together enable predicting and analysing ribosomal RNA secondary structure. The predicted structures, their analyses and further details about the rRNA molecules are accessible through this website.
The aim of rPredictor is to develop and deploy a technique for predicting ribosomal RNA secondary structure and to make the resulting structural information readily available. In addition, the rPredictor database contains rich annotations of rRNA structures and the underlying sequences.
The project is being developed at the Faculty of Mathematics and Physics, Charles University in Prague, in close cooperation with the bioinformatics laboratory of Microbiology Institute AVČR.
David Hoksza
SETTER (RNA structure similarity search)

The SETTER web server uses the SETTER (SEcondary sTructure-based TERtiary Structure Similarity Algorithm) method for fast and accurate pairwise structural alignment. The server can compare a pair of RNA structures, or use one structure as a query to search a user-defined database of RNA structures. The efficiency of the algorithm comes from decomposing the RNA structure into a set of non-overlapping generalized secondary structure units (GSSUs). A GSSU usually resembles a hairpin motif, possibly containing bulges and/or internal loops in its stem part. Segmentation into GSSUs offers good scalability: SETTER scales quadratically with the GSSU size, but the number of residues in a GSSU generally does not increase with the size of the RNA structure, so SETTER scales linearly with the structure size. The underlying SETTER algorithm is both accurate and very fast, and does not impose limits on the size of the aligned RNA structures. SETTER is able to compare a pair of even the largest RNA structures in less than one minute.
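
The scaling argument can be made concrete with a rough cost model. This is only a back-of-the-envelope reading of the statements above, not an analysis taken from the SETTER authors: it assumes a structure of n residues is split into k units of at most g residues each, that the work per unit is bounded by c·g² for some constant c, and that g stays bounded as n grows.

```latex
% n = residues in the structure, g = maximum GSSU size (assumed bounded),
% k = number of GSSUs, c = assumed constant cost factor per unit
k \approx \frac{n}{g},
\qquad
T(n) \approx k \cdot c\,g^{2} = c\,g\,n = O(n) \quad \text{for fixed } g
```

Under these assumptions, doubling the structure size doubles the number of units and therefore the total cost, while the quadratic term stays confined to the bounded unit size.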

David Hoksza
ViFrame (Chemical space visualization framework)


ViFrame is a modular framework aimed at chemical space visualization. It offers a simple implementation of every part of the visualization pipeline, which consists of steps such as reading and merging molecules from multiple data sources, applying transformations and finally visualizing the data set in 2D space. The framework also includes an application that provides the user with a graphical interface for manipulating modules and presenting the visualization results. So that the application can be used without implementing one’s own module, several visualization methods are implemented and delivered with the project.
David Hoksza, Petr Škoda

General Indexing

PGRTree (Plugin for Indexing Multidimensional Data in PostgreSQL Using R-tree)

In commercial database platforms, the standard search over multiple attributes is provided by a B+-tree (or its variants) with compound keys. Such systems also provide multidimensional indexing, but only for spatial purposes (such as GIS or CAD applications), and they require special data types and querying syntax. Our solution allows an R-tree index (a multidimensional indexing structure) to be applied in the same way as a B+-tree with compound keys. The solution is delivered as a plugin for the PostgreSQL database.
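
The idea behind answering a multi-attribute range query with an R-tree-style index can be sketched in a few lines of Python. This is a conceptual illustration only and does not show the plugin's interface: it builds a single level of leaves with minimum bounding rectangles (MBRs) and prunes every leaf whose MBR does not intersect the query box. The helper names, the `leaf_size` parameter and the lexicographic packing of rows into leaves are simplifying assumptions; real R-trees are multi-level and use more careful split heuristics.

```python
def mbr(rows):
    """Minimum bounding rectangle of a group of rows, as (low corner, high corner)."""
    dims = range(len(rows[0]))
    low = tuple(min(r[d] for r in rows) for d in dims)
    high = tuple(max(r[d] for r in rows) for d in dims)
    return low, high


def intersects(box, low, high):
    """True if the box overlaps the query range [low, high] in every dimension."""
    b_low, b_high = box
    return all(bl <= h and l <= bh for bl, bh, l, h in zip(b_low, b_high, low, high))


def build_leaves(rows, leaf_size=2):
    """Pack rows into leaves (simplified: sorted order) and store each leaf's MBR."""
    ordered = sorted(rows)
    groups = [ordered[i:i + leaf_size] for i in range(0, len(ordered), leaf_size)]
    return [(mbr(group), group) for group in groups]


def range_query(leaves, low, high):
    """Return matching rows and the number of leaves that had to be inspected."""
    hits, visited = [], 0
    for box, group in leaves:
        if not intersects(box, low, high):
            continue  # whole leaf pruned using only its bounding rectangle
        visited += 1
        hits.extend(r for r in group if all(l <= v <= h for v, l, h in zip(r, low, high)))
    return hits, visited


# Toy table with two numeric attributes per row (values invented for illustration).
rows = [(1, 10), (2, 50), (3, 12), (8, 11), (9, 55), (10, 52)]
leaves = build_leaves(rows, leaf_size=2)
hits, visited = range_query(leaves, low=(0, 9), high=(8, 13))
print(hits, visited)
```

The query constrains both attributes at once, and the bounding rectangles let the search discard a leaf on either attribute, which is the property a compound-key B+-tree lacks for its non-leading key columns.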

David Hoksza