
Bioinformatics & Cheminformatics

Molpher (Software tool for exploration of the chemical space)

Molpher aims to be a scalable and interactive software tool that aids exploration of chemical space, the vast universe containing all possible compounds. Many areas of chemical biology, such as drug discovery, rely heavily on chemical libraries offering compounds usable in industrial processes. Given a set of molecules with desired characteristics, Molpher explores their common neighbourhood based on structural similarity, because this neighbourhood is a promising part of chemical space in which to find new additions to those libraries. To reduce the chance of missing interesting parts of the space, Molpher lets the researcher observe and interactively alter the exploration process. The generated subspace is expected to be further tested for synthesizability and biological activity by other software tools.

Among its main features, Molpher offers an optimized parallel exploration algorithm, compound logging, dimension-reduced visualization of chemical space and an interactive widget-based GUI. The codebase is extensible in terms of additional morphing operators, chemical fingerprints, similarity measures and visualization strategies to allow further experiments.
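The text does not say which similarity measure Molpher uses by default, so the following is only a sketch of what "common neighbourhood based on structural similarity" can look like. It assumes fingerprints are represented as sets of on-bits and uses the Tanimoto coefficient, a common choice for fingerprints. The helper `common_neighbourhood` and its rule (a candidate must reach the threshold against every input molecule) are hypothetical and not Molpher's API.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 1.0  # two empty fingerprints are treated as identical
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


def common_neighbourhood(candidates, inputs, threshold):
    """Hypothetical filter: keep candidates similar enough to *every* input molecule."""
    return [name for name, fp in candidates.items()
            if min(tanimoto(fp, ref) for ref in inputs) >= threshold]


inputs = [{1, 2, 3, 4}, {1, 2, 3, 5}]
candidates = {"c1": {1, 2, 3}, "c2": {7, 8}}
print(common_neighbourhood(candidates, inputs, threshold=0.5))  # ['c1']
```

Because the measure is passed around as a plain function, swapping in another fingerprint or similarity measure only means replacing `tanimoto`, in the spirit of the extensibility described above.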

David Hoksza, Petr Škoda
P2RANK (Ligand-binding site prediction)
 
P2RANK is a novel machine learning-based method for predicting ligand binding sites from protein structure. P2RANK uses a Random Forest classifier to infer the ligandability of local chemical neighborhoods near the protein surface; these neighborhoods are represented by specific near-surface points and described by aggregating physico-chemical features projected onto those points from neighboring protein atoms. The points with high predicted ligandability are clustered and ranked to obtain the resulting list of binding site predictions. P2RANK is freely available at http://siret.ms.mff.cuni.cz/p2rank.
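The final "cluster and rank" step can be illustrated with a small sketch. It assumes each surface point already has a predicted ligandability score in [0, 1]. The cut-off `threshold`, the linkage distance `link_dist` (in ångströms), single-linkage clustering and ranking by summed score are all illustrative assumptions, not P2RANK's actual parameters or scoring.

```python
import math


def cluster_and_rank(points, scores, threshold=0.5, link_dist=3.0):
    """Group ligandable points by single linkage and rank clusters by total score.

    points: list of (x, y, z) coordinates; scores: predicted ligandability per point.
    """
    ligandable = [i for i, s in enumerate(scores) if s >= threshold]
    seen, clusters = set(), []
    for start in ligandable:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # flood fill over points closer than link_dist
            i = stack.pop()
            members.append(i)
            for j in ligandable:
                if j not in seen and math.dist(points[i], points[j]) <= link_dist:
                    seen.add(j)
                    stack.append(j)
        clusters.append(members)
    # Assumed ranking: clusters with a larger summed score come first.
    clusters.sort(key=lambda m: sum(scores[i] for i in m), reverse=True)
    return clusters
```

For example, two nearby high-scoring points form the top-ranked pocket, while an isolated low-scoring point is dropped before clustering.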
 
 
P3S (Protein structure similarity search)

Proteins can perform many different biological functions and thus drive most of the vital processes in living organisms. From a chemical point of view, proteins are linear polymer chains made of only twenty kinds of amino acids. However, they can fold into various 3D structures, resulting in huge functional diversity. The study of protein function is one of the areas where similarity search in protein structure databases is widely used. This use rests on the hypothesis that proteins with similar 3D structures also share similar biological functions. Our tool P3S allows similarity search in protein structure databases. Currently, it employs only the SProt similarity measure developed at the SIRET research group. However, support for other protein similarity measures is planned, for example the SABERTOOTH measure developed by AG Porto at the Universität zu Köln.

rPredictor (Infrastructure consisting of tools, a database and web interface that together enable predicting ribosomal RNA secondary structure and analysing it.)
rPredictor is a bioinformatics infrastructure consisting of tools, a database and a web interface that together enable predicting ribosomal RNA secondary structure and analysing it. The predicted structures, their analyses and further details about the rRNA molecules are accessible through this website.
The aim of rPredictor is to develop and deploy a technique for predicting ribosomal RNA secondary structure and to make the resulting structural information readily available. At the same time, the rPredictor database contains rich annotations of rRNA structures and the underlying sequences.
The project is being developed at the Faculty of Mathematics and Physics, Charles University in Prague, in close cooperation with the bioinformatics laboratory of the Institute of Microbiology, AVČR.
David Hoksza
SETTER (RNA structure similarity search)

The SETTER web server uses the SETTER (SEcondary sTructure-based TERtiary Structure Similarity Algorithm) method for fast and accurate pairwise structural alignment. The server can compare a pair of RNA structures, or it can use one structure as a query and search against a user-defined database of RNA structures. The efficiency of the algorithm stems from decomposing the RNA structure into a set of non-overlapping generalized secondary structure motifs (GSSUs). A GSSU usually resembles a hairpin motif, possibly containing bulges and/or internal loops in its stem. Segmentation into GSSUs scales well with structure size (SETTER scales linearly with the structure size), because the number of residues in a GSSU (SETTER scales quadratically with the GSSU size) generally does not increase as the RNA structure grows. The underlying SETTER algorithm is both accurate and very fast, and it imposes no limit on the size of the aligned RNA structures. SETTER can compare even a pair of the largest RNA structures in less than one minute.
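The decomposition idea can be sketched on a dot-bracket secondary structure. This is an illustrative approximation only, not SETTER's actual GSSU definition or alignment. Starting from an outer base pair, it follows the stem as long as exactly one pair is nested directly inside, which covers stacks, bulges and internal loops. A stem that ends in a hairpin loop becomes one unit, and at a multiloop each branch is examined separately. Pseudoknots and non-bracket notation are not handled, and the function names are hypothetical.

```python
def pair_table(structure):
    """Map each opening-bracket index of a dot-bracket string to its closing partner."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs[stack.pop()] = i
    return pairs


def enclosed_pairs(lo, hi, pairs):
    """Top-level base pairs lying strictly between positions lo and hi."""
    found, k = [], lo + 1
    while k < hi:
        if k in pairs:
            found.append((k, pairs[k]))
            k = pairs[k] + 1  # skip everything nested inside this pair
        else:
            k += 1
    return found


def hairpin_units(structure):
    """Split a structure into non-overlapping hairpin-like units (illustrative only)."""
    pairs = pair_table(structure)
    units = []
    todo = enclosed_pairs(-1, len(structure), pairs)
    while todo:
        outer = todo.pop()
        i, j = outer
        inner = enclosed_pairs(i, j, pairs)
        while len(inner) == 1:  # stack, bulge or internal loop: stay in the same stem
            i, j = inner[0]
            inner = enclosed_pairs(i, j, pairs)
        if inner:
            todo.extend(inner)  # multiloop: every branch may start a new unit
        else:
            units.append(outer)  # stem closed by a hairpin loop
    return sorted(units)


print(hairpin_units("((..((...))..((...))..))"))  # [(4, 10), (13, 19)]
```

Because each unit is bounded by a single hairpin, its size tends to stay small even when the whole molecule grows. This is the property the text relies on when it contrasts linear scaling in structure size with quadratic scaling in GSSU size.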

David Hoksza
Simtandem (Protein sequence identification)

SimTandem is a tool for identifying protein or peptide sequences from tandem mass spectra. The identification is based on similarity search in databases of already known or predicted protein sequences. Since the size of sequence databases grows rapidly, metric access methods are employed for database indexing. SimTandem implements a previously proposed method in which the M-tree and the TriGen algorithm were used for fast and approximate (i.e., non-metric) search. It also uses the recently introduced parameterized Hausdorff distance, which is suitable as a coarse filter for metric indexes. SimTandem supports searching mass spectra with post-translational modifications, which are a common complication when mass spectra are interpreted. SimTandem has been implemented both as an online web tool and as a stand-alone application. SimTandem is freely available at http://www.simtandem.org or http://www.siret.cz/simtandem.
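For orientation, here is a sketch of the classic Hausdorff distance between two spectra, each treated as a set of peak m/z values. It is *not* the parameterized variant used by SimTandem, whose parameters are not given here. It only shows the underlying set distance. The classic form is a metric, but a single unmatched peak can dominate it, which is one reason a parameterized relaxation is useful for noisy spectra.

```python
def directed_hausdorff(a, b):
    """Largest distance from a peak in a to its nearest peak in b."""
    return max(min(abs(x - y) for y in b) for x in a)


def hausdorff(a, b):
    """Classic (symmetric) Hausdorff distance between two non-empty peak lists."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


print(hausdorff([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 2.0
```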

Jiri Novak, Tomas Skopal
ViFrame (Chemical space visualization framework)

 

ViFrame is a modular framework targeted at the chemical space visualization task. It offers a simple implementation of every part of the visualization pipeline, which consists of steps such as reading and merging molecules from multiple data sources, applying transformations and, finally, visualizing the data set in 2D space. The framework also includes an application that provides the user with a graphical interface for manipulating modules and presenting the visualization results. So that the application can be used without implementing one's own module, several visualization methods are implemented and delivered with the project.
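The pipeline described above (read and merge, transform, visualize) can be sketched as a composition of interchangeable modules. In this sketch each module is a plain Python function. The molecule record format (`id` plus a `descriptors` list), the merge rule (first record per id wins) and all module names are hypothetical and do not come from ViFrame itself.

```python
def run_pipeline(sources, transformations, visualize):
    """Hypothetical ViFrame-style pipeline: read/merge -> transform -> 2D layout."""
    # 1. Read molecules from every source and merge them, keeping the first record per id.
    merged = {}
    for read in sources:
        for molecule in read():
            merged.setdefault(molecule["id"], molecule)
    molecules = list(merged.values())
    # 2. Apply each transformation module in order.
    for transform in transformations:
        molecules = transform(molecules)
    # 3. Let the visualization module map molecules to 2D positions.
    return visualize(molecules)


# Example modules (all hypothetical): two in-memory readers, a scaling step and a
# "visualization" that simply uses the first two descriptors as x and y.
def reader_a():
    return [{"id": "a", "descriptors": [1.0, 2.0, 3.0]}]


def reader_b():
    return [{"id": "a", "descriptors": [9.0, 9.0, 9.0]},
            {"id": "b", "descriptors": [4.0, 5.0, 6.0]}]


def double(molecules):
    return [{**m, "descriptors": [2 * x for x in m["descriptors"]]} for m in molecules]


def first_two(molecules):
    return {m["id"]: tuple(m["descriptors"][:2]) for m in molecules}


layout = run_pipeline([reader_a, reader_b], [double], first_two)
print(layout)  # {'a': (2.0, 4.0), 'b': (8.0, 10.0)}
```

Keeping each stage behind a uniform call signature is what lets a GUI swap readers, transformations or visualization methods without touching the rest of the pipeline.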
David Hoksza, Petr Škoda

General Indexing

PGRTree (Plugin for Indexing Multidimensional Data in PostgreSQL Using R-tree)

In commercial database platforms, standard search over multiple attributes is provided by the B+-tree (or its variants) with compound keys. Such systems also provide multidimensional indexing, but only for spatial purposes (such as GIS or CAD applications), and it relies on special data types and query syntax. Our solution allows an R-tree index (i.e., a multidimensional indexing structure) to be applied in the same way as a B+-tree with compound keys. The solution is delivered as a plugin for the PostgreSQL database.
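A B+-tree with compound key (a, b) orders rows by a first, so a range condition on b alone cannot use that order. An R-tree instead treats each attribute as a dimension and prunes whole subtrees whose minimum bounding rectangles (MBRs) miss the query box. The sketch below shows only this query-side pruning on a hand-built, in-memory tree. It has no insertion or node splitting, and it is neither PGRTree's code nor PostgreSQL's index interface. All names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple                                      # lower corner of the node's MBR
    hi: tuple                                      # upper corner of the node's MBR
    children: list = field(default_factory=list)  # inner node: child Nodes
    points: list = field(default_factory=list)    # leaf node: data tuples


def leaf(points):
    """Build a leaf whose MBR tightly bounds its points."""
    return Node(tuple(map(min, zip(*points))), tuple(map(max, zip(*points))), points=points)


def inner(children):
    """Build an inner node whose MBR bounds all child MBRs."""
    return Node(tuple(map(min, zip(*(c.lo for c in children)))),
                tuple(map(max, zip(*(c.hi for c in children)))),
                children=children)


def intersects(lo1, hi1, lo2, hi2):
    """True if two axis-aligned boxes overlap in every dimension."""
    return all(a <= d and c <= b for a, b, c, d in zip(lo1, hi1, lo2, hi2))


def range_query(node, qlo, qhi):
    """Return all points inside the query box [qlo, qhi]."""
    if not intersects(node.lo, node.hi, qlo, qhi):
        return []  # prune the whole subtree
    hits = [p for p in node.points if all(l <= x <= h for l, x, h in zip(qlo, p, qhi))]
    for child in node.children:
        hits.extend(range_query(child, qlo, qhi))
    return hits


root = inner([leaf([(1, 1), (2, 3)]), leaf([(8, 8), (9, 7)])])
inf = float("inf")
print(range_query(root, (-inf, 6), (inf, 8)))  # range on the 2nd attribute only: [(8, 8), (9, 7)]
```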

David Hoksza

Multimedia

Exploration Portal (Image exploration demo)

Exploration Portal is a demonstration application for the Multimedia exploration framework and a logical successor of SIR. The portal uses all framework features and implements example use cases for all important framework parts.

The portal can be used to explore static data sets, Bing search results and personal Facebook albums. It provides a variety of configuration options affecting feature extraction, the similarity model, index creation and many other parameters.

The results are visualized using a similarity-based layout, and the portal supports different query options, such as zoom-in, zoom-out, multi-query and panning in four different directions.

Find the image (Online tool for comparisons of different multimedia exploration approaches)

Find the image is an artificial search scenario designed for testing and comparing our exploration techniques. The task is to use a web-based exploration application to find as many images from a predetermined class as possible. This predetermined class should correspond to a search intention that cannot easily be transformed into a text-based query or a query-by-example.

Multimedia exploration framework (Creation of efficient multimedia exploration applications)

Multimedia exploration framework is an extensible solution for creation of multimedia exploration applications.

It uses a modular architecture and already provides several implementations of every component. Besides managing the software architecture and data flow, the framework also takes care of data source management, data retrieval, feature extraction, distance computation, metric indexing, query execution, data visualization and GUI creation.
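Among the listed components, metric indexing is the least self-explanatory. A minimal sketch of one standard metric-indexing technique is pivot-based filtering (in the style of LAESA): distances from every object to a few pivots are precomputed. At query time, the triangle inequality gives the lower bound |d(q, p) - d(o, p)| ≤ d(q, o), so any object whose bound exceeds the query radius is discarded without computing its real distance. This is a generic illustration under that assumption, not the framework's own index or API. The function names are hypothetical.

```python
def build_pivot_table(objects, pivots, dist):
    """Precompute d(o, p) for every object o and pivot p."""
    return [[dist(o, p) for p in pivots] for o in objects]


def pivot_range_query(q, r, objects, pivots, table, dist):
    """Return (objects within radius r of q, number of real distance computations)."""
    dq = [dist(q, p) for p in pivots]
    result, computed = [], 0
    for o, row in zip(objects, table):
        lower = max(abs(a - b) for a, b in zip(dq, row))
        if lower > r:
            continue  # triangle inequality guarantees d(q, o) > r
        computed += 1
        if dist(q, o) <= r:
            result.append(o)
    return result, computed


objects = [0, 1, 2, 10, 11, 12]          # toy 1-D "descriptors"
dist = lambda a, b: abs(a - b)
table = build_pivot_table(objects, [0], dist)
print(pivot_range_query(1, 1, objects, [0], table, dist))  # ([0, 1, 2], 3)
```

Here three of the six objects are filtered out using only precomputed values. With expensive image distances, savings of this kind are what make interactive exploration feasible.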

Contact the developers directly if you are interested in building an application using our framework.

SIR (Smart image retrieval)

When the visual similarity of two images is determined, it is evaluated on feature representations that consist of content-based image properties. Conventional feature representations aggregate and store these properties in global feature histograms (e.g., MPEG-7 visual descriptors).

Recent feature representations, however, adaptively aggregate local image features into more flexible feature signatures, which can be compared by adaptive similarity measures. The SIR engine developed by the SIRET research group combines traditional MPEG-7 visual descriptors with feature signatures, leading to improved similarity search in image collections.
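As an example of an adaptive similarity measure for feature signatures, here is a sketch of the Signature Quadratic Form Distance (SQFD). A signature is assumed to be a list of `(centroid, weight)` pairs. SQFD concatenates the two signatures, negates the weights of the second, and evaluates a quadratic form with a similarity kernel between centroids. The Gaussian kernel exp(-α·d²) and the default `alpha=1.0` are assumptions chosen for illustration. The text does not state which kernel or parameters SIR uses.

```python
import math


def gaussian_similarity(c1, c2, alpha=1.0):
    """Similarity kernel between two centroids (assumed Gaussian)."""
    d2 = sum((a - b) ** 2 for a, b in zip(c1, c2))
    return math.exp(-alpha * d2)


def sqfd(sig_q, sig_o, alpha=1.0):
    """Signature Quadratic Form Distance between two (centroid, weight) signatures."""
    centroids = [c for c, _ in sig_q] + [c for c, _ in sig_o]
    weights = [w for _, w in sig_q] + [-w for _, w in sig_o]  # second signature negated
    total = 0.0
    for wi, ci in zip(weights, centroids):
        for wj, cj in zip(weights, centroids):
            total += wi * wj * gaussian_similarity(ci, cj, alpha)
    return math.sqrt(max(total, 0.0))  # clamp tiny negative rounding errors


sig_a = [((0.0, 0.0), 1.0)]
sig_b = [((3.0, 0.0), 1.0)]
print(round(sqfd(sig_a, sig_b), 4))  # 1.4142
```

Unlike a fixed-bin histogram distance, SQFD works with signatures of different lengths and with centroids placed anywhere in feature space, which is what makes it suitable for adaptive signatures.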

Currently, the SIR engine operates in demo mode as a standalone image search engine. To manage large image collections in real time, the engine employs original database indexing technology. The SIR engine also includes meta-search functionality that lets users augment, rerank and explore results provided by other image search engines, such as Google Images. The current version of the online re-ranking and exploration tool employs a particle physics model that both distributes images on the screen and, as a side effect, automatically creates clusters of visually similar images. To cite this tool, please refer to our publications: Image Exploration using Online Feature Extraction and Reranking (ICMR, 2012) and SIR: The Smart Image Retrieval Engine (SISAP, 2012).

Jakub Lokoc, Tomas Skopal
Sketch-based Video Browser (or Video Hunter) (A video retrieval tool for known-item search tasks.)

Sketch-based Video Browser (or Video Hunter) is a new tool focused on known-item search tasks, in which users have seen a video scene and know it is contained in a collection, but do not know where it is located. Users therefore have to search and browse the collection with advanced techniques that support query idea initialization, result visualization and browsing. With such support, known-item search does not depend solely on formulating an ideal query. The Video Hunter tool has participated in the Video Browser Showdown, winning the competition in 2014 and 2015.

Web Image Extractor (Image feature signatures extractor demo implemented in web browser)

A demo that presents a feature extraction method capturing color and texture information from an image and producing adaptive signatures for similarity search models, where distances such as SQFD or EMD can be used. The method and its parallel implementation for GPUs are presented in our publication (listed below). We are currently transforming the code so that it can be used as an OpenCV module.
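The general idea of an adaptive signature can be sketched with a plain k-means clustering of per-pixel feature vectors. Each resulting cluster contributes its centroid and its relative size as a weight. This is a generic k-means illustration, *not* the published extraction method or its GPU implementation. The naive initialization (first `k` samples), the fixed iteration count and the toy 2-D features are assumptions. In practice the features would combine color, position and texture.

```python
def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def extract_signature(features, k, iterations=10):
    """Cluster feature vectors with k-means; return [(centroid, weight), ...]."""
    centroids = [list(f) for f in features[:k]]  # naive init: first k samples
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for f in features:
            nearest = min(range(len(centroids)), key=lambda c: _sq_dist(f, centroids[c]))
            clusters[nearest].append(f)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [_mean(cl) if cl else c for cl, c in zip(clusters, centroids)]
    total = len(features)
    return [(c, len(cl) / total) for c, cl in zip(centroids, clusters) if cl]


pixels = [(0, 0), (10, 10), (0, 1), (10, 11)]
print(extract_signature(pixels, k=2))  # [([0.0, 0.5], 0.5), ([10.0, 10.5], 0.5)]
```

The number of non-empty clusters, and therefore the signature length, can differ from image to image. This is what distinguishes such signatures from fixed global histograms and why distances such as SQFD or EMD are needed to compare them.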

The demo is also a proof of concept that goes against current trends in web applications. We propose offloading computations from the servers (or cloud) to end users by performing computationally demanding tasks in the browser. In this case, we claim that in a web application that collects images from its users, the feature extraction can be performed by the browser while the image is being uploaded.

Martin Krulis