GAUK 57907
Similarity search in biological databases
2007 - 2008

In recent years, the volume of gene and protein banks (databases) has grown rapidly. Huge volumes of gene and protein sequences are stored in one place not only so that the sequences themselves can be browsed, but primarily so that similarities among the stored sequences can be found. Similar sequences indicate similar functionality, which helps in determining the functions of unknown genes.

Current techniques for finding similar sequences scan the whole database of genes or proteins and evaluate the similarity between the query and every stored sequence. As the databases grow, the time needed to find similar sequences increases linearly.
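
The sequential scan described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that similarity is measured by the Levenshtein edit distance and that sequences are plain strings; the project text does not name a specific measure, and the function names edit_distance and linear_scan are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def linear_scan(query: str, database: list[str], radius: int) -> list[tuple[str, int]]:
    """Compare the query with every stored sequence (one distance per sequence)."""
    hits = []
    for seq in database:
        d = edit_distance(query, seq)
        if d <= radius:
            hits.append((seq, d))
    return hits
```

Every query costs one distance computation per stored sequence, which is why the search time grows linearly with the size of the database.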

Hence, the goal of the project is to apply multimedia indexing methods to speed up searching in biological databases (primarily genome and protein databases). In the project, we will examine (primarily in the first year; the plan of future work is given in further sections) the ability of existing indexing methods, or their modifications, to index different types of biological data in a way that is optimal for such data.
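
As a rough illustration of how an index can avoid comparing the query with every sequence, the sketch below uses pivot-based filtering, one technique from similarity indexing of multimedia data. It is not the project's method. It assumes that the distance satisfies the triangle inequality (the Levenshtein distance used here does), and the names PivotIndex and range_query, as well as the choice of pivots, are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (a metric, so the triangle inequality holds)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class PivotIndex:
    """Stores precomputed distances from every sequence to a few pivots."""

    def __init__(self, database, pivots, dist=edit_distance):
        self.database = list(database)
        self.pivots = list(pivots)
        self.dist = dist
        # Built once, before any query arrives.
        self.table = [[dist(s, p) for p in self.pivots] for s in self.database]

    def range_query(self, query, radius):
        """Return (hits, number of query-to-sequence distances computed)."""
        q_to_p = [self.dist(query, p) for p in self.pivots]
        hits, computed = [], 0
        for seq, row in zip(self.database, self.table):
            # Triangle inequality: |d(q,p) - d(s,p)| <= d(q,s), so a gap
            # larger than the radius proves that seq cannot be a hit.
            if any(abs(qp - sp) > radius for qp, sp in zip(q_to_p, row)):
                continue
            computed += 1
            d = self.dist(query, seq)
            if d <= radius:
                hits.append((seq, d))
        return hits, computed
```

The filter never discards a true hit, so the result equals that of a full scan; how many distance computations it saves depends on the data and on the pivots chosen.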

Principal investigator: David Hoksza
Team member: Tomas Skopal