GAČR 17-22224S
User Preference Analytics in Multimedia Exploration Models
2017 - 2019
Principal investigator : Tomas Skopal
Team member : Jakub Lokoc, Ladislav Peska
GAČR 15-08916S
Efficient subgraph discovery for petabyte-scale web analysis
2015 - 2017

The study of network behavior without inspecting packet content is of growing concern in network administration and security. Recent years have seen an increasing demand for machine learning algorithms on graphs, since graphs are a natural way to model interactions between entities in large computer networks. A promising approach to modeling graphs that leverages the advantages of machine learning techniques is based on so-called "graphlets", which provide an embedding of graph fragments into vector spaces. However, wider adoption of graphlets is hindered by the cost of embedding and by their limitation to unweighted undirected graphs. In this project, we focus on the design of generalized graphlet-based models and the respective vocabularies, and thus aim to increase the variety of applications that could benefit from graphlet-based descriptors. The proposed methodology will be verified within the domain of network security. In particular, malicious web communities will be searched for in the petabyte-scale network traffic database available to Cisco.
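As a rough illustration of what a graphlet-based descriptor can look like, the sketch below counts the two connected 3-node graphlets (open paths and triangles) in an unweighted undirected graph and returns them as a small vector. The function name and the edge-list input format are assumptions made for this example; it is not the project's method and ignores the weighted and directed generalizations the project targets.

```python
from itertools import combinations


def three_node_graphlet_counts(edges):
    """Return (open_paths, triangles) for an unweighted undirected graph.

    `edges` is an iterable of (u, v) pairs; this input format is an
    assumption of the sketch.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    wedges = 0      # neighbour pairs around a centre node, open or closed
    triangles = 0   # closed wedges, counted once per corner for now
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            wedges += 1
            if b in adj[a]:
                triangles += 1

    triangles //= 3                      # each triangle has three corners
    open_paths = wedges - 3 * triangles  # drop the closed wedges
    return open_paths, triangles


if __name__ == "__main__":
    sample = [(1, 2), (2, 3), (1, 3), (3, 4)]
    print(three_node_graphlet_counts(sample))
```

Enumerating neighbour pairs around every node is quadratic in the node degree, which hints at why the cost of embedding becomes an obstacle on very large graphs.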

Principal investigator : Jakub Lokoc
Team member : Tomas Skopal, Premek Cech
GAUK 910913
Real-time Exploration Queries in Multimedia Databases
2013 - 2015

Nowadays, similarity search in multimedia databases is performed through similarity queries explicitly specified by users. The queries return the part of the database that is relevant to the user-specified query parameters. However, this approach falls short when the user does not know how to specify the query, or only wants an overview of what the database contains. In such cases a non-standard way of accessing the data is more appropriate, e.g., exploration of the multimedia database.
During the exploration process the user gains an overall idea of all the stored data rather than of a particular part of the database returned as the result of some similarity query. In this overall view the user is supported in browsing the space of multimedia data (typically on a multi-touch device, e.g., an iPad), which results in a stream of similarity queries. For convenient, user-friendly browsing, the exploration system has to evaluate these queries promptly, which is not guaranteed with standard query processing (even approximate). Hence, the goal of this project is to propose and implement access methods that provide real-time similarity retrieval, thus laying the foundations for user-friendly exploration of multimedia databases.

Principal investigator : Juraj Mosko
Team member : Tomas Skopal, Tomas Bartos, Jakub Lokoc, Tomas Grosup
CISCO 2013
Finding similar events within IDS
2013 - 2014
Principal investigator : Tomas Skopal
Team member : Jakub Lokoc, Juraj Mosko
GAUK 567312
Algorithmic exploration of axiom spaces for efficient similarity search at large scale
2012 - 2014

Similarity search is becoming popular in ever more disciplines, such as multimedia databases, bioinformatics, data mining, or social networks. Large-scale search engines for such data are mostly based on models involving low-level features and simple similarity functions. There also exist complex models employing local features and higher-level similarities, which provide higher retrieval effectiveness. Applying complex models, however, is not feasible at large scale due to an insufficient portfolio of indexing techniques enabling fast search.

 
The existing techniques assume the metric space model, which is too restrictive. In this project we revisit assumptions that persist in mainstream research on content-based retrieval. Leaving behind traditional indexing paradigms such as the metric space model, our goal is to propose alternative indexing methods that lead to high-performance similarity search. We intend to develop an algorithmic framework for the exploration of axioms (analytical properties) useful for indexing that hold in a given complex similarity space but have not been discovered so far. Consequently, the known axioms will be localized as a small subset within the universe of all axioms suitable for indexing. The discovery of new axioms valid in some similarity space might have a huge impact on the database community.
 
Principal investigator : Tomas Bartos
Team member : Tomas Skopal, Juraj Mosko
GAČR P202/11/0968
Large-scale Nonmetric Similarity Search in Complex Domains
2011 - 2014

Similarity search is popular in various areas of computing, including multimedia databases, data mining, bioinformatics, etc. For a long time, database approaches to similarity search assumed that similarity is a metric distance. Due to its properties, a metric similarity allows a database to be indexed so that it can be queried efficiently (quickly). However, with the increasing complexity of data across various domains, many similarities that are not metrics (i.e., nonmetrics) have appeared in recent years. Database research, however, is still not aware of the huge potential market for nonmetric similarity search and recognizes just the metric space model.
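To make concrete why metric properties enable efficient querying, here is a minimal sketch of pivot-based filtering for a range query. It relies on the triangle inequality, which gives the lower bound |d(q, p) - d(o, p)| <= d(q, o), so an object can be discarded without computing d(q, o) whenever the bound exceeds the query radius. The function names, the single pivot, and the Euclidean distance are assumptions chosen for illustration, not an access method from this project.

```python
import math


def euclidean(a, b):
    """Euclidean distance, a metric (math.dist needs Python 3.8+)."""
    return math.dist(a, b)


def build_pivot_table(objects, pivot, dist):
    """Precompute the distance from every database object to the pivot."""
    return [dist(obj, pivot) for obj in objects]


def range_query(objects, pivot, table, query, radius, dist):
    """Return (hits, number of distance computations at query time)."""
    d_qp = dist(query, pivot)
    computed = 1  # the query-to-pivot distance
    hits = []
    for obj, d_op in zip(objects, table):
        # Lower bound on dist(query, obj) from the triangle inequality.
        if abs(d_qp - d_op) > radius:
            continue  # safely filtered out without computing the distance
        computed += 1
        if dist(query, obj) <= radius:
            hits.append(obj)
    return hits, computed


if __name__ == "__main__":
    db = [(0, 0), (1, 0), (5, 5), (10, 10)]
    pivot = (0, 0)
    table = build_pivot_table(db, pivot, euclidean)
    print(range_query(db, pivot, table, (0.5, 0), 1.0, euclidean))
```

With a nonmetric similarity the lower bound no longer holds, so the same filter could discard true results; this is the gap that nonmetric access methods have to address.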

This project aims to propose formal models, followed by the design of access methods for efficient nonmetric similarity search, that is, search in databases where the similarity is not restricted by the metric postulates. Achieving this goal would bring an efficient database solution to domain experts who need to pursue large-scale content-based retrieval tasks in complex databases, such as multimedia retrieval, similarity-based data mining, complex pattern matching, and classification and prediction in bioinformatics.

Principal investigator : Tomas Skopal
Team member : David Hoksza, Jakub Lokoc, Jiri Novak, Juraj Mosko, Tomas Bartos
GAUK 430711
Application of Metric and Non-metric Indexing Methods in Computational Proteomics
2011 - 2012

The volume of unstructured databases is growing rapidly, while their annotation is problematic. The similarity search concept, based on a similarity function defined for each pair of database objects, is more suitable for this kind of data. The similarity is usually modelled by a distance function satisfying the metric axioms, which allows efficient indexing. However, the metric axioms can be very restrictive for domain experts, who may prefer non-metric functions. Hence database experts have to solve this problem either by converting non-metric functions to metric ones or by developing new types of non-metric indexing methods.

One of the areas where metric and non-metric similarity searching is used is computational proteomics. When determining the biological function of an "unknown" protein, retrieving "known" proteins with similar structures (and thus probably with similar function) is very useful. Moreover, fast and cheap determination of protein structures is itself an open problem. From a database point of view, databases of known protein structures can be used to address this problem. In this approach, sequence-structure similarity functions are used to obtain structures that may be similar to the sought structure of the protein.

Our goal is to develop high-quality structure and sequence-structure similarity functions and methods for their indexing.

Principal investigator : Jakub Galgonek
Team member : Tomas Skopal, Jiri Novak, Jakub Lokoc
GAČR 201/09/0683
Similarity Searching in Very Large Multimedia Databases
2009 - 2011

Finished, rated as excellent

Co-Investigator : Tomas Skopal
GAUK 18208
Distributed and parallel metric indexing in multimedia databases
2008 - 2009

Current data processing applications use data with considerably less structure and much less precise queries than traditional database systems. Multimedia data, such as images or videos, which are searched by query-by-example, are a typical example. Such data can neither be ordered in a canonical manner nor meaningfully searched by precise database queries that would return exact matches. This situation has given rise to similarity searching. The most general approach to similarity search that still allows the construction of index structures is modeled in metric space. An important issue here is efficiency: we need to achieve fast query response over huge volumes of data. Many metric access methods and indexing structures have been developed during the last two decades; however, most of them cannot scale with the exponential growth of multimedia data volumes encountered in recent years. One way to cope with this enormous growth is to design parallel and distributed solutions, either as extensions of the traditional centralized indexing techniques or as completely new ones in which parallelism and distribution are inherent indexing properties. Hence, the goal of the proposed project is the design and implementation of parallel and distributed indexing techniques and their comparison with existing centralized solutions.

Principal investigator : Jakub Lokoc
Team member : Tomas Skopal
GAUK 57907
Similarity search in biological databases
2007 - 2008

In recent years the volume of gene and protein banks (databases) has grown rapidly. Huge volumes of gene and protein sequences are stored in one place not only for browsing the sequences themselves but, first and foremost, for searching for similarities among the stored sequences. Similar sequences indicate similar functionality, which helps in finding the functions of unknown genes.

Current techniques for finding similarity among data sequences go through whole databases of genes and proteins and examine the similarity between the query and every sequence in the database. As the volume of the databases grows, the time needed to find similar sequences increases linearly.

Hence, the goal of the project is to apply multimedia indexing methods to speed up searching in biological databases (primarily genome and protein databases). In the project, we will examine (primarily in the first year; the plan of future work is given in further sections) the ability of existing indexing methods to index different types of biological data (or their modifications) in a way that is optimal for biological data.

Principal investigator : David Hoksza
Team member : Tomas Skopal
GAČR 201/05/P036
Efficient metric search in large multimedia databases
2005 - 2007

Finished, rated as excellent

Principal investigator : Tomas Skopal