Neo4j database of proteins for protein-protein interaction identification

THE NEW VERSION OF THIS PAGE AFTER THE DATA LOSS

This page contains an updated version of the data described in the paper "Using Neo4j for mining protein graphs: a case study". Sadly, the original data were lost when our systems were hacked, so we recreated them in August 2019. However, since the underlying data in the PDB have changed in the meantime, our DB differs marginally from the one on which the results reported in the paper are based.

DB
- generated using neo4jShell -c dump
- Statistics
  - 5,579 protein structures translating into 69,078 connected components
  - 14,273,678 nodes
  - 37,779,062 edges
  - DB size (uncompressed) ~ 2.6 GB
  - Neo4j version: 2.2.2

Node-edge.zip
- Queries for testing the dependency on the number of nodes and edges in the query graph (see Fig. 1a and Table I in the paper)
- {i}.cypher contains queries whose graphs have {i} nodes; its {j}-th line contains a query whose graph has {i}+{j}-2 edges
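
The file and line naming convention above can be read programmatically. The following is a minimal sketch, assuming that lines are counted from 1, that each line holds exactly one query, and that the file name (without extension) is the node count; the helper names edge_count and load_queries are hypothetical and not part of the archive.

```python
from pathlib import Path


def edge_count(i: int, j: int) -> int:
    """Number of edges of the query graph on line j (1-based) of {i}.cypher."""
    if i < 1 or j < 1:
        raise ValueError("i and j must be positive integers")
    return i + j - 2


def load_queries(path):
    """Yield (nodes, edges, query) for every non-empty line of an {i}.cypher file."""
    nodes = int(Path(path).stem)  # assumption: the file name encodes the node count
    with open(path, encoding="utf-8") as fh:
        for j, line in enumerate(fh, start=1):
            query = line.strip()
            if query:
                yield nodes, edge_count(nodes, j), query
```

Under these assumptions, the first line of 4.cypher would hold a query graph with 4 nodes and 3 edges, the second line one with 4 nodes and 4 edges, and so on.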

Benchmark.zip
- All test queries for the spherical neighborhood layout with the distance set to one edge from the central residue (for details, see the paper, p. 3)
- Not all of the queries could be computed in Neo4j, so be careful when using them

THE ORIGINAL VERSION OF THIS PAGE BEFORE THE DATA LOSS

The process of extracting data from the PDB into the Neo4j DB is described in the paper "Using Neo4j for mining protein graphs: a case study", published at the 2nd International Workshop on NoSQL Databases, Emerging Database Technologies and Applications.

DB download (258 MB) generated using neo4jShell -c dump

Statistics:

69,200 protein structures translating into about 69,200 connected components
14,285,327 nodes
37,788,722 edges
DB size (uncompressed) ~ 3 GB
Neo4j version: 2.2.0
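
To put the difference between the original and the recreated DB into numbers, the node and edge counts from the two statistics lists on this page can be compared directly. This is only a small arithmetic sketch; the variable and function names are illustrative, and the figures are copied from this page.

```python
# Node and edge counts copied from the two statistics lists on this page.
original = {"nodes": 14_285_327, "edges": 37_788_722}
recreated = {"nodes": 14_273_678, "edges": 37_779_062}


def relative_change(old: int, new: int) -> float:
    """Signed change from old to new as a fraction of old."""
    return (new - old) / old


changes = {key: relative_change(original[key], recreated[key]) for key in original}

for key, change in changes.items():
    absolute = recreated[key] - original[key]
    print(f"{key}: {absolute:+,} ({change:+.3%})")
```

Both counts change by well under one percent.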

Several Cypher queries and corresponding execution plans

email:	info@siret.cz
phone:	+420 95155 4227
