Neo4j database of proteins for protein-protein interaction identification

THE NEW VERSION OF THIS PAGE AFTER THE DATA LOSS

This page contains an updated version of the data described in the paper "Using Neo4j for mining protein graphs: a case study". The original data were lost when our systems were hacked, so we recreated them in August 2019. However, because the underlying data in the PDB have changed since then, the recreated DB differs marginally from the one used to produce the results reported in the paper.

  • DB
    • generated using neo4jShell -c dump
    • Statistics
      • 5,579 protein structures translating into 69,078 connected components
      • 14,273,678 nodes
      • 37,779,062 edges
      • DB size (uncompressed) ~ 2.6 GB
      • Neo4j version: 2.2.2
  • Node-edge.zip
    • Queries to test the dependency on the number of nodes and edges in the query graph (see Fig. 1a and Table I in the paper)
    • {i}.cypher contains the queries whose graphs have {i} nodes; its {j}-th line contains a query whose graph has {i}+{j}-2 edges
    • All test queries use the spherical neighborhood layout with the distance set to one edge from the central residue (for details, see the paper, p. 3)
    • Not all of the queries could be computed in Neo4j, so be careful when using them
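
The file naming convention above can be turned into a small helper for labelling each test query with its node and edge count. The following Python sketch is an illustration only, not part of the published archive: the names `QueryRecord`, `edge_count` and `iter_queries` are hypothetical, and it assumes that each file holds one query per line, that {j} is counted from 1, and that the file name without the extension is the integer {i}. Under that reading, the first line of {i}.cypher has {i}-1 edges, which is the minimum for a connected graph with {i} nodes.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class QueryRecord(NamedTuple):
    """One test query together with the size of its query graph."""
    nodes: int
    edges: int
    cypher: str


def edge_count(i: int, j: int) -> int:
    """Edges of the query on line j (assumed 1-based) of {i}.cypher."""
    if i < 1 or j < 1:
        raise ValueError("i and j must be positive integers")
    return i + j - 2


def iter_queries(path: Path) -> Iterator[QueryRecord]:
    """Yield the queries of one {i}.cypher file, labelled with their sizes.

    Assumes one query per line. Blank lines are skipped, but they still
    count towards the line number j.
    """
    i = int(path.stem)  # "4.cypher" -> 4
    with path.open(encoding="utf-8") as fh:
        for j, line in enumerate(fh, start=1):
            text = line.strip()
            if text:
                yield QueryRecord(i, edge_count(i, j), text)
```

Since some of the queries could not be computed, it may be sensible to run each yielded query with a timeout rather than executing a whole file in one go.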

 

THE ORIGINAL VERSION OF THIS PAGE BEFORE THE DATA LOSS

The process of extracting data from the PDB into the Neo4j DB is described in the paper "Using Neo4j for mining protein graphs: a case study", published at the 2nd International Workshop on NoSQL Databases, Emerging Database Technologies and Applications.

  • DB download (258 MB) generated using neo4jShell -c dump

Statistics:

  • 69,200 protein structures translating into about 69,200 connected components
  • 14,285,327 nodes
  • 37,788,722 edges
  • DB size (uncompressed) ~ 3 GB
  • Neo4j version: 2.2.0
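
To put the difference between the two versions of the DB into numbers, the node and edge counts listed on this page can be compared directly. The short Python sketch below does only that arithmetic. The variable and function names are hypothetical and the figures are copied from the two statistics lists. Both counts come out lower in the recreated DB, by less than 0.1 %.

```python
# Node and edge counts copied from the statistics listed on this page.
ORIGINAL = {"nodes": 14_285_327, "edges": 37_788_722}
RECREATED = {"nodes": 14_273_678, "edges": 37_779_062}


def relative_change(old: int, new: int) -> float:
    """Signed change from old to new as a fraction of old."""
    return (new - old) / old


for key in ("nodes", "edges"):
    delta = RECREATED[key] - ORIGINAL[key]
    change = relative_change(ORIGINAL[key], RECREATED[key])
    print(f"{key}: {delta:+,} ({change:+.3%})")
```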