
P2RANK

Type: 
Bioinformatics & Cheminformatics
One line description: 
Ligand-binding site prediction
Annotation: 
 
P2RANK is a novel machine learning-based method for predicting ligand binding sites from protein structure. It uses a Random Forest classifier to infer the ligandability of local chemical neighborhoods near the protein surface. Each neighborhood is represented by a point near the surface and described by aggregating physico-chemical features projected onto that point from neighboring protein atoms. Points with high predicted ligandability are clustered and ranked to produce the final list of binding site predictions. P2RANK is freely available at http://siret.ms.mff.cuni.cz/p2rank.
 
 
Developers: 
david.hoksza
radoslav.krivak

Introduction

 
P2RANK is a novel machine learning-based method for predicting ligand binding sites from protein structure. It uses a Random Forest classifier to infer the ligandability of local chemical neighborhoods near the protein surface. Each neighborhood is represented by a point near the surface and described by aggregating physico-chemical features projected onto that point from neighboring protein atoms. Points with high predicted ligandability are clustered and ranked to produce the final list of binding site predictions.
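
The last step described above (keep high-scoring points, cluster them, rank the clusters) can be illustrated with a toy sketch. Everything specific in it is an assumption made for illustration and is not taken from P2RANK: the 0.5 score threshold, the 3.0 Å linkage distance, the use of single-linkage clustering, ranking by summed score, and the function name `cluster_and_rank`. The actual procedure is in the source code.

```python
import math
from collections import deque


def cluster_and_rank(points, threshold=0.5, cutoff=3.0):
    """Toy illustration: points is a list of (x, y, z, score) tuples.

    Keeps points with score >= threshold, groups them by single linkage
    (two points are linked if they lie within `cutoff` of each other),
    and returns the groups sorted by total score, best first.
    All numeric defaults are illustrative assumptions.
    """
    kept = [p for p in points if p[3] >= threshold]
    unvisited = set(range(len(kept)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        members = [seed]
        queue = deque([seed])
        # Breadth-first expansion over points within the cutoff distance.
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited
                    if math.dist(kept[i][:3], kept[j][:3]) <= cutoff]
            for j in near:
                unvisited.remove(j)
                members.append(j)
                queue.append(j)
        clusters.append([kept[i] for i in members])
    # Rank clusters by the sum of their point scores (an assumed criterion).
    clusters.sort(key=lambda c: sum(p[3] for p in c), reverse=True)
    return clusters
```

Single linkage is used here only because it is short to write; any clustering method that groups nearby points would serve the same illustrative purpose.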
 
 

 

Download

p2rank_2.0_rc.2.zip (100MB), requires Java 1.8 or newer

 

Source 

Source code is available at https://github.com/rdk/p2rank

 


 

Usage

usage:

   prank <command> <dataset.ds> [options]

commands:

   predict      ... predict pockets (P2RANK)

   eval-predict ... evaluate model on a dataset with known ligands

   rescore      ... rescore previously detected pockets (PRANK)

   eval-rescore ... evaluate rescoring model on a dataset with known ligands

datasets:

      Dataset files for prediction should contain a list of PDB files.
      Dataset files for rescoring should contain a list of protein files
      that are outputs of one of the supported pocket prediction methods
      (fpocket, ConCavity). In datasets for evaluation and training, they
      must be paired with liganded proteins (the correct solutions).
      See the example datasets in the test_data/ directory.

options:

   -f <path>   run on single pdb file instead of a dataset

   -c <path>   use configuration file that overrides default configuration
               in config/default.groovy, path relative to config/ directory

   -m <path>   use previously trained classifier file relative to models/ directory
               default: models/default.model

   -o <path>   specify output directory (relative to working dir)
               default: test_output/<command>_<dataset>

other parameters:

   -threads <int>         number of execution threads
                          default: num. of processors + 1

   -visualizations <0/1>  produce PyMOL visualizations
                          default: 1

   -<param> <value>       for full list of parameters see config/default.groovy
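
As a rough sketch of how the usage text above might be driven from a script, the following Python helpers write a prediction dataset file and assemble a `prank predict` argument list. The helper names `write_dataset` and `predict_command` are hypothetical, and the one-path-per-line dataset layout is an assumption; the example datasets in test_data/ are the authority on the real format. Absolute paths are written because the usage text does not say how relative paths inside a dataset file are resolved.

```python
from pathlib import Path


def write_dataset(pdb_dir, dataset_path):
    """Write the absolute paths of all .pdb files in pdb_dir to dataset_path.

    Assumption: a dataset file is a plain list with one file path per line.
    """
    pdb_files = sorted(p.resolve() for p in Path(pdb_dir).glob("*.pdb"))
    Path(dataset_path).write_text("".join(f"{p}\n" for p in pdb_files))
    return pdb_files


def predict_command(dataset, threads=None, output_dir=None):
    """Build an argument list for `prank predict`.

    Only options that appear in the usage text are used: -threads and -o.
    """
    cmd = ["prank", "predict"]
    if threads is not None:
        cmd += ["-threads", str(threads)]
    if output_dir is not None:
        cmd += ["-o", str(output_dir)]
    cmd.append(str(dataset))
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where `prank` is on the PATH.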

 

More usage examples

#
# Print help:
#

prank help

#
# Ligand binding site prediction (P2RANK algorithm):
#

prank predict test.ds                                  # run on whole dataset (list of pdb files)
prank predict -f test_data/liganated/1aaxa.pdb         # run on individual pdb file
prank predict test.ds     -o output_here               # explicitly specify output directory

prank predict -threads 8          test.ds              # specify no. of working threads for parallel processing
prank predict -c predict2.groovy  test.ds              # specify configuration file (predict2.groovy uses different
                                                         prediction model and combination of parameters)

#
# Evaluate model for pocket prediction:
#

prank eval-predict test.ds
prank eval-predict -f test_data/liganated/1aaxa.pdb

#
# Prediction output notes:
#
#   For each file in the dataset, the program produces a CSV file in the output directory named
#   <pdb_file_name>_predictions.csv, which contains an ordered list of predicted pockets, their scores, the coordinates
#   of their centroids, and a list of PDBSerials of adjacent amino acids and solvent-exposed atoms.
#
#   If the coordinates of the Connolly points that belong to individual pockets are needed, they can be found
#   in visualizations/data/<pdb_file_name>_points.pdb. There, the "Residue sequence number" field (columns 23-26) of each
#   HETATM record corresponds to the rank of the corresponding pocket (points with value 0 do not belong to any pocket).
#

#
# Rescore pockets detected by other methods (PRANK algorithm):
#

prank rescore test_data/fpocket.ds
prank rescore fpocket.ds                 # test_data/ is default 'dataset_base_dir'
prank rescore fpocket.ds -o output_dir   # test_output/ is default 'output_base_dir'

#
# Override default params with custom config file:
#

prank rescore -c config/example.groovy test_data/fpocket.ds
prank rescore -c example.groovy        fpocket.ds


#
# It is also possible to override default params on the command line using their full names:
# (to see the complete list of params, look into config/default.groovy)
#

prank rescore                   -seed 151 -threads 8  test_data/fpocket.ds
prank rescore -c example.groovy -seed 151 -threads 8  test_data/fpocket.ds

#
# Evaluate model for pocket rescoring:
#

prank eval-rescore                        fpocket-pairs.ds
prank eval-rescore -m models/default.model fpocket-pairs.ds
prank eval-rescore -m default.model       fpocket-pairs.ds
prank eval-rescore -m other.model         fpocket-pairs.ds
prank eval-rescore -m other.model         fpocket-pairs.ds -o output_dir
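
The prediction output notes above say that, in <pdb_file_name>_points.pdb, the residue sequence number of each HETATM record holds the rank of the pocket the point belongs to, with 0 meaning no pocket. A minimal Python sketch for grouping the point coordinates by pocket rank could look like the following. It relies only on the standard fixed-column PDB layout (residue sequence number in columns 23-26, x/y/z in columns 31-54); the function name `points_by_pocket` is hypothetical.

```python
from collections import defaultdict


def points_by_pocket(pdb_text):
    """Group HETATM point coordinates by pocket rank.

    The rank is read from the residue sequence number field (columns 23-26).
    Points with rank 0 belong to no pocket and are skipped.
    """
    pockets = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        rank = int(line[22:26])  # columns 23-26 (1-based), residue sequence number
        if rank == 0:
            continue
        # Standard PDB coordinate columns: x 31-38, y 39-46, z 47-54.
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        pockets[rank].append(xyz)
    return dict(pockets)
```

For example, `points_by_pocket(open(path).read())[1]` would give the coordinates of the points assigned to the top-ranked pocket, where `path` points at one of the *_points.pdb files.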