RAPPER – conformer modelling by sampling residue-specific phi / psi propensity tables and rotameric states, given a set of restraints.
rapper params.xml Mode Keyword_input
RAPPER is a program that generates protein conformers by sampling residue-specific phi / psi propensity tables and rotamer libraries, using a set of constraints (i.e. ideal bond angles and lengths) and given restraints. Restraints include positional restraints based on atom positions, framework restraints from known structure, secondary structure restraints and experimental data (specifically electron density from X-ray crystallography) [References 1-9]. Please note that this program was developed outside the CCP4 framework and therefore has various features that do not follow CCP4 standards.
The generalised nature of the sampling and restraint generation has allowed RAPPER to be applied to a number of conformer modelling problems. These include
ab initio loop modelling
C-alpha tracing
Conformer fitting to electron density (both high and low resolution)
Comparative modelling
In addition, there are a number of utility modes, including joining PDB files together, RMSD calculations and structure superimposition. Documentation on these modes can be found on the main RAPPER website (http://mordred.bioc.cam.ac.uk/~rapper). The documentation below refers to the modes of operation as implemented in the CCP4i interface.
The modes of operation implemented in CCP4i are:
loop modelling – This is to generate a short section of structure that is currently not modelled. This can be done both with and without using electron density as a restraint.
loop modelling from PDB – This is to re-generate a short section of structure that is currently modelled and present in the input PDB file. This DOES NOT use any positional information from the current section to be re-modelled but just takes the sequence information from the file. This can be done both with and without using electron density as a restraint.
Ca-trace – This generates the entire structure using the C-alpha positions as points to sample around. The sequence is taken from that defined in the input PDB file. Either a short section or the entire structure can be generated. This can be done both with and without using electron density as a restraint.
Rebuild bad fitting residues – This assesses the quality of fit of the current model to a map and identifies regions that should be rebuilt. These regions are rebuilt using the sequence as given in the input PDB file and the input map. The map can be any type (Fo-Fc, 2Fo-Fc, etc.), though a Sigma-A weighted OMIT map (as can be generated by CNS) has been shown to work well.
If building one or more small sections, the generated fragments can be integrated back into the rest of the model by RAPPER using
rapper params.xml joinpdb --pdb1 fragment.pdb --pdb2 framework.pdb --pdb-out out.pdb
Note these are given as keyword inputs.
Input PDB file in standard PDB file format. If the model is fragmented and a C-alpha trace is being run, each fragment should be given a separate chain ID; otherwise RAPPER will try to join the fragments, as it does not consider residue numbering.
This can be any type of map, though Sigma-A weighted OMIT maps have been shown to work well. Both CCP4 and CNS maps are supported.
Output PDB file; only used when joining modelled fragments back to the rest of the structure. By default a number of files are generated:
loop.pdb / trace.pdb / looptest-best.pdb - depending on the mode one of these files will be produced which contains the first model generated.
multiloop.pdb / multitrace.pdb / looptest.pdb - depending on the mode one of these files will be produced which contains all the models requested.
native.pdb - the input PDB file.
framework.pdb - the input PDB file with the section(s) being rebuilt removed.
models.dat - some statistical information about the models compared to the input structure (if present).
benchmark.dat - some statistical information about the models compared to the input structure (if present) and runtime information.
run-parameters.xml - an XML formatted file of all the parameters and the values used for modelling (extensive).
These files are generated automatically in a new subdirectory of the current working directory called TESTRUNS. If you want to direct the output to a specific folder then use the --runs-dir keyword (see below). If you want to tag the beginning of each file with a specific name then use the --use-CCP4i-file-name and --ccp4i-file-name keywords (see below).
The directory to place the output files.
Use a specific file tag to prepend to the output files.
The tag name that will be prepended to the output files. To be used with the --use-CCP4i-file-name keyword.
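The separate-chain-ID requirement for fragmented C-alpha input can be checked before a run. The following is a minimal sketch and not part of RAPPER: the function name find_chain_gaps is hypothetical, and it assumes standard fixed-column PDB ATOM records. It reports jumps in residue numbering between consecutive C-alpha atoms that share a chain ID, which are places where RAPPER would otherwise try to join fragments.

```shell
# Hypothetical helper (not a RAPPER command): list residue-numbering gaps
# between consecutive C-alpha atoms that carry the same chain ID.
# Assumes fixed-column PDB ATOM records: atom name in columns 13-16,
# chain ID in column 22, residue number in columns 23-26.
find_chain_gaps() {
    awk '
        substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
            chain = substr($0, 22, 1)
            res = substr($0, 23, 4) + 0
            if (seen && chain == prev_chain && res > prev_res + 1) {
                printf "chain %s: gap between %d and %d\n", chain, prev_res, res
            }
            prev_chain = chain
            prev_res = res
            seen = 1
        }' "$1"
}
```

Calling find_chain_gaps test.pdb prints one line per gap. Any fragment reported this way could be given its own chain ID before running a C-alpha trace.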
RAPPER has a large number of keyword controls. Just those essential for running the modes of operation available in the CCP4i interface are given below. A full list of keywords can be found by calling:
rapper params.xml help
or by scrutinising the params.xml file. Note that logical values are given as 'true' or 'false'. Restraints can often be switched on with a logical control, in which case they take default values. To alter a default value, the restraint must be switched on AND the value altered; altering the value alone will not turn on the restraint.
All input values are checked for validity of type and quality. Keyword commands are also spell-checked, and if the spelling does not match any entry in the keyword database the most similar command is suggested. All keywords are denoted by a double dash '--'.
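The switch-plus-value rule can be made harder to forget by always emitting the two keywords together. The sketch below is illustrative only: the function name mainchain_restraint_args is hypothetical, and the pairing of --enforce-mainchain-restraints with --mainchain-restraint-threshold is taken from the Ca-trace examples on this page rather than from the keyword database.

```shell
# Hypothetical helper: print the logical switch together with its value,
# so that a non-default threshold is never passed without enabling the
# restraint. Keyword names are those used in the ca-trace examples.
mainchain_restraint_args() {
    printf '%s %s %s %s\n' \
        --enforce-mainchain-restraints true \
        --mainchain-restraint-threshold "$1"
}
```

The printed keywords can be appended to a command line that already supplies the other keywords the chosen mode needs.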
Residue number to start building from.
Residue number to stop building.
Chain ID of the section to be built. If all chains are to be built, use '*'.
Number of models to be built.
Resolution of map in Angstroms. To be used in conjunction with map and edm-fit.
Use electron density map as a restraint.
Use C-alpha atoms as positional restraints.
Size of restraint sphere to be sampled around the C-alpha atom position.
Required in order to build side chains. If side chains are to be built, use 'smart'.
The factor by which we reduce the radii of hard-sphere excluded volume interactions when at least one atom is from a side chain. That is, if this parameter is 0.5, then a side chain atom can approach up to twice as close to any other atom as normal. An appropriate value to use is 0.75.
Sample around the virtual side chain centroid.
Size of restraint sphere to be sampled around the virtual side chain centroid position.
The rotamer library to be used. A number of files are distributed with RAPPER in the data directory. If the data directory is not in the default RAPPER-DIR path, RAPPER-DIR should be set to point to the new location.
The location of the RAPPER root installation.
To rebuild regions of poor fit to an electron density map. Residues to be rebuilt are identified using a real space scoring function, the cut off for which is set using --edm-poor-region-threshold.
Regions with fits worse than this number of standard deviations are considered 'poor'. Typical value is 0.80.
If a region fits poorly, the entire region plus this number of residues on either side are flagged for rebuilding.
Divide the sequence randomly into fragments and build the fragments in random order. This optimises the time spent sampling regions with rare phi / psi states.
Allow bands to cross chain breaks.
The b-factor assigned to the newly built main chain region.
The b-factor assigned to the newly built side chain region.
Take the b-factors from the section that is being rebuilt.
If true, an error is signalled when no models can be found during the conformational search. If false, RAPPER carries on regardless and empty PDB files are generated.
Attempt to fix mislabelled atoms when reading PDB files.
Copy the remark lines from the input PDB file into the model output file.
Filter the models to provide an enriched solution set. This is computationally very expensive so only use if you have a relatively good map and some cpu time to spare.
Turn off restraints from the 11th pass onwards. This is dangerous and should be used with care, as it will produce models that violate both restraints and constraints. In particular, clash restraints are often violated, which can lead to badly distorted models, but a model will usually be generated.
If true, then the 0.0 (positive density) mainchain restraint will be made optional. If false, then the main chain will be unconditionally forced to lie in positive density. This is primarily useful when tracing through a structure with regions in very poor (non-existent) density.
If true, then electron density map restraints will be added if a map file is given.
Only main chain atoms positioned in density greater than this number of standard deviations are considered to satisfy the electron density map restraint.
If true, then electron density map restraints will be added if a map file is given.
Only side chain atoms positioned in density greater than this number of standard deviations are considered to satisfy the electron density map restraint.
If true, then RAPPER will be very particular about the quality of the N and C terminal anchor residues. This is primarily useful for loop modelling where the two anchors play a major role in model accuracy.
If true, then restraints are added to ensure high-quality C terminal anchor geometry. This option, when enabled, can be expensive computationally, and may in fact cause the modelling to fail. On the other hand, models produced with this enabled will have better anchor geometries.
If true, then contact filters will be added.
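For the poor-fit rebuilding keywords above, the effect of the flanking-residue setting can be pictured with a little arithmetic. This is a sketch under stated assumptions, not RAPPER's own algorithm: expand_poor_regions is a hypothetical function, the list of poorly fitting residues is supplied by hand rather than from the real space score, and merging overlapping or adjacent ranges is an assumption made here for readability.

```shell
# Hypothetical illustration: widen each flagged residue by FLANK residues
# on either side and merge ranges that overlap or touch.
# Usage: expand_poor_regions FLANK RES [RES ...]
expand_poor_regions() {
    flank=$1
    shift
    [ "$#" -gt 0 ] || return 0
    printf '%s\n' "$@" | sort -n | awk -v f="$flank" '
        {
            s = $1 - f
            e = $1 + f
            if (NR > 1 && s <= stop + 1) {
                # Overlaps or touches the current range: extend it.
                if (e > stop) stop = e
            } else {
                if (NR > 1) print start "-" stop
                start = s
                stop = e
            }
        }
        END { print start "-" stop }'
}
```

With a flank of 2, residues 705, 706 and 715 give the two ranges 703-708 and 713-717.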
Below are examples for each of the four modes of modelling used in the CCP4i interface:
NOTE: From the command line the params.xml location needs to be given. In the CCP4 distribution this is located in ccp4-x-x.x/share/rapper/params.xml.
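Because every command needs the params.xml location, it can help to resolve it once. The following is a minimal sketch: rapper_params is a hypothetical helper, and the installation root is passed in as an argument because the versioned directory name differs between installations.

```shell
# Hypothetical helper: print the params.xml path under a given CCP4
# installation root, or fail if the file is not readable there.
rapper_params() {
    params="$1/share/rapper/params.xml"
    if [ -r "$params" ]; then
        printf '%s\n' "$params"
    else
        echo "params.xml not found under $1" >&2
        return 1
    fi
}
```

If the installation root is held in a shell variable (the name CCP4 is assumed here), the keyword listing could then be requested with rapper "$(rapper_params "$CCP4")" help.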
Loop modelling with section not currently modelled in input PDB:
rapper params.xml model-loops --pdb "test.pdb" --map "test.map" --chain-id "B" --models "1" --cryst-d-high "1.77" --seq "AAA" --start 705 --stop 708 --use-CCP4i-file-name true --ccp4i-file-name "rapper_1" --runs-dir "test_dir" --edm-fit true --sidechain-mode smart --sidechain-radius-reduction 0.75 --sidechain-library RAPPER-DIR/data/richardson.lib
rapper params.xml model-loops --pdb "test.pdb" --chain-id "B" --models "1" --seq "AAA" --start 705 --stop 708 --use-CCP4i-file-name true --ccp4i-file-name "rapper_1" --runs-dir "test_dir" --sidechain-mode smart --sidechain-radius-reduction 0.75 --sidechain-library RAPPER-DIR/data/richardson.lib
rapper params.xml model-loops --pdb "test.pdb" --chain-id "B" --models "1" --seq "AAA" --start 705 --stop 708 --use-CCP4i-file-name true --ccp4i-file-name "rapper_1" --runs-dir "test_dir"
Loop modelling with section already modelled in input PDB file using electron density:
rapper params.xml model-loops-benchmark --pdb "test.pdb" --map "test.map" --chain-id "B" --models "1" --cryst-d-high "1.77" --start 705 --stop 708 --use-CCP4i-file-name true --ccp4i-file-name "rapper_1" --runs-dir "test_dir" --edm-fit true --sidechain-mode smart --sidechain-radius-reduction 0.75 --sidechain-library RAPPER-DIR/data/richardson.lib
Tracing a complete model using the C-alpha positions as restraints:
rapper params.xml ca-trace --pdb "test.pdb" --map "test.map" --chain-id "B" --models "1" --cryst-d-high "1.77" --start 705 --stop 708 --use-CCP4i-file-name true --ccp4i-file-name "rapper_1" --runs-dir "test_dir" --edm-fit true --enforce-mainchain-restraints true --mainchain-restraint-threshold "2.0" --sidechain-mode smart --sidechain-radius-reduction 0.75 --sidechain-library RAPPER-DIR/data/richardson.lib
rapper params.xml ca-trace --pdb "test.pdb" --map "test.map" --chain-id "*" --models "1" --cryst-d-high "1.77" --use-CCP4i-file-name true --ccp4i-file-name "rapper_1" --runs-dir "test_dir" --edm-fit true --enforce-mainchain-restraints true --mainchain-restraint-threshold "2.0" --sidechain-mode smart --sidechain-radius-reduction 0.75 --sidechain-library RAPPER-DIR/data/richardson.lib
Identify and rebuild residues that fit poorly into an electron density map:
When building loops, set the enforce-strict-anchor-geometry and use-contact-filters arguments to true.
If building is unsuccessful the first time, try moving the start and stop residues of the loop or partial Ca-trace.
Make sure that the number of residues given in the sequence matches the number of residues between the start and stop points, i.e. it should equal (stop - start) + 1, as the start residue is included in the loop.
Note that the sequence is case sensitive and uses the single letter code.
If building into density then the model should be refined before assessing it relative to a map.
Loops generated may not 'look' perfect in a viewer but can be easily fixed using the tools in COOT.
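The sequence-length tip above is easy to check before launching a run. The following is a minimal sketch: check_loop_sequence is a hypothetical helper, not a RAPPER command, and it tests only the length rule (stop - start + 1 residues), not the case of the letters.

```shell
# Hypothetical pre-flight check: the sequence must contain exactly
# (stop - start + 1) one-letter codes, since the start residue is included.
check_loop_sequence() {
    sequence=$1
    start=$2
    stop=$3
    expected=$((stop - start + 1))
    if [ "${#sequence}" -eq "$expected" ]; then
        echo "ok"
    else
        echo "sequence has ${#sequence} residues, but $start-$stop spans $expected" >&2
        return 1
    fi
}
```

For instance, check_loop_sequence AAAA 705 708 prints ok, while a three-letter sequence over the same range returns a non-zero status with a message on standard error.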
1. P.I.W. de Bakker, M.A. DePristo, D.F. Burke, T.L. Blundell (2002) Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all-atom statistical potential and the AMBER force field with the Generalized Born solvation model. Proteins: Struct. Funct. Genet. 51 21-40.
2. S.C. Lovell, I.W. Davis, W.B. Arendall III, P.I.W. de Bakker, J.M. Word, M.G. Prisant, J.S. Richardson, D.C. Richardson (2003) Structure validation by Calpha geometry: phi,psi and Cbeta deviation. Proteins: Struct. Funct. Genet. 50 437-450.
3. M.A. DePristo, P.I.W. de Bakker, S.C. Lovell, T.L. Blundell (2002) Ab initio construction of polypeptide fragments: Efficient generation of accurate, representative ensembles. Proteins: Struct. Funct. Genet. 51 41-55.
4. M.A. DePristo, P.I.W. de Bakker, R.P. Shetty, T.L. Blundell (2003) Discrete restraint-based protein modeling and the Cα-trace problem. Protein Science 12 2032-2046.
5. R.P. Shetty, P.I.W. de Bakker, M.A. DePristo, T.L. Blundell (2003) The advantages of fine-grained side chain conformer libraries. Protein Engineering 16 963-969.
6. M.A. DePristo, P.I.W. de Bakker, T.L. Blundell (2004) Heterogeneity and inaccuracy in protein structures solved by X-ray crystallography. Structure (Camb.) 12 831-838.
7. M.A. DePristo, P.I.W. de Bakker, R.J. Johnson, T.L. Blundell (2005) Crystallographic refinement by knowledge-based exploration of complex energy landscapes. Structure 13 (9) 1311-1319.
8. N. Furnham, T.L. Blundell, M.A. DePristo, T.C. Terwilliger (2006) Is one Solution Good Enough? Nature Structural & Molecular Biology 13 (3) 184-185.
9. N. Furnham, A.S. Dore, D.Y. Chirgadze, P.I.W. de Bakker, M.A. DePristo, T.L. Blundell (2006) Knowledge-Based Real-Space Explorations for Low-Resolution Structure Determination. Structure 14 (8) 1313-1320.
Nicholas Furnham, Paul de Bakker, Mark DePristo, Reshma Shetty, Swanand Gore and Tom Blundell.