A small library of common 9-residue protein fragments, identified by cluster analysis of a large representative subset of the PDB chosen using the FSSP sequence homology database.
The fragment library contains 'representative' search models taken from structures in the PDB. Maximum likelihood search targets are also provided for 9-residue helices at various resolutions. The representative fragments were selected by cluster analysis of all possible fragments in a representative subset of the PDB, chosen using the FSSP sequence homology database and taking into account the structure determination method and resolution. The clustering was based on the values of the eigenparameters of the CA distance matrix elements. The all-atom fragments within each cluster were then clustered again to identify the densest subcluster, from which a representative fragment was selected. The empirical fragments are therefore representative structures rather than averages.
The maximum likelihood targets are suitable for locating fragments in lower-resolution maps and in poorly phased maps (e.g. SIR/SAD). Maximum likelihood targets are provided for a 9-residue helical fragment at resolutions from 4.0 to 8.0 Angstroms. The files are as follows:
Resolution | File |
---|---|
4.0 Å | ml-helix-9-4.0.max |
5.0 Å | ml-helix-9-5.0.max |
6.0 Å | ml-helix-9-6.0.max |
7.0 Å | ml-helix-9-7.0.max |
8.0 Å | ml-helix-9-8.0.max |
There is also a model, ml-helix-9.pdb, which is an average coordinate model from the same set of fragments from which the likelihood targets were devised. This model may be supplied on XYZIN to provide a file of output fragments for visualisation or use in ffjoin.
Note: These files are standard CCP4 maps in which the mean and standard deviation of the density are packed into a single number according to the following formula: map=0.001*(float(nint(1000.0*mean))+stddev), i.e. the mean density is rounded to 3 decimal places (nint takes the nearest integer), and the standard deviation, which must be less than 1, is divided by 1000 and added to it. Software for this purpose is available from the author.
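The packing formula can be illustrated with a minimal Python sketch. This is not the author's software: the function names pack_density, unpack_density and nint are hypothetical, and the unpacking step is simply the inverse implied by the formula, assuming a non-negative standard deviation below 1.

```python
import math

def nint(x):
    """Nearest integer, rounding halves away from zero (as Fortran NINT does)."""
    return int(x + 0.5) if x >= 0 else int(x - 0.5)

def pack_density(mean, stddev):
    """Pack mean and stddev into one value: 0.001*(nint(1000*mean) + stddev)."""
    if not 0.0 <= stddev < 1.0:
        raise ValueError("stddev must be in the range [0, 1)")
    return 0.001 * (float(nint(1000.0 * mean)) + stddev)

def unpack_density(value):
    """Recover (mean, stddev) from a packed value (assumed inverse).

    The integer part of 1000*value carries the rounded mean; the
    fractional part carries the standard deviation. Floating-point
    rounding can disturb values whose stddev is extremely close to 0.
    """
    scaled = 1000.0 * value
    whole = math.floor(scaled)
    return whole / 1000.0, scaled - whole

packed = pack_density(0.1234, 0.25)
mean, stddev = unpack_density(packed)
print(packed, mean, stddev)
```

Using floor rather than truncation in the unpacking step keeps the recovered standard deviation non-negative when the mean density is negative.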
All fragments are truncated to poly-ALA, except for the turns which are poly-GLY, since most turns depend on a GLY residue.
The following empirical fragments are included in release 1.2:
The following theoretical fragments were provided with earlier releases of fffear:
The frequencies of the empirical fragments in the database subset are as follows:
Fragment type | Frequency | Frequency (exc. overlaps) |
---|---|---|
emp-helix-9 | 5074 | 854 |
emp-strand-9 | 775 | 495 |
emp-turn_*-9 | 101 | 100 |
emp-helixend-9 | 397 | 397 |
Whole database | 11068 | n/a |
Extended helices and strands give multiple matches at 1-residue displacements along the chain. The frequency excluding overlaps gives the size of the maximal non-overlapped set, which is probably more helpful for most purposes.
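As an illustration of how a maximal non-overlapped set relates to the raw match count, the following sketch counts the largest number of 9-residue matches along one chain that share no residues. It assumes that matches are identified by their starting residue index and that two matches overlap if they share any residue; the function name count_non_overlapping is hypothetical, and the source does not state how the published figures were computed.

```python
def count_non_overlapping(starts, length=9):
    """Size of the largest set of equal-length fragments sharing no residues.

    starts: starting residue indices of matches along a single chain.
    For equal-length windows, greedily taking the earliest available
    start gives a maximal non-overlapping set.
    """
    count = 0
    next_free = None  # first residue index not covered by the last pick
    for start in sorted(starts):
        if next_free is None or start >= next_free:
            count += 1
            next_free = start + length
    return count

# A 20-residue helix matches at 12 consecutive starting positions,
# but only two of those 9-residue matches can be chosen without overlap.
print(count_non_overlapping(range(12)))
```

This is why the overlap-excluded frequency differs most from the raw frequency for helices, where long regular segments match at many successive positions.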
Kevin D. Cowtan, Department of Chemistry, University of York
email: cowtan@ysbl.york.ac.uk