MATTHEWS_COEF (CCP4: Supported Program)

NAME

matthews_coef - Misha Isupov's Jiffy to calculate Matthews coefficient.

SYNOPSIS

matthews_coef [XMLFILE <filename>]
[Keyworded input]

DESCRIPTION

The Matthews Coefficient and solvent content are calculated from the unit cell and the molecular weight of the molecules in the unit cell. A description of the Matthews coefficient Vm and how it relates to solvent content is given below.

The probabilities from the Matthews coefficient paper of Kantardjieff and Rupp are also printed: P(reso) is the probability of a particular Matthews coefficient given the input high resolution limit, and P(tot) is the probability across all resolution ranges.

Input:

The program requires the information below, which is input via keywords. No input files are required.

cell parameters
the spacegroup i.e. number of symmetry operations
the number of molecules in the asymmetric unit
the molecular weight of 1 molecule ( ~ number of residues * 110), or the number of residues in 1 molecule
the high resolution limit (optional)

Output:

No output files are generated; below is a sample of the log output.

  THE MATTHEWS COEF. IS :  1.74
  SOL % IS : 28.96

or, if used with the AUTO keyword:

For given protein molecular weight,  or estimated from number of residues 110 
Nmol/asym  Matthews Coeff  %solvent       P(tot)
 
  1         3.98            69.14         0.09 
  2         1.99            38.28         0.90 
  3         1.33             7.42         0.01

Using both the AUTO keyword and the RESO limit (example at resolution 5.0):

For given protein molecular weight,  or estimated from number of residues 110 
Nmol/asym  Matthews Coeff  %solvent       P(5.00)     P(tot)
 
  1         3.98            69.14         0.19         0.09 
  2         1.99            38.28         0.80         0.90 
  3         1.33             7.42         0.01         0.01

The highest P(tot) is a strong indicator of the preferred solution.

KEYWORDED INPUT

Available keywords are:

AUTO, CELL, MODE, MOLWEIGHT, NMOL, NRES, RESO, SYMMETRY, XMLOUTPUT

Compulsory keywords

CELL a b c [alpha beta gamma]

You must give the unit cell parameters. The angles default to 90.0 if omitted.

SYMMETRY

Either the spacegroup number or name can be given. Alternatively, the symmetry operators can be input explicitly, each separated with a '*'. However, the program only requires the total number of operators.

Extra keywords

NRES <number_of_residues>

This is used to estimate the molecular weight of one molecule in Daltons. It is assumed that on average each protein residue contains 5 carbon, 1.35 nitrogen, 1.5 oxygen, 8 hydrogen and 0.05 sulphur atoms, and thus has a molecular weight of 112.5 Da. It is assumed that each DNA residue has an average weight of 325.96 Da. The average weight for a DNA/protein complex is calculated assuming a ratio of 0.25/0.75.

Obviously, these estimates are very approximate, and it is better to input the real molecular weight.
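
As a rough check on the figures above, the sketch below sums the quoted average composition of a protein residue and estimates a molecular weight from a residue count. The atomic weights, the function names and the reading of the 0.25/0.75 ratio as 25% DNA and 75% protein are assumptions of this sketch, not taken from the program source.

```python
# Approximate standard atomic weights in g/mol (assumed values for this sketch).
ATOMIC_WEIGHT = {"C": 12.011, "N": 14.007, "O": 15.999, "H": 1.008, "S": 32.06}

# Average atoms per protein residue, as quoted in the NRES description.
RESIDUE_COMPOSITION = {"C": 5, "N": 1.35, "O": 1.5, "H": 8, "S": 0.05}

PROTEIN_RESIDUE_DA = 112.5  # rounded protein residue weight quoted in the text
DNA_RESIDUE_DA = 325.96     # DNA residue weight quoted in the text


def residue_weight_from_composition():
    """Sum (atom count * atomic weight) for an average protein residue."""
    return sum(n * ATOMIC_WEIGHT[el] for el, n in RESIDUE_COMPOSITION.items())


def estimate_molweight(nres, mode="protein"):
    """Estimate the molecular weight in Da from a number of residues.

    mode is "protein", "dna" or "complex". For a complex the 0.25/0.75
    ratio is interpreted as 25% DNA and 75% protein (an assumption).
    """
    if mode == "dna":
        per_residue = DNA_RESIDUE_DA
    elif mode == "complex":
        per_residue = 0.25 * DNA_RESIDUE_DA + 0.75 * PROTEIN_RESIDUE_DA
    else:
        per_residue = PROTEIN_RESIDUE_DA
    return nres * per_residue


if __name__ == "__main__":
    print(f"residue weight from composition: {residue_weight_from_composition():.2f} Da")
    print(f"60-residue protein: {estimate_molweight(60):.1f} Da")
```

The composition sum comes out close to, but not exactly, 112.5 Da, which illustrates how approximate the estimate is.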

MOLWEIGHT <molecular_weight>

The molecular weight of one molecule in Daltons. What matters for the calculation is the total molecular weight of the molecules in the asymmetric unit, so this keyword is used in conjunction with NMOL. If it is not given, the program calculates a tentative molecular weight for the molecule, assuming the unit cell is 47% solvent in the case of protein, 64% for DNA and 60% for a protein/DNA complex.
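
One way such a tentative molecular weight could be obtained for the protein case is to invert the relation Vp = 1.23 / Vm given under PROGRAM FUNCTION. The sketch below does this; it is an illustration only, the function name is hypothetical, and the program's actual calculation may differ.

```python
def tentative_molweight(volume, nsym, nmol=1, solvent_fraction=0.47):
    """Molecular weight (Da) implied by an assumed solvent fraction.

    volume           unit cell volume in cubic Angstroms
    nsym             number of symmetry operators (asymmetric units per cell)
    nmol             molecules per asymmetric unit
    solvent_fraction assumed solvent content (0.47 is the protein default
                     quoted in the text)

    Uses Vp = 1.23 / Vm, which holds for a protein density of 1.35 g/cc.
    DNA or a complex would need a different density, so they are not
    handled here.
    """
    vp = 1.0 - solvent_fraction          # protein fraction of the cell
    vm = 1.23 / vp                       # Matthews coefficient implied by vp
    return volume / (vm * nsym * nmol)   # rearranged Vm = V / (M * Z)


if __name__ == "__main__":
    # Cell volume taken from the sample XML output; spacegroup 19 has 4 operators.
    print(f"{tentative_molweight(66085.78, 4):.0f} Da")
```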

NMOL <number>

This keyword is not compulsory but is used in conjunction with MOLWEIGHT. It gives the number of molecules per asymmetric unit. The default is 1.

AUTO

This keyword is not compulsory and can be used in conjunction with NMOL and MOLWEIGHT. It produces a table of results for an increasing number of molecules in the asymmetric unit, starting from NMOL (default 1) and continuing for as long as the %solvent is greater than 0.0.
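
The behaviour described for AUTO can be sketched as a simple loop. This is an illustration rather than the program's code: the function name is hypothetical, the solvent content uses the approximation Vp = 1.23 / Vm from PROGRAM FUNCTION, and the probability columns are not computed.

```python
def auto_table(volume, nsym, molweight, nmol_start=1):
    """Return (nmol, Vm, percent_solvent) rows while %solvent stays above 0."""
    rows = []
    nmol = nmol_start
    while True:
        vm = volume / (molweight * nsym * nmol)   # Matthews coefficient
        solvent = 100.0 * (1.0 - 1.23 / vm)       # solvent content in percent
        if solvent <= 0.0:
            break                                 # no room left for more molecules
        rows.append((nmol, vm, solvent))
        nmol += 1
    return rows


if __name__ == "__main__":
    # Cell volume from the sample XML output; spacegroup 19 has 4 operators.
    print("Nmol/asym  Matthews Coeff  %solvent")
    for nmol, vm, solvent in auto_table(66085.78, 4, 6600.0):
        print(f"{nmol:5d} {vm:14.2f} {solvent:14.2f}")
```

The loop always terminates because Vm falls as the number of molecules rises, so the solvent content eventually drops to zero or below.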

RESO <high_resolution_limit>

This keyword is not compulsory. The high resolution limit is used in the probability scoring.

MODE <Dna/Comp/other>

This keyword is not compulsory. Mode C indicates that a DNA/protein complex is to be modelled, and mode D indicates that DNA is to be modelled. Any other input leads to a protein-only calculation, which is also the default.

XMLOUTPUT

This keyword is of little use for the 'user'. When specified, matthews_coef will output a small XML file of the results. The name and location of the XML file can be specified on the command line with XMLFILE; otherwise the file will be called MATTHEWS_COEF.xml.

Example of input

CELL 73.58 38.73 23.19
SYMM 19
MOLW 6600.0
AUTO
XMLO

Example of output file

<?xml version="1.0"?>
 <matthews_run>
  <MATTHEWS_COEF
    ccp4_version="4.1" 
    date=" 1/25/02" 
   />
  <keyword
  >
  
  </keyword>
  <cell
    volume="   66085.78" 
   />
  <result
    nmol_in_asu="           1" 
    matth_coef="   2.503249" 
    percent_solvent="   50.89439"
    prob_matth="  0.9962950"
   />
  <result
    nmol_in_asu="           2" 
    matth_coef="   1.251625" 
    percent_solvent="   1.788778"
    prob_matth="   3.564707E-03"
   />
 </matthews_run>
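
A file of this form can be read with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree on an abbreviated copy of the sample above; the element and attribute names come from that sample, and the surrounding code is an assumption about how one might consume it.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample output shown above.
SAMPLE = """<?xml version="1.0"?>
<matthews_run>
  <cell volume="   66085.78" />
  <result nmol_in_asu="           1" matth_coef="   2.503249"
          percent_solvent="   50.89439" prob_matth="  0.9962950" />
  <result nmol_in_asu="           2" matth_coef="   1.251625"
          percent_solvent="   1.788778" prob_matth="   3.564707E-03" />
</matthews_run>
"""


def read_results(xml_text):
    """Return a list of dicts, one per <result> element."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.findall("result"):
        # Attribute values are padded with blanks, so strip before converting.
        results.append({
            "nmol_in_asu": int(node.get("nmol_in_asu").strip()),
            "matth_coef": float(node.get("matth_coef").strip()),
            "percent_solvent": float(node.get("percent_solvent").strip()),
            "prob_matth": float(node.get("prob_matth").strip()),
        })
    return results


if __name__ == "__main__":
    results = read_results(SAMPLE)
    best = max(results, key=lambda r: r["prob_matth"])
    print("most probable number of molecules:", best["nmol_in_asu"])
```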

PROGRAM FUNCTION

Matthews Number

        cell volume (cubic Angstroms)         V
Vm  =  -------------------------------  =  -------
            M * nasymu * nmols_asu          M * Z

         M         =   molecular weight of protein in daltons 
         V         =   volume of unit cell.
         Z         =   no. of molecules in unit cell. = nasymu*nmols_asu
         nasymu    =   number of asymm. units                 
         nmols_asu =   number of molecules in asym unit.
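
The formula above can be checked against the example input used elsewhere in this document. The sketch below is a minimal illustration with hypothetical function names; the general cell volume expression is the standard triclinic formula, and nasymu = 4 is the number of symmetry operators in spacegroup 19 (P212121).

```python
import math


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume in cubic Angstroms; angles in degrees, default 90."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


def matthews_coefficient(volume, molweight, nasymu, nmols_asu=1):
    """Vm = V / (M * Z), with Z = nasymu * nmols_asu."""
    return volume / (molweight * nasymu * nmols_asu)


if __name__ == "__main__":
    volume = cell_volume(73.58, 38.73, 23.19)
    vm = matthews_coefficient(volume, molweight=6600.0, nasymu=4, nmols_asu=1)
    print(f"V  = {volume:.2f} cubic Angstroms")
    print(f"Vm = {vm:.4f} cubic Angstroms per Dalton")
```

The resulting volume and Vm can be compared with the cell volume and matth_coef values in the sample XML output.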


Molecular weight

          = number of protein residues in molecule * 110
                                              - very roughly!!!
          = number of non hydrogen protein atoms in molecule *14 
                                              - roughly!!!!

Use RWCONTENTS to read your PDB file if you have one; it will count number of atoms of each type. Alternatively, GET_PROT can be used to calculate the Molecular Weight from a sequence. Note that while the Matthews coefficient is not very sensitive to the Molecular Weight, the probabilities from the Matthews coefficient paper of Kantardjieff and Rupp can be, especially if there are several molecules in the asymmetric unit.

Matthews found Vm to lie between about 1.66 and 4.0, corresponding to protein contents of 75% to 30%, but crystals with higher solvent contents give higher values of Vm. For example, a solvent content of 90% corresponds to a Vm of about 12.

Using this you can calculate Vm assuming nmols_asu = 1/4, 1/2, 1, 2, 3 etc. You MAY be able to narrow down the number of possibilities for nmols_asu. If Vm falls outside the range above, then the assumed number of molecules per asymmetric unit is likely to be incorrect.
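
This screening can be written out for the example cell used in this document. The sketch below is illustrative only: the function name is hypothetical, and the limits 1.66 and 4.0 are simply the approximate ends of the range quoted above, not a hard rule.

```python
VM_MIN, VM_MAX = 1.66, 4.0  # approximate range quoted above


def plausible_nmols(volume, molweight, nasymu, candidates=(0.25, 0.5, 1, 2, 3)):
    """Return the candidate nmols_asu values whose Vm lies in the usual range."""
    kept = []
    for nmols_asu in candidates:
        vm = volume / (molweight * nasymu * nmols_asu)
        in_range = VM_MIN <= vm <= VM_MAX
        print(f"nmols_asu = {nmols_asu:<5} Vm = {vm:6.2f}  {'ok' if in_range else 'unlikely'}")
        if in_range:
            kept.append(nmols_asu)
    return kept


if __name__ == "__main__":
    # Cell volume from the sample XML output; spacegroup 19 has 4 operators.
    print(plausible_nmols(66085.78, 6600.0, 4))
```

Fractional values such as 1/4 and 1/2 correspond to a molecule sitting on a symmetry element, so that the asymmetric unit holds only part of it.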

Protein fraction

Turning this into fraction of protein in asymmetric unit:

            Total mass of Protein in unit cell
Vp  =    ---------------------------------------
          Protein density   *  Unit cell volume


Vp  = M*Z*u/(V*Dp)    = 1/(N*Dp*Vm) 
 
where   Vp = fraction of protein volume in asymmetric unit.
        Vm = Matthews Number        (A**3/Daltons)
        Dp = density of protein = 1.35  (g/cc)   (ref 1)
        N  = Avogadro constant  = 6.022*10**23  mol**(-1)
        u  = Mass of Hydrogen   = 1.66*10**-24  g
  
( It is sufficient to approximate the mass of a Hydrogen atom as 
(1/N) because the mass of 1 mole of Hydrogen approximates to 1g.)

==>From this it is easy to obtain the formula derived in Matthews i.e.

                       Vp = 1.66*v / Vm  
                          = 1.23 / Vm
                          ( 1/Dp is Matthews' v = 0.74 cc/g )
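
The arithmetic behind the constant 1.23 can be spelled out as follows. This is a worked sketch with hypothetical function names; it uses only the values of Dp and u given above, together with the conversion 1 cubic Angstrom = 10**-24 cc.

```python
DP = 1.35                  # density of protein in g/cc
U = 1.66e-24               # mass of one Dalton in g (approximately 1/N)
A3_TO_CC = 1.0e-24         # one cubic Angstrom in cc


def protein_fraction(vm):
    """Vp = u / (Dp * Vm), with Vm converted from A**3/Da to cc/Da."""
    return U / (DP * vm * A3_TO_CC)


def solvent_percent(vm):
    """Solvent content in percent, taken as everything that is not protein."""
    return 100.0 * (1.0 - protein_fraction(vm))


if __name__ == "__main__":
    print(f"constant 1.66 * v = {U / (DP * A3_TO_CC):.3f}")
    for vm in (1.66, 2.5, 4.0, 12.3):
        print(f"Vm = {vm:5.2f}  Vp = {protein_fraction(vm):.2f}  solvent = {solvent_percent(vm):.1f}%")
```

The loop covers the two ends of the range reported by Matthews and a high-solvent case, so the protein contents quoted in the text can be compared directly.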

Alternatively:

 
Vp  = Np* AV/V

where   Np  = number of protein atoms in unit cell 
             (including hydrogens)
        AV  = average atomic volume in A**3, approximately 10.

             (There are about the same number of hydrogens 
              as C N O etc.)


If Vp is the fraction of protein volume in the asymmetric unit:

Density  =    Dp *Vp  +   Ds* (1-Vp)
         =   1.35*Vp  + 1.0 * (1-Vp)
         =   0.35*Vp  + 1.0

Ds = density of solvent.  = 1.0 for H₂O 
             
therefore    Vp  =   (density -1.0)/0.35

If you know the density you can work backwards and find the number of molecules in the asymmetric unit exactly.
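
Working backwards from a measured crystal density can be sketched as below. The density value in the usage example is hypothetical and chosen only for illustration, the function name is not part of the program, and the relation Vp = 1.23 / Vm assumes protein of density 1.35 g/cc in solvent of density 1.0 g/cc.

```python
def nmols_from_density(density, volume, molweight, nasymu):
    """Estimate molecules per asymmetric unit from the crystal density (g/cc)."""
    vp = (density - 1.0) / 0.35                 # protein fraction from the density
    vm = 1.23 / vp                              # Matthews coefficient implied by vp
    return volume / (molweight * nasymu * vm)   # rearranged Vm = V / (M * Z)


if __name__ == "__main__":
    # Hypothetical measured density of 1.172 g/cc for the example cell;
    # cell volume from the sample XML output, 4 operators for spacegroup 19.
    estimate = nmols_from_density(1.172, 66085.78, 6600.0, 4)
    print(f"estimated nmols_asu = {estimate:.2f}, nearest integer {round(estimate)}")
```

In practice the estimate will not be an exact integer, so it is rounded to the nearest sensible value.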

EXAMPLES

matthews_coef << eof
CELL 73.58 38.73 23.19
symm 19
molweight 6600.0
nmol 1
eof

With keyword 'AUTO'

matthews_coef << eof
CELL 73.58 38.73 23.19
SYMM 19
MOLW 6600.0
AUTO
eof

AUTHORS

Originator: Misha Isupov
Additions by: Charles Ballard ccb@dl.ac.uk, Alun Ashton a.w.ashton@ccp4.ac.uk, Eleanor Dodson ccp4@ysbl.york.ac.uk

REFERENCES

Matthews, J. Mol. Biol. 33, 491-497 (1968).
Kantardjieff and Rupp, Protein Science 12, 1865-1871 (2003).