matthews_coef - Misha Isupov's Jiffy to calculate Matthews coefficient.
matthews_coef
[Keyworded input]
The Matthews Coefficient and solvent content are calculated from the unit cell and the molecular weight of the molecules in the unit cell. A description of the Matthews coefficient Vm and how it relates to solvent content is given below.
The probabilities from the Matthews coefficient paper of Kantardjieff and Rupp are also printed, P(reso) for the probability using the input high resolution limit and P(tot) for the probability across all resolution ranges. This gives the probability of a particular Matthews coefficient based upon the high resolution limit.
The program requires the information below which is input via keywords. No input files are required.
No output files are generated; below is a sample of the log output.
THE MATTHEWS COEF. IS : 1.74 SOL % IS : 28.96
or, if used with the AUTO keyword:
For given protein molecular weight, or estimated from number of residues 110

Nmol/asym  Matthews Coeff  %solvent  P(tot)
    1          3.98          69.14    0.09
    2          1.99          38.28    0.90
    3          1.33           7.42    0.01
Using both the AUTO keyword and the RESO limit (example at resolution 5.0):
For given protein molecular weight, or estimated from number of residues 110

Nmol/asym  Matthews Coeff  %solvent  P(5.00)  P(tot)
    1          3.98          69.14    0.19     0.09
    2          1.99          38.28    0.80     0.90
    3          1.33           7.42    0.01     0.01
The highest P(tot) is a strong indicator of the preferred solution.
Available keywords are:
AUTO, CELL, MODE, MOLWEIGHT, NMOL, NRES, RESO, SYMMETRY, XMLOUTPUT
CELL
The unit cell parameters must be given. The angles default to 90.0 if omitted.
SYMMETRY
Either the space group number or name can be given. Alternatively, the symmetry operators can be input explicitly, each separated by a '*'. The program only needs the total number of operators.
NRES
The number of residues, used to estimate the molecular weight of one molecule in Daltons. It is assumed that on average each protein residue contains 5 carbon, 1.35 nitrogen, 1.5 oxygen, 8 hydrogen and 0.05 sulphur atoms, and thus has a molecular weight of 112.5 Da. It is assumed that each DNA residue has an average weight of 325.96 Da. The average weight for a DNA/protein complex is calculated assuming a ratio of 0.25/0.75.
These estimates are very approximate, and it is better to input the real molecular weight.
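As an illustration of the arithmetic above, the following is a minimal Python sketch of the residue-based estimate, not the program's own code. The rounded atomic masses, the function name estimate_molweight, and the reading of the 0.25/0.75 ratio as DNA/protein (in the order the text names them) are assumptions made here.

```python
# Average atoms per protein residue, as stated in the text.
ATOMS_PER_RESIDUE = {"C": 5, "N": 1.35, "O": 1.5, "H": 8, "S": 0.05}
# Rounded atomic masses in Da (assumption: the text does not list them).
ATOMIC_MASS = {"C": 12.0, "N": 14.0, "O": 16.0, "H": 1.0, "S": 32.0}

PROTEIN_RESIDUE_DA = sum(n * ATOMIC_MASS[el] for el, n in ATOMS_PER_RESIDUE.items())
DNA_RESIDUE_DA = 325.96
# Assumption: 0.25 applies to DNA and 0.75 to protein.
COMPLEX_RESIDUE_DA = 0.25 * DNA_RESIDUE_DA + 0.75 * PROTEIN_RESIDUE_DA


def estimate_molweight(nres, mode="P"):
    """Estimate a molecular weight in Da from a residue count.

    mode "C" = DNA/protein complex, "D" = DNA, anything else = protein.
    """
    mode = mode.upper()
    if mode == "C":
        per_residue = COMPLEX_RESIDUE_DA
    elif mode == "D":
        per_residue = DNA_RESIDUE_DA
    else:
        per_residue = PROTEIN_RESIDUE_DA
    return nres * per_residue


if __name__ == "__main__":
    print(round(PROTEIN_RESIDUE_DA, 2))          # average protein residue mass
    print(round(estimate_molweight(100), 1))      # 100-residue protein
    print(round(estimate_molweight(10, "D"), 1))  # 10-residue DNA
```

With these rounded masses the protein residue weight comes out at the 112.5 Da quoted in the text.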
MOLWEIGHT
The molecular weight of one molecule in Daltons. What matters is the total molecular weight of the molecules in the asymmetric unit, so this keyword is used in conjunction with NMOL. If it is not given, the program calculates a tentative molecular weight, assuming the unit cell is 47% solvent for protein, 64% for DNA and 60% for a protein/DNA complex.
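One way such a tentative molecular weight could be obtained for the protein case is to invert the relations Vp = 1.23 / Vm and Vm = V / (M*Z) given later in this document. The sketch below does that; it is an assumption about the approach, not the program's actual code, and the function name and arguments are hypothetical.

```python
PROTEIN_VOLUME_CONSTANT = 1.23  # from Vp = 1.23 / Vm (protein only)


def tentative_molweight(cell_volume, n_symops, nmol=1, solvent_fraction=0.47):
    """Molecular weight (Da) implied by an assumed solvent fraction.

    cell_volume is in cubic Angstroms, n_symops is the number of symmetry
    operators (asymmetric units per cell), nmol the molecules per
    asymmetric unit. Valid for protein only, since 1.23 is derived from
    the protein density.
    """
    protein_fraction = 1.0 - solvent_fraction
    vm = PROTEIN_VOLUME_CONSTANT / protein_fraction  # implied Matthews coefficient
    return cell_volume / (vm * n_symops * nmol)


if __name__ == "__main__":
    # Cell volume and operator count taken from the example in this document.
    print(round(tentative_molweight(66085.78, 4)))
```

For the example cell in this document the result is of the same order as the 6600 Da used there, which is all such an estimate can be expected to show.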
NMOL
The <number> of molecules per asymmetric unit. This keyword is not compulsory but is used in conjunction with MOLWEIGHT. Default 1.
AUTO
This keyword is not compulsory and can be used in conjunction with NMOL and MOLWEIGHT. It lists results for an increasing number of molecules in the asymmetric unit, starting from NMOL (default 1) and continuing while the %solvent is greater than 0.0.
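The listing behaviour described here can be sketched as a simple loop. This is an illustrative Python sketch rather than the program's implementation: it uses the rounded protein relation Vp = 1.23 / Vm from later in this document, so percentages may differ slightly from the program's log, and it does not compute the probabilities. The function name auto_table is hypothetical.

```python
def auto_table(cell_volume, n_symops, molweight, nmol_start=1):
    """Return (nmol, Vm, %solvent) rows while the solvent content is > 0."""
    rows = []
    nmol = nmol_start
    while True:
        vm = cell_volume / (molweight * n_symops * nmol)
        solvent_percent = 100.0 * (1.0 - 1.23 / vm)
        if solvent_percent <= 0.0:
            break  # more molecules than the cell can hold
        rows.append((nmol, vm, solvent_percent))
        nmol += 1
    return rows


if __name__ == "__main__":
    # Values from the example input: cell volume 66085.78 A**3,
    # 4 symmetry operators (space group 19), 6600 Da.
    for nmol, vm, solvent in auto_table(66085.78, 4, 6600.0):
        print(f"{nmol:3d} {vm:8.2f} {solvent:8.2f}")
```

For the example input this stops after two rows, the same number of results as in the sample XML output.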
RESO
This keyword is not compulsory. The high resolution limit is used in the probability scoring.
MODE
This keyword is not compulsory. Mode C indicates that a DNA/protein complex is to be modelled, and mode D that DNA is to be modelled. Any other input gives the protein-only calculation, which is the default.
XMLOUTPUT
This keyword is of little use to most users. When it is specified, matthews_coef writes a small XML file of the results. The name and location of the XML file can be given on the command line with XMLFILE; otherwise the file is called MATTHEWS_COEF.xml.
Example of input
CELL 73.58 38.73 23.19
SYMM 19
MOLW 6600.0
AUTO
XMLO
Example of output file
<?xml version="1.0"?>
<matthews_run>>
<MATTHEWS_COEF ccp4_version="4.1" date=" 1/25/02" />
<keyword >
</keyword>
<cell volume=" 66085.78" />
<result nmol_in_asu=" 1" matth_coef=" 2.503249" percent_solvent=" 50.89439" prob_matth=" 0.9962950" />
<result nmol_in_asu=" 2" matth_coef=" 1.251625" percent_solvent=" 1.788778" prob_matth=" 3.564707E-03" />
</matthews_run>
Vm = cell volume (cubic Angstroms) / (M * nasymu * nmols_asu) = V / (M*Z)

where
    M         = molecular weight of protein in Daltons
    V         = volume of unit cell
    Z         = number of molecules in unit cell = nasymu * nmols_asu
    nasymu    = number of asymmetric units
    nmols_asu = number of molecules in the asymmetric unit

Molecular weight = number of protein residues in molecule * 110 (very roughly)
                 = number of non-hydrogen protein atoms in molecule * 14 (roughly)
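The definition of Vm can be checked numerically with a short Python sketch. The function names are hypothetical, and the general triclinic cell-volume formula is a standard crystallographic result that this document does not state; for the orthorhombic example cell it reduces to a*b*c.

```python
import math


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume in cubic Angstroms; angles in degrees, default 90."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


def matthews_coefficient(volume, molweight, n_asym_units, nmols_asu=1):
    """Vm = V / (M * nasymu * nmols_asu), in cubic Angstroms per Dalton."""
    return volume / (molweight * n_asym_units * nmols_asu)


if __name__ == "__main__":
    # Example input from this document: space group 19 has 4 operators.
    v = cell_volume(73.58, 38.73, 23.19)
    print(round(v, 2))
    print(round(matthews_coefficient(v, 6600.0, 4, 1), 4))
    print(round(matthews_coefficient(v, 6600.0, 4, 2), 4))
```

The volume and the two Vm values agree with the cell volume and matth_coef entries in the sample XML output.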
Use RWCONTENTS to read your PDB file if you have one; it will count number of atoms of each type. Alternatively, GET_PROT can be used to calculate the Molecular Weight from a sequence. Note that while the Matthews coefficient is not very sensitive to the Molecular Weight, the probabilities from the Matthews coefficient paper of Kantardjieff and Rupp can be, especially if there are several molecules in the asymmetric unit.
Matthews found Vm to lie between about 1.66 and 4.0 A**3/Dalton, corresponding to protein contents of about 75% to 30%, but proteins with higher solvent contents will give higher values of Vm. For example, for a solvent content of 90% the Vm would be about 12.
Using this you can calculate Vm assuming nmols_asu = 1/4, 1/2, 1, 2, 3 etc. You MAY be able to narrow down the number of possibilities for nmols_asu. If Vm falls outside the range above, then the assumed number of molecules per asymmetric unit is likely to be incorrect.
Turning this into fraction of protein in asymmetric unit:
Vp = (Total mass of protein in unit cell) / (Protein density * Unit cell volume)

Vp = M*Z*u / (V*Dp) = 1 / (N*Dp*Vm)

where
    Vp = fraction of protein volume in asymmetric unit
    Vm = Matthews number (A**3/Dalton)
    Dp = density of protein = 1.35 g/cc (ref 1)
    N  = Avogadro constant = 6.022*10**23 gmole**(-1)
    u  = mass of hydrogen = 1.66*10**-24 g

(It is sufficient to approximate the mass of a hydrogen atom as 1/N because the mass of 1 mole of hydrogen approximates to 1 g.)

With V converted from A**3 to cc (1 A**3 = 10**-24 cc), it is easy to obtain the formula derived in Matthews, i.e.

    Vp = 1.66*v / Vm = 1.23 / Vm

(1/Dp is Matthews' v = 0.74 cc/g)
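The conversion from Vm to protein and solvent fractions can be written as a minimal Python sketch of the formula above, using the quoted constants 1.66 and v = 0.74 cc/g. The function names are hypothetical, and the relation applies to protein only.

```python
def protein_fraction(vm, specific_volume=0.74):
    """Vp = 1.66 * v / Vm, with v in cc/g and Vm in A**3/Dalton."""
    return 1.66 * specific_volume / vm


def solvent_percent(vm, specific_volume=0.74):
    """Solvent content in percent, i.e. 100 * (1 - Vp)."""
    return 100.0 * (1.0 - protein_fraction(vm, specific_volume))


if __name__ == "__main__":
    # Limits of the range found by Matthews, plus a high-solvent case.
    for vm in (1.66, 4.0, 12.3):
        print(vm, round(protein_fraction(vm), 3), round(solvent_percent(vm), 1))
```

Evaluating it at Vm = 1.66 and 4.0 gives protein fractions close to the 75% and 30% quoted for the range found by Matthews.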
Alternatively:
Vp = Np * AV / V

where
    Np = number of protein atoms in unit cell (including hydrogens)
    AV = average atomic volume in A**3 = 10 approximately
         (There are about the same number of hydrogens as C, N, O etc.)

If Vp equals the fraction of protein volume in the asymmetric unit:

    Density = Dp*Vp + Ds*(1-Vp)
            = 1.35*Vp + 1.0*(1-Vp)
            = 0.35*Vp + 1.0

    Ds = density of solvent = 1.0 for H2O

therefore

    Vp = (density - 1.0) / 0.35
If you know the density you can work backwards and find the number of molecules in the asymmetric unit exactly.
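Working backwards from a measured density can be sketched as follows, combining Vp = (density - 1.0)/0.35 with Vp = 1.23 / Vm and Vm = V / (M*Z). This is an illustrative Python sketch; the function names and the density value of 1.172 g/cc are hypothetical, chosen only to show the round trip for the example cell.

```python
def protein_fraction_from_density(density, dp=1.35, ds=1.0):
    """Vp = (density - Ds) / (Dp - Ds); densities in g/cc."""
    return (density - ds) / (dp - ds)


def molecules_per_asu(density, cell_volume, molweight, n_asym_units):
    """Number of molecules per asymmetric unit implied by a crystal density."""
    vp = protein_fraction_from_density(density)
    vm = 1.23 / vp  # Matthews coefficient implied by the protein fraction
    return cell_volume / (molweight * n_asym_units * vm)


if __name__ == "__main__":
    # Hypothetical measured density for the example cell (66085.78 A**3,
    # 4 asymmetric units, 6600 Da).
    print(round(molecules_per_asu(1.172, 66085.78, 6600.0, 4), 2))
```

In practice the result would be rounded to the nearest physically sensible value (for example 1/2, 1, 2), since both the density and the constants are approximate.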
matthews_coef << eof
CELL 73.58 38.73 23.19
symm 19
molweight 6600.0
nmol 1
eof

matthews_coef << eof
CELL 73.58 38.73 23.19
SYMM 19
MOLW 6600.0
AUTO
eof
Originator: Misha Isupov
Additions by: Charles Ballard ccb@dl.ac.uk, Alun Ashton a.w.ashton@ccp4.ac.uk, Eleanor Dodson ccp4@ysbl.york.ac.uk