Task 6 (MSUD)

From Bioinformatikpedia
Revision as of 16:17, 15 June 2013 by Schillerl (talk | contribs) (moved Task 6 (MUSD) to Task 6 (MSUD): typo in title)

Results

Lab journal

H-Ras

BCKDHA

Freecontact

From the freecontact output, only residue pairs separated by at least 5 residues in the sequence were extracted. Pairs that are close in sequence are naturally coupled, but they are not informative for predicting the 3D structure, which depends on contacts between residues that are far apart in the 1D sequence. The CN (corrected norm contact) scores of the extracted pairs range from -0.87 to 5.28. The following diagram shows the distribution of the scores:

MSUD BCKDHA freecontact CN score distribution.png
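
The extraction step described above can be sketched in a few lines of Python. This is only an illustration, not the script used for this task: the `Pair` type and the function names are hypothetical, "separated by at least 5 residues" is assumed to mean |i - j| >= 5, and all example pairs except 247/291 are made up.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    i: int      # sequence position of the first residue
    j: int      # sequence position of the second residue
    cn: float   # CN (corrected norm) score of the pair


MIN_SEPARATION = 5  # assumption: "at least 5 residues" means |i - j| >= 5


def filter_by_separation(pairs, min_sep=MIN_SEPARATION):
    """Keep only pairs whose residues are at least min_sep positions apart."""
    return [p for p in pairs if abs(p.j - p.i) >= min_sep]


def high_scoring(pairs, threshold=1.0):
    """Keep only pairs with a CN score above the threshold."""
    return [p for p in pairs if p.cn > threshold]


# 247/291 is taken from the results; the other pairs are invented examples.
pairs = [
    Pair(247, 291, 5.28),
    Pair(100, 102, 2.10),  # too close in sequence, will be dropped
    Pair(10, 40, -0.87),
    Pair(20, 25, 0.40),
]

kept = filter_by_separation(pairs)
print(kept)
print(high_scoring(kept))
```

The threshold of 1.0 in `high_scoring` corresponds to the cutoff chosen from the score distribution.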

Most scores lie between -1 and 1. Pairs with a score greater than 1 were therefore considered high scoring and used for further analysis. A high-scoring pair was regarded as a true positive (TP) if the distance between any pair of atoms of the two residues in the reference structure is below 5 Å. There are 94 TPs among the 194 high-scoring pairs (48 %). The correlation between CN score and TP status is 0.31, i.e. a pair with a higher CN score is more likely to be a TP. This table shows the ten highest scoring pairs:

Residue 1  aa 1  Residue 2  aa 2  MI score  CN score  TP/FP
247        H     291        Y     0.69      5.28      TP
236        F     264        C     0.45      4.02      TP
266        N     327        E     0.31      3.67      TP
239        G     270        A     0.35      3.48      TP
296        I     312        E     0.44      3.37      FP
316        R     324        F     0.60      3.34      TP
144        S     235        Y     0.48      3.17      TP
300        G     330        T     0.27      3.13      TP
261        I     317        A     0.25      3.06      TP
301        N     333        I     0.87      2.77      FP
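
The TP criterion and the score correlation can be sketched as follows. This is a minimal illustration, not the script used for the analysis: the atom coordinates are made up, the function names are hypothetical, and a Pearson correlation against a 0/1 TP label is an assumption (it is not stated which correlation coefficient was used). The CN scores and TP/FP labels are the ten rows of the table, so the resulting value is not expected to match the 0.31 obtained for all 194 pairs.

```python
import math


def min_atom_distance(atoms_a, atoms_b):
    """Smallest distance between any atom of residue A and any atom of residue B."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def is_true_positive(atoms_a, atoms_b, cutoff=5.0):
    """A pair counts as TP if any two atoms are closer than the cutoff (in Å)."""
    return min_atom_distance(atoms_a, atoms_b) < cutoff


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up coordinates (in Å) for three residues with two atoms each.
res_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
res_b = [(5.0, 0.0, 0.0), (9.0, 0.0, 0.0)]    # closest atoms are 3.5 Å apart
res_c = [(20.0, 0.0, 0.0), (21.0, 0.0, 0.0)]  # far away from res_a

print(is_true_positive(res_a, res_b))  # True
print(is_true_positive(res_a, res_c))  # False

# CN scores and TP (1) / FP (0) labels of the ten rows in the table.
cn_scores = [5.28, 4.02, 3.67, 3.48, 3.37, 3.34, 3.17, 3.13, 3.06, 2.77]
tp_labels = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]

precision = sum(tp_labels) / len(tp_labels)
r = pearson(cn_scores, tp_labels)
print(f"precision in top ten: {precision:.2f}, correlation: {r:.2f}")
```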


In a second step, the 300 highest scoring couplings (300 = alignment length) were taken. For each residue, the scores of all couplings it is involved in were summed and normalized by the average score of these 300 couplings. The residues with the highest values (clearly separated from the rest) are Thr 338, Phe 130 and Tyr 158. Interestingly, according to UniProt, Tyr 158 lies in the thiamine pyrophosphate binding region, and several modified residues (phosphoserines) are located near position 338.
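
The per-residue summation can be sketched like this. The function name is hypothetical, the input pairs are invented toy values, and `top_n` is set to 3 instead of 300 only to keep the example small; it is a sketch of the described procedure, not the original script.

```python
from collections import defaultdict


def residue_hotspot_scores(pairs, top_n):
    """Sum coupling scores per residue over the top_n pairs and divide
    by the mean score of those pairs.

    pairs: iterable of (residue_i, residue_j, score) tuples.
    """
    top = sorted(pairs, key=lambda p: p[2], reverse=True)[:top_n]
    mean_score = sum(score for _, _, score in top) / len(top)

    totals = defaultdict(float)
    for i, j, score in top:
        # Each coupling contributes to both residues it connects.
        totals[i] += score
        totals[j] += score

    return {res: total / mean_score for res, total in totals.items()}


# Toy data: residue 1 takes part in two of the three best couplings.
toy_pairs = [(1, 10, 4.0), (1, 20, 2.0), (5, 30, 3.0), (2, 40, 0.5)]
scores = residue_hotspot_scores(toy_pairs, top_n=3)

for res, value in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(res, round(value, 2))
```

In this toy example residue 1 gets the highest value (2.0), because it is involved in two strong couplings; residues that only appear in couplings below the cut (here 2 and 40) get no score at all.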

EVcouplings

The following table shows the ten residue pairs with highest DI (direct information) scores:

Residue 1  aa 1  Residue 2  aa 2  MI score  DI score
209        L     243        E     0.81      0.22
266        N     327        E     0.64      0.17
247        H     291        Y     0.42      0.13
192        P     200        R     0.61      0.11
296        I     312        E     0.22      0.10
242        S     291        Y     0.34      0.09
190        Q     196        G     0.72      0.09
236        F     264        C     0.39      0.09
250        F     288        G     0.34      0.08
210        A     218        G     0.11      0.08


The pairs that are also in the freecontact top ten are 266/327, 247/291, 296/312 and 236/264. The evolutionary constraint (EC) hotspots identified by EVcouplings with the highest EC strength (clearly separated from the rest) are residues Glu 243 and Leu 209. According to UniProt, there is a potassium binding region near residue 209, and according to PDB, residue 243 is a thiamine pyrophosphate binding residue. Note that the residues identified as important by freecontact are not included in the EVcouplings calculation (probably because they are not part of the alignment it generates).
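
The overlap between the two top-ten lists can be checked with a simple set intersection. The residue pairs below are copied from the two tables on this page; the variable names are only illustrative.

```python
# Top ten pairs by CN score (freecontact table).
freecontact_top10 = {
    (247, 291), (236, 264), (266, 327), (239, 270), (296, 312),
    (316, 324), (144, 235), (300, 330), (261, 317), (301, 333),
}

# Top ten pairs by DI score (EVcouplings table).
evcouplings_top10 = {
    (209, 243), (266, 327), (247, 291), (192, 200), (296, 312),
    (242, 291), (190, 196), (236, 264), (250, 288), (210, 218),
}

shared = freecontact_top10 & evcouplings_top10
print(sorted(shared))
# [(236, 264), (247, 291), (266, 327), (296, 312)]
```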

EVfold

For each number of constraints, EVfold computes 5 different structure models. The RMSD values of these models compared to the reference structure 1U5B are summarized in the following boxplot (RMSD values taken from the EVfold output):

MSUD BCKDHA boxplot evfold RMSD.png

With 195 constraints (65 % of the alignment length), the models agree better with the known structure than with only 120 constraints (40 %). Using 300 constraints (100 %) does not improve the prediction compared to 65 %; it even seems to perform worse, although the RMSDs of the models calculated with 300 constraints span a large range.

In general, the RMSD values are very high, indicating that the models differ strongly from the real structure.

The contact maps show the predicted contacts compared to the real contacts in 1U5B (the calculation of EVfold doesn't cover the whole sequence):

MSUD BCKDHA fold ContactMap 120.png MSUD BCKDHA fold ContactMap 195.png MSUD BCKDHA fold ContactMap 300.png

For the lowest number of constraints, most predicted contacts are true positives, but some regions with contacts in the known structure are only covered when more constraints are used. The second and third plots show, however, that using more constraints also produces more false positives (predicted contacts where there is no contact in the real structure). Using 60-70 % of the alignment length as the number of constraints may therefore be a good compromise between recall and precision.
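
The trade-off between precision and recall can be illustrated with a small Python sketch. The contact sets are invented toy data, not the BCKDHA contact maps, and the function name is hypothetical; the example only shows how the two measures change when more of a ranked list of predicted contacts is used.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for a set of predicted contacts."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Toy data: contacts in the "real" structure and a ranked prediction list.
actual_contacts = {(1, 10), (2, 20), (3, 30), (4, 40)}
ranked_predictions = [(1, 10), (2, 20), (5, 50), (3, 30), (6, 60), (7, 70)]

results = {}
for n_constraints in (2, 4, 6):
    predicted = set(ranked_predictions[:n_constraints])
    results[n_constraints] = precision_recall(predicted, actual_contacts)
    p, r = results[n_constraints]
    print(f"{n_constraints} constraints: precision={p:.2f} recall={r:.2f}")
```

In this toy case, going from 2 to 4 constraints raises recall from 0.50 to 0.75 at the cost of precision (1.00 to 0.75), while going from 4 to 6 only adds false positives.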

Discussion