Difference between revisions of "Gaucher Disease: Task 06 - Protein structure prediction from evolutionary sequence variation"

From Bioinformatikpedia

Revision as of 23:13, 27 August 2013

Lab journal

Calculate and analyze correlated mutations

Not all predicted contacts are needed to predict structure from sequence. Residues that lie close to each other in the primary structure are automatically in contact because they are direct neighbours in the sequence. Such contacts carry no information and only add noise to the results. We are interested in the contacts that arise from the secondary and tertiary structure of the protein. This information comes from residue pairs that are further apart in the sequence but are predicted to be in contact in space.
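The filtering step described above can be sketched as follows. This is a minimal illustration, not the script used for this journal: the function name, the tuple layout `(i, j, score)` and the example scores are assumptions, and the threshold of 5 residues is taken from the HRas section below.

```python
def filter_by_sequence_distance(pairs, min_separation=5):
    """Keep only pairs whose residues are at least `min_separation`
    positions apart in the sequence.

    `pairs` is an iterable of (i, j, score) tuples (assumed layout).
    """
    return [(i, j, score) for i, j, score in pairs
            if abs(i - j) >= min_separation]


# Hypothetical example scores, not taken from the analysis
pairs = [(10, 11, 2.3), (10, 14, 1.1), (10, 15, 0.9), (12, 80, 1.7)]
filtered = filter_by_sequence_distance(pairs)
print(filtered)
```

Pairs of direct or near neighbours (here 10/11 and 10/14) are dropped, while pairs separated by 5 or more positions are kept.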


HRas

The CN scores between all residues range from -0.65 to 6.00 (<xr id="rhas_dist"/>). When only contacts between residues with a sequence distance of at least 5 residues are considered, the maximum CN score drops to 3.40. The score distributions of both sets are:

<figtable id="rhas_dist">

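{| class="wikitable"
! style="background:#adceff;" | residues
! style="background:#adceff;" | minimum
! style="background:#adceff;" | lower quartile
! style="background:#adceff;" | median
! style="background:#adceff;" | upper quartile
! style="background:#adceff;" | maximum
|-
|all || -0.65 || -0.23 || -0.11 || 0.04 || 6.00
|-
|filtered || -0.65 || -0.24 || -0.13 || 0.01 || 3.40
|}
<center><small>'''<caption>''' Range and score distribution of all predicted HRas contacts and of HRas contacts between residues with a sequence distance of >5 residues (filtered).</caption></small></center>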

</figtable>
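A summary like the one in the table (minimum, quartiles, maximum) can be computed with the Python standard library. This is a sketch only: the example scores are made up, and the quartile method actually used for the table is not documented, so the `"inclusive"` method chosen here is an assumption and may give slightly different quartiles.

```python
import statistics


def five_number_summary(scores):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # statistics.quantiles sorts the data internally; n=4 yields the
    # three quartile cut points.
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, median, q3, max(scores)


# Hypothetical CN scores, not taken from the analysis
scores = [-0.5, -0.2, -0.1, 0.0, 0.3, 1.2, 3.4]
summary = five_number_summary(scores)
print(summary)
```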

Residue pairs with CN>1 are defined as high scoring pairs and are predicted to be in contact. Only about 5% of all residue pairs score high enough to count as contacts. In the filtered set this applies to less than 1%.


The high scoring pairs were compared to the HRas contacts documented in the PDB. The 65 predicted contacts have a TP rate of 84.6% (55 of 65) and could be classified into:

  • TP: 55
  • FP: 10
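The classification and the reported rate can be reproduced with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, and the reference contacts are assumed to have been extracted from the PDB file beforehand as a collection of residue pairs.

```python
def classify_predictions(predicted, reference_contacts):
    """Split predicted pairs into true positives and false positives.

    Pairs are treated as unordered, so (i, j) and (j, i) are the same contact.
    """
    reference = {tuple(sorted(pair)) for pair in reference_contacts}
    tp = [p for p in predicted if tuple(sorted(p)) in reference]
    fp = [p for p in predicted if tuple(sorted(p)) not in reference]
    return tp, fp


def tp_rate(n_tp, n_fp):
    """Fraction of predicted contacts that are real contacts."""
    return n_tp / (n_tp + n_fp)


# Counts reported above for HRas
print(f"{tp_rate(55, 10):.1%}")  # 84.6%

# Hypothetical pairs to illustrate the classification itself
tp, fp = classify_predictions([(10, 20), (30, 5), (7, 50)],
                              [(20, 10), (5, 30)])
```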

Although more than half of the FP contacts have a comparatively low score, no clear correlation between the TP/FP classification and the CN score can be seen. The Pearson correlation is only 0.15, which is not significant.
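One way to obtain such a correlation is to code each predicted contact as 1 (TP) or 0 (FP) and correlate this label with the CN score. Whether the value above was computed exactly this way is not documented, so the coding is an assumption. The example data are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: 1 = TP, 0 = FP, with the CN score of each contact
labels = [1, 1, 0, 1, 0, 1]
cn_scores = [3.1, 1.2, 1.1, 2.4, 1.9, 1.05]
r = pearson(labels, cn_scores)
print(round(r, 2))
```

A value near 0 means the score carries little information about whether a predicted contact is correct. A value near 1 would mean that high scores reliably indicate true contacts.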

No clear pattern that would allow domains to be identified could be observed on the contact map.

Contact map of the contacts predicted by freecontact and the real contacts documented in 121p.pdb (blue). In contrast to the PDB contacts, the calculated TP (dark blue) and FP (red) only include residue pairs with a sequence distance of at least 5 residues.

Hot Spots

The 10 residues with the highest top scores were defined as hot spots. A mutation at one of these residues is expected to have a great influence on the 3D structure of the protein. For some of these residues SNPs are known. <xr id="hot"/> shows which mutations occur and whether they are disease causing.
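The exact formula behind the top score is not given here. A minimal sketch of one plausible scheme, assuming the score of a residue is the sum of the CN scores of all top-L pairs it takes part in, could look like this. The function name, the aggregation rule and the example values are all assumptions.

```python
from collections import defaultdict


def hot_spot_scores(pairs, n_top_pairs):
    """Rank residues by the summed score of the best-scoring pairs.

    `pairs` holds (i, j, score) tuples; only the `n_top_pairs` highest
    scoring pairs contribute (assumed aggregation, see lead-in).
    """
    top = sorted(pairs, key=lambda p: p[2], reverse=True)[:n_top_pairs]
    totals = defaultdict(float)
    for i, j, score in top:
        totals[i] += score
        totals[j] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical pairs, not taken from the analysis
pairs = [(1, 20, 3.0), (1, 30, 2.5), (5, 40, 2.0), (7, 50, 0.5)]
ranked = hot_spot_scores(pairs, n_top_pairs=3)
print(ranked[:2])
```

Residue 1 appears in two of the three best pairs and therefore ranks first. The pair (7, 50) falls outside the top pairs and does not contribute.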

The 50 residues with the best CN scores were compared to the 50 hot spots calculated by EVcouplings. The two sets overlap by only 44%. The overlapping hot spots were also ranked very differently by the two programs.
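Such a comparison of two ranked hot spot lists can be sketched as below. The function name and the example lists are hypothetical. The overlap is given as the fraction of shared residues, and the rank shift shows how differently a shared residue is placed by the two programs.

```python
def compare_rankings(ranked_a, ranked_b):
    """Return the overlap fraction and the rank shift of shared residues.

    Both inputs are lists of residue positions ordered from best to worst.
    """
    rank_a = {res: rank for rank, res in enumerate(ranked_a, start=1)}
    rank_b = {res: rank for rank, res in enumerate(ranked_b, start=1)}
    shared = sorted(set(rank_a) & set(rank_b))
    overlap = len(shared) / max(len(ranked_a), len(ranked_b))
    rank_shift = {res: rank_b[res] - rank_a[res] for res in shared}
    return overlap, rank_shift


# Hypothetical rankings, not the real program output
overlap, rank_shift = compare_rankings([12, 11, 41, 43], [43, 10, 12, 99])
print(overlap, rank_shift)
```

With two lists of 50 residues each, an overlap of 44% corresponds to 22 shared residues.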


<figtable id="hot">

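{| class="wikitable"
! style="background:#adceff;" | Position
! style="background:#adceff;" | Residue
! style="background:#adceff;" | Top Score
! style="background:#adceff;" | Mutation
|-
|82 || F || 9.91 ||
|-
|81 || V || 7.361 ||
|-
|141 || Y || 6.67 ||
|-
|143 || E || 6.52 ||
|-
|115 || G || 6.51 ||
|-
|40 || Y || 5.50 ||
|-
|84 || I || 5.40 ||
|-
|145 || S || 5.25 ||
|-
|116 || N || 5.01 ||
|-
|144 || T || 4.90 ||
|}
<center><small>'''<caption>''' Top scores of the 10 hot spots calculated from the 164 (L) best scoring pairs. Information about mutations is taken from dbSNP and distinguishes between disease causing (dc) and non disease causing (ndc).</caption></small></center>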

</figtable>

Glucocerebrosidase

<figtable id="gluco_dist">

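{| class="wikitable"
! style="background:#adceff;" | residues
! style="background:#adceff;" | minimum
! style="background:#adceff;" | lower quartile
! style="background:#adceff;" | median
! style="background:#adceff;" | upper quartile
! style="background:#adceff;" | maximum
|-
|all || -0.66 || -0.15 || -0.04 || 0.09 || 4.36
|-
|filtered || -0.66 || -0.15 || -0.04 || 0.08 || 4.00
|}
<center><small>'''<caption>''' Range and score distribution of all predicted Glucocerebrosidase contacts and of Glucocerebrosidase contacts between residues with a sequence distance of >5 residues (filtered).</caption></small></center>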

</figtable>


Pearson correlation: 0.20 (dif 29). The 247 predicted contacts could be classified into:

  • TP: 97
  • FP: 150

Contact map of the contacts predicted by freecontact and the real contacts documented in 1OGS.pdb (light and dark blue). In contrast to the PDB contacts, the calculated TP (dark blue) and FP (red) only include residue pairs with a sequence distance of at least 5 residues.



<figtable id="hot2">

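{| class="wikitable"
! style="background:#adceff;" | Position
! style="background:#adceff;" | Residue
! style="background:#adceff;" | Top Score
! style="background:#adceff;" | Mutation
|-
|64 || S || 12.492447 ||
|-
|89 || E || 11.975236 ||
|-
|54 || V || 11.643162 ||
|-
|53 || V || 11.468116 ||
|-
|43 || C || 11.451568 ||
|-
|76 || F || 11.379880 ||
|-
|92 || M || 10.660677 ||
|-
|65 || F || 10.616400 ||
|-
|55 || C || 10.553043 ||
|-
|61 || Y || 10.328963 ||
|}
<center><small>'''<caption>''' Top scores of the 10 hot spots calculated from the 533 (L) best scoring pairs. Information about mutations is taken from dbSNP and distinguishes between disease causing (dc) and non disease causing (ndc).</caption></small></center>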

</figtable>