Difference between revisions of "Task 6: Protein structure prediction from evolutionary sequence variation"

From Bioinformatikpedia

Revision as of 19:23, 22 August 2013

Results

lab journal

Contact Prediction

<figure id="ras_score_dist">

Cn score ras.png

</figure>


<figtable id="hfe_score_dist" >

Cn score imm.png
Cn score mhc2.png

</figtable>

domain    | length | reference | high-scoring (HS) pairs | TP | FP | TP rate
Ras       | 160    | 5P21      | 65                      | 53 | 12 | 0.82
MHC I     | 174    | 1A6Z      | 69                      | 29 | 40 | 0.42
Ig C1-set | 76     | 1A6Z      | 15                      | 9  | 6  | 0.6
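The TP rate column is consistent with TP / (TP + FP), i.e. the fraction of high-scoring pairs that are true contacts. A minimal sketch that recomputes the column from the counts in the table; the function name `tp_rate` is only illustrative:

```python
# (domain, high-scoring pairs, TP, FP), copied from the table above
rows = [
    ("Ras", 65, 53, 12),
    ("MHC I", 69, 29, 40),
    ("Ig C1-set", 15, 9, 6),
]


def tp_rate(tp, fp):
    """Fraction of predicted pairs that are true contacts: TP / (TP + FP)."""
    return tp / (tp + fp)


rates = {}
for name, pairs, tp, fp in rows:
    # Sanity check: every high-scoring pair is either a TP or an FP.
    assert tp + fp == pairs
    rates[name] = round(tp_rate(tp, fp), 2)

for name, rate in rates.items():
    print(f"{name}: {rate}")
```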


<figure id="cm_1a6z">

ContactMap 1a6z new.png

</figure> <figure id="cm_hras">

ContactMap 5p21.png

</figure>

  • Why are the scores of residues close in sequence amongst the highest? Why are the pairs distant in sequence (n>5) more interesting for structure prediction?

Residues that are close in sequence are necessarily close in structure as well, because they are connected through the backbone. Consequently, they are evolutionarily coupled and show covariation in the multiple sequence alignment, which explains their high scores. The pairs that are at least five residues apart in sequence are more interesting for structure prediction because they contain more information about the overall topology of the protein, i.e. they reduce the space of possible protein conformations more than pairs that are close in sequence.
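As a rough illustration of this filtering step, the sketch below keeps only pairs whose sequence separation reaches a chosen minimum. The tuple layout `(i, j, score)`, the function name and the example scores are assumptions made for illustration, not the output format of any particular tool; the threshold is a parameter, so a strict n>5 cut-off would use `min_separation=6`.

```python
def long_range_pairs(pairs, min_separation=5):
    """Keep (i, j, score) tuples with sequence separation |i - j| >= min_separation."""
    return [(i, j, s) for i, j, s in pairs if abs(i - j) >= min_separation]


# Hypothetical scores, for illustration only.
example = [(10, 11, 0.9), (10, 14, 0.8), (10, 15, 0.7), (3, 120, 0.5)]
filtered = long_range_pairs(example)
print(filtered)
```

In this toy input the two highest scores belong to near-neighbour pairs and are dropped, leaving only the pairs that constrain the overall fold.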

  • Look at the values, range and distribution of scores.
  • How many of the high-scoring pairs are true or false positives? Does this correlate with the value of the score? Visualize the predicted contacts together with the crystal structure contacts in a contact map plot.
  • Can you determine evolutionary hot spots, i.e. functionally important residues? Compare to conserved sites in the MSA. Compare with your results from task 7 (when you are working on task 7, i.e. this is a task for the future).
  • Here, the DI score is given. Compare the top 50 DI and CN (from freecontact) scores. How large is the overlap (>80%)?
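For the last question, the overlap between the top-ranked DI and CN pairs can be computed as a set intersection. The sketch below assumes both score sets have already been parsed into dictionaries that map a residue pair `(i, j)` to its score, with each pair listed once; the parsing itself, the function names and the toy values are hypothetical.

```python
def top_pairs(scores, n=50):
    """Return the n highest-scoring residue pairs as a set of (i, j) with i < j."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {tuple(sorted(pair)) for pair, _ in ranked[:n]}


def overlap_fraction(di_scores, cn_scores, n=50):
    """Fraction of the top-n DI pairs that also appear among the top-n CN pairs."""
    return len(top_pairs(di_scores, n) & top_pairs(cn_scores, n)) / n


# Toy data with n=3, for illustration only.
di = {(1, 10): 0.9, (2, 20): 0.8, (3, 30): 0.7, (4, 40): 0.1}
cn = {(1, 10): 2.0, (2, 20): 1.5, (4, 40): 1.2, (3, 30): 0.3}
frac = overlap_fraction(di, cn, n=3)
print(f"overlap: {frac:.2f}")
```

With real data, calling `overlap_fraction(di, cn)` with the default `n=50` gives the value to compare against the 80% mentioned in the question.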

MHC EVFOLD constraints: 70 (40%), 113 (65%), 174 (100%)

imm EVFOLD


RAS EVFOLD: 189 / 160; constraints for 189: 76 (40%), 123 (65%), 189 (100%)
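The constraint counts noted above match 40%, 65% and 100% of the sequence length, rounded to the nearest integer. A small sketch that reproduces them, assuming this reading of the notes is correct (the function name is illustrative):

```python
def constraint_counts(length, fractions=(0.40, 0.65, 1.00)):
    """Number of constraints for each fraction of the sequence length."""
    return [round(length * f) for f in fractions]


mhc_counts = constraint_counts(174)  # MHC, length 174
ras_counts = constraint_counts(189)  # RAS, length 189
print(mhc_counts, ras_counts)
```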

Discussion