Canavan Disease: Task 06 - Protein Structure Prediction
LabJournal
Dataset
To obtain the HRAS multiple sequence alignment (MSA), the instructions were followed: the full MSA provided by Pfam (PF00071) was downloaded and used for all further calculations and statistics. For ASPA/ACY2, a search in Pfam showed that the corresponding MSA satisfies the two criteria needed to gain meaningful insights from freecontact, evcouplings and evfold: the MSA contains more than 1000 sequences, and it covers large parts of the reference sequence. The MSA of the protein family containing ASPA (PF04952) includes 2822 sequences, and the region of ASPA covered by the MSA spans positions 10 to 301 of the 313-residue protein. Hence, the Pfam MSA is regarded as a viable input for the following calculations.
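The two suitability criteria can be checked programmatically. The following is a minimal sketch, assuming the Pfam alignment was downloaded in Stockholm format; the helper names and the coverage cut-off `min_coverage=0.8` are hypothetical, since "large parts of the reference sequence" is not quantified above.

```python
def stockholm_sequence_count(lines):
    """Count distinct sequence names in a Stockholm-format alignment.

    Stockholm files may split the alignment into several blocks that repeat
    the sequence names, so names are collected in a set.
    """
    names = set()
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, markup lines ("# STOCKHOLM 1.0", "#=GF", "#=GS", ...)
        # and the "//" terminator.
        if not line or line.startswith("#") or line == "//":
            continue
        names.add(line.split()[0])
    return len(names)


def reference_coverage(aln_start, aln_end, ref_length):
    """Fraction of the reference sequence inside the aligned region (1-based, inclusive)."""
    return (aln_end - aln_start + 1) / ref_length


def msa_is_suitable(n_sequences, coverage, min_sequences=1000, min_coverage=0.8):
    """Check both criteria: more than `min_sequences` sequences and enough coverage.

    min_coverage = 0.8 is a hypothetical cut-off, not a value from the task.
    """
    return n_sequences > min_sequences and coverage >= min_coverage


if __name__ == "__main__":
    # Numbers reported above for ASPA in PF04952
    cov = reference_coverage(10, 301, 313)         # 292 of 313 residues
    print(f"coverage = {cov:.3f}")                 # coverage = 0.933
    print("suitable:", msa_is_suitable(2822, cov))  # suitable: True
```

With 2822 sequences and about 93% of ASPA covered, both criteria hold even under a stricter coverage cut-off.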
HRAS
The program freecontact searches a multiple sequence alignment for conserved regions and correlated mutations in order to predict pairs of residues that are in contact in the protein. Residues that are close to each other in sequence are expected to be close in three-dimensional space as well, since their contacts often define the secondary structure elements and the local conformation of the protein. Therefore, freecontact ranks residue pairs that are close in sequence with high CN-scores. More meaningful for the overall fold of the protein, however, are stabilizing contacts between residues that are far apart in sequence. This is why the predicted contacts were filtered to exclude residue pairs that are separated by five or fewer residues in sequence. The distribution of the CN-scores (<xr id="hras_cn_distribution">Figure </xr>) reflects this as well.
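The sequence-separation filter described above can be sketched as follows. This assumes each prediction line holds whitespace-separated columns `i aa_i j aa_j ... score` with the CN-score in the last column; that layout, as well as the names `Contact`, `parse_contacts` and `long_range`, are assumptions for illustration and should be checked against the actual freecontact output.

```python
from typing import List, NamedTuple


class Contact(NamedTuple):
    i: int          # position of residue #1
    aa_i: str       # amino acid of residue #1
    j: int          # position of residue #2
    aa_j: str       # amino acid of residue #2
    cn_score: float


def parse_contacts(lines) -> List[Contact]:
    """Parse whitespace-separated prediction lines.

    Assumed layout: "i aa_i j aa_j ... score" with the CN-score last.
    """
    contacts = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        contacts.append(Contact(int(fields[0]), fields[1],
                                int(fields[2]), fields[3],
                                float(fields[-1])))
    return contacts


def long_range(contacts: List[Contact], max_local_separation: int = 5) -> List[Contact]:
    """Drop pairs separated by `max_local_separation` or fewer residues in sequence."""
    return [c for c in contacts if abs(c.j - c.i) > max_local_separation]
```

For example, the pair 10 G / 16 K (separation 6) survives the filter, while any pair with a separation of five or fewer residues is discarded.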
<figure id="hras_cn_distribution">
</figure>
<figure id="hras_freecontact_contactmap">
</figure>
The first thing to note is that only a tiny fraction of all pairs (514 out of 12561 possible pairs) has a CN-score > 1, which is considered high-scoring. If the set is reduced to residue pairs with a sequence distance greater than five, this subset of high-scoring pairs immediately shrinks to 65 pairs. Secondly, the maximal CN-score drops from 6.01 to 3.40. Reducing the set, however, has little impact on the precision. The high-scoring contacts predicted in the original set contain 439 true positives and 75 false positives (precision of 0.854), while the reduced set contains 55 true positives out of 65 predictions overall (precision of 0.846). The predicted contacts are visualized together with the actual contacts calculated from the crystal structure in <xr id="hras_freecontact_contactmap"> Figure </xr>. An overview of the top 20 predictions for HRAS is given in <xr id="top_20_hras"> Table </xr>.
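The TP/FP labelling and the precision values above can be reproduced in principle with the sketch below. It is a minimal illustration, not the procedure used here: it assumes one representative atom per residue (e.g. C-beta) and a hypothetical 8 Angstrom distance cut-off for a "true" contact, since the contact definition applied to the crystal structure is not stated. The helper names are hypothetical.

```python
import math


def structure_contacts(coords, cutoff=8.0):
    """Residue pairs (i, j), i < j, whose atoms lie closer than `cutoff` Angstrom.

    coords maps residue number -> (x, y, z), e.g. one C-beta atom per residue.
    Both the atom choice and the 8 Angstrom cut-off are assumptions.
    """
    residues = sorted(coords)
    pairs = set()
    for k, i in enumerate(residues):
        for j in residues[k + 1:]:
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.add((i, j))
    return pairs


def high_scoring(predictions, threshold=1.0):
    """predictions: iterable of (i, j, cn_score); keep pairs with CN-score > threshold."""
    return [(i, j) for i, j, score in predictions if score > threshold]


def precision(predicted_pairs, true_pairs):
    """Fraction of predicted pairs that are real contacts: TP / (TP + FP)."""
    if not predicted_pairs:
        return 0.0
    tp = sum(1 for i, j in predicted_pairs if (min(i, j), max(i, j)) in true_pairs)
    return tp / len(predicted_pairs)


# Counts reported above: all high-scoring pairs vs. pairs with separation > 5
print(round(439 / (439 + 75), 3))  # 0.854
print(round(55 / 65, 3))           # 0.846
```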
<figtable id="top_20_hras">
Top 20 predicted residue contacts for HRAS by freecontact
Residue #1 position | Residue #1 amino acid | Residue #2 position | Residue #2 amino acid | CN-score | TP/FP |
---|---|---|---|---|---|
11 | A | 92 | D | 3.40454 | TP |
81 | V | 116 | N | 2.99937 | TP |
87 | T | 129 | Q | 2.68523 | FP |
82 | F | 141 | Y | 2.52755 | TP |
84 | I | 115 | G | 2.52502 | TP |
19 | L | 81 | V | 2.50464 | TP |
82 | F | 115 | G | 2.41709 | TP |
10 | G | 16 | K | 2.26384 | TP |
130 | A | 141 | Y | 2.24938 | TP |
123 | R | 143 | E | 2.21315 | TP |
114 | V | 155 | A | 2.09971 | TP |
17 | S | 57 | D | 1.98068 | TP |
80 | C | 93 | I | 1.96126 | TP |
142 | I | 158 | T | 1.90419 | TP |
78 | F | 100 | I | 1.82996 | TP |
25 | Q | 40 | Y | 1.66144 | TP |
116 | N | 146 | A | 1.63396 | TP |
8 | V | 20 | T | 1.60439 | TP |
72 | M | 99 | Q | 1.59088 | TP |
118 | C | 150 | Q | 1.51682 | TP |
</figtable>
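The table can be condensed into the precision among the k top-ranked pairs. The sketch below simply re-enters the table rows; `TOP20_HRAS` and `precision_at_k` are hypothetical names introduced for illustration. Only one of the 20 pairs (87 T / 129 Q) is a false positive, so the precision of the top 20 is 19/20.

```python
# Rows of the table above: (position 1, aa 1, position 2, aa 2, CN-score, TP/FP)
TOP20_HRAS = [
    (11, "A", 92, "D", 3.40454, "TP"),
    (81, "V", 116, "N", 2.99937, "TP"),
    (87, "T", 129, "Q", 2.68523, "FP"),
    (82, "F", 141, "Y", 2.52755, "TP"),
    (84, "I", 115, "G", 2.52502, "TP"),
    (19, "L", 81, "V", 2.50464, "TP"),
    (82, "F", 115, "G", 2.41709, "TP"),
    (10, "G", 16, "K", 2.26384, "TP"),
    (130, "A", 141, "Y", 2.24938, "TP"),
    (123, "R", 143, "E", 2.21315, "TP"),
    (114, "V", 155, "A", 2.09971, "TP"),
    (17, "S", 57, "D", 1.98068, "TP"),
    (80, "C", 93, "I", 1.96126, "TP"),
    (142, "I", 158, "T", 1.90419, "TP"),
    (78, "F", 100, "I", 1.82996, "TP"),
    (25, "Q", 40, "Y", 1.66144, "TP"),
    (116, "N", 146, "A", 1.63396, "TP"),
    (8, "V", 20, "T", 1.60439, "TP"),
    (72, "M", 99, "Q", 1.59088, "TP"),
    (118, "C", 150, "Q", 1.51682, "TP"),
]


def precision_at_k(rows, k):
    """Precision among the k highest-scoring predictions."""
    ranked = sorted(rows, key=lambda row: row[4], reverse=True)[:k]
    return sum(row[5] == "TP" for row in ranked) / len(ranked)


for k in (5, 10, 20):
    print(k, precision_at_k(TOP20_HRAS, k))  # 5 0.8 / 10 0.9 / 20 0.95
```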
ASPA
Tasks
- Link to Task 01: Canavan Disease
- Link to Task 02: Alignments
- Link to Task 03: Sequence-based Predictions
- Link to Task 04: Structural Alignments
- Link to Task 05: Homology Modelling
- Link to Task 06: Protein Structure Prediction from Evolutionary Sequence Variation
- Link to Task 07: Researching SNPs
- Link to Task 08: Sequence-based Mutation Analysis
- Link to Task 09: Structure-based Mutation Analysis
- Link to Task 10: Normal Mode Analysis
- Link to Task 11: Molecular Dynamics Simulation