October 17, 2006

After calculating mutual information in the HCV genome, it was time to see what challenges the HIV genome would present. There were many! The HIV genome encodes multiple overlapping proteins in different reading frames, which makes it difficult to produce a true amino acid alignment that represents the proteins. The sequences for the various elements of HIV can be found on the Los Alamos HIV web site. Once the mutual information (MI) pairs are identified for the multiple sequence alignment, they are programmatically mapped to a single reference sequence with no inserts. For this example, 97BL006_AF193275 was used because it was first in the list. Each MI pair is then checked to see whether its two positions fall in different proteins. The following boundaries were used, based on the 97BL006_AF193275 sequence, which represents the sequence without multiple reading frames.
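The mapping step can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code used for this post: it assumes the reference row of the alignment is available as a gapped string with gaps written as `-` or `.`, and that gene boundaries are 1-based inclusive ranges on the ungapped reference. The function names and the toy boundary list are hypothetical. The real boundaries for 97BL006_AF193275 are not reproduced here.

```python
def column_to_reference_position(aligned_ref):
    """Map 0-based alignment columns to 1-based positions in the ungapped reference.

    Columns where the reference has a gap ('-' or '.') are inserts relative to
    the reference and are left out of the mapping.
    """
    mapping = {}
    position = 0
    for column, residue in enumerate(aligned_ref):
        if residue in "-.":
            continue
        position += 1
        mapping[column] = position
    return mapping


def gene_at(position, boundaries):
    """Return the gene whose (start, end, gene) range, 1-based inclusive, contains position."""
    for start, end, gene in boundaries:
        if start <= position <= end:
            return gene
    return None


def map_mi_pairs(pairs, aligned_ref, boundaries):
    """Map (column_a, column_b, mi) pairs onto the reference and tag their genes.

    Returns (pos_a, pos_b, gene_a, gene_b, different_genes, mi) tuples.
    Pairs that touch an insert column (no reference position) are dropped.
    """
    col_to_pos = column_to_reference_position(aligned_ref)
    mapped = []
    for col_a, col_b, mi in pairs:
        if col_a not in col_to_pos or col_b not in col_to_pos:
            continue
        pos_a, pos_b = col_to_pos[col_a], col_to_pos[col_b]
        gene_a, gene_b = gene_at(pos_a, boundaries), gene_at(pos_b, boundaries)
        mapped.append((pos_a, pos_b, gene_a, gene_b, gene_a != gene_b, mi))
    return mapped


# Toy example with hypothetical boundaries (not the real 97BL006_AF193275 table).
toy_boundaries = [(1, 2, "gag"), (3, 4, "pol")]
print(map_mi_pairs([(0, 3, 0.8), (1, 2, 0.5), (0, 1, 0.3)], "AC-DE", toy_boundaries))
# prints [(1, 3, 'gag', 'pol', True, 0.8), (1, 2, 'gag', 'gag', False, 0.3)]
```

In the toy example, the pair (1, 2) is dropped because alignment column 2 is a gap in the reference, so it has no reference position.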
An example of the genes found in a typical sequence with multiple reading frames
Using the yEd graph editor, I began connecting nodes and was puzzled by the patterns. It then occurred to me that, because of the multiple reading frames, the overlapping positions would be detected as co-evolving pairs: a single nucleotide change in an overlapping region can alter the residues of both genes at once. Seeing this, I felt very confident that the calculations performed over the last 30 hours were correct. The initial graph is shown, with a sample of the detected co-evolving pairs from different genes that share the same genome position but use different reading frames.
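The overlap effect can be checked with a small interval test on genome coordinates. This is a minimal sketch that assumes each gene is read as one contiguous stretch of the genome starting at a known nucleotide coordinate. Spliced genes such as tat and rev would need exon-aware coordinates. The function names and gene start values are hypothetical.

```python
def codon_span(gene_start_nt, aa_pos):
    """Return the 1-based, inclusive genome coordinates of the codon for residue aa_pos.

    Assumes the gene is read contiguously from gene_start_nt (no splicing).
    """
    first = gene_start_nt + 3 * (aa_pos - 1)
    return first, first + 2


def shares_nucleotides(gene_start_a, aa_a, gene_start_b, aa_b):
    """True if the codons for the two residues overlap on the genome."""
    a_first, a_last = codon_span(gene_start_a, aa_a)
    b_first, b_last = codon_span(gene_start_b, aa_b)
    return a_first <= b_last and b_first <= a_last


# Gene A starts at nt 100 and gene B at nt 101 (shifted by one frame).
print(shares_nucleotides(100, 1, 101, 1))  # True: nt 100-102 vs 101-103
print(shares_nucleotides(100, 1, 101, 2))  # False: nt 100-102 vs 104-106
```

A pair flagged this way shares nucleotides. Its apparent co-evolution may therefore come from the reading-frame overlap rather than from interaction between the proteins.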
To filter out this effect, the number of times each amino acid position occurred across all of the mutual information pairs was counted. If an amino acid position was found in more than one co-evolving pair, it was kept as a co-evolving pair for graphing. This reduced the data set to something a little more interesting. The remaining layout was done by hand with the yEd graph tool, and where possible, amino acids from the same genome position were left out. For a first attempt, it was easier to do this by hand than to write code to filter out the bad pairs. Future improvements will provide a range of overlaps, which will allow automatic filtering.

Additional improvements add, for each position, the amino acid that contributes the most information to the co-evolving pair. This same amino acid determines the color of the node: Hydrophobic=Brown, Positive=Red, Negative=Black, Polar=White and Proline=Yellow. Where a single sequence position had different physicochemical properties depending on its partner position, an additional node was added and the nodes were grouped. This is because each co-evolving pair may express information that differs from other co-evolving pairs sharing one sequence position. If a sequence position could have resulted from a different reading frame, it was circled in green to indicate that it may not be a true co-evolving pair for all combinations. Because this was done by hand, the graph may contain a few extra nodes that should have been eliminated.
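The counting filter and node coloring can be sketched as follows. This is a hedged illustration, not the procedure actually used. It assumes pairs are stored as tuples of reference positions. It keeps a pair when at least one of its positions occurs in more than one pair, which is one reading of the rule above. The residue-to-class grouping is also an assumption: the post names the colors but not which amino acids fall in each class.

```python
from collections import Counter


def filter_recurrent_pairs(pairs):
    """Keep pairs in which at least one position appears in more than one pair.

    pairs: list of (position_a, position_b) tuples of reference positions.
    """
    counts = Counter()
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    return [(a, b) for a, b in pairs if counts[a] > 1 or counts[b] > 1]


# Assumed physicochemical grouping of the 20 amino acids (hypothetical).
RESIDUE_CLASS = {
    **{aa: "hydrophobic" for aa in "AVLIMFWC"},
    **{aa: "positive" for aa in "KRH"},
    **{aa: "negative" for aa in "DE"},
    **{aa: "polar" for aa in "STNQYG"},
    "P": "proline",
}

# Node colors as described in the post.
CLASS_COLOR = {
    "hydrophobic": "brown",
    "positive": "red",
    "negative": "black",
    "polar": "white",
    "proline": "yellow",
}


def node_color(residue):
    """Return the node color for a one-letter amino acid code."""
    return CLASS_COLOR[RESIDUE_CLASS[residue.upper()]]


print(filter_recurrent_pairs([(10, 20), (10, 30), (40, 50)]))  # [(10, 20), (10, 30)]
print(node_color("K"))  # red
```

Position 10 appears in two pairs, so both of its pairs survive. The isolated pair (40, 50) is dropped.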
A high-resolution version of this picture for printing is included below: HIV-protein-protein-graph-noCRF-2048.png
For questions or comments, please contact willishf@ufl.edu.