Mutual Information

InfluenzaA


May 19, 2007

Influenza is an RNA virus, more commonly referred to as the flu. Various strains of influenza have killed millions of people. Influenza A is the most severe type, and bird flu, or H5N1, is a form of Influenza A that currently infects birds. The concern is that a new variant of H5N1 will find a way to infect humans. More than 2000 Influenza A genomes were downloaded from the J. Craig Venter Institute, formerly known as The Institute for Genomic Research (TIGR). The data is organized by protein as unaligned sequences. ClustalW was used to align the sequence data by protein, and the per-protein alignments were then assembled into a global alignment for the entire genome. Duplicate entries were eliminated, resulting in 1818 Influenza A genome sequences.
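The assembly and de-duplication step can be sketched roughly as follows. This is a minimal illustration, not the code used for this page: the function name, the input layout (a dictionary of per-protein alignments keyed by strain identifier) and the choice to skip strains that lack any protein are all assumptions.

```python
def concatenate_and_dedupe(per_protein, order):
    """Join per-protein aligned sequences into one genome row per strain.

    per_protein: {protein_name: {strain_id: aligned_sequence}} (assumed layout)
    order: protein names in the order they should be concatenated
    Returns {strain_id: genome_row} with exact duplicate rows dropped.
    """
    # Assumption: only strains present in every protein alignment are kept.
    strains = set.intersection(*(set(per_protein[p]) for p in order))

    seen = set()
    genomes = {}
    for strain in sorted(strains):
        genome = "".join(per_protein[p][strain] for p in order)
        if genome not in seen:  # keep the first strain carrying this exact row
            seen.add(genome)
            genomes[strain] = genome
    return genomes
```

Sorting the strain identifiers only makes the choice of which duplicate survives reproducible; any stable order would do.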

The proteins and their amino acid (AA) sequence offsets in the global alignment, based on the order of data in the downloaded FASTA file:

  • polymerase PB2 (0,760)
  • PB1-F2 protein (761,861)
  • polymerase PB1 (862,1619)
  • polymerase PA (1620,2335)
  • nonstructural protein 2 NS2 (2336,2456)
  • nonstructural protein 1 NS1 (2457,2693)
  • nucleocapsid protein CAP (2694,3191)
  • neuraminidase NA (3192,3677) (NEU was used in the network topology model)
  • matrix protein 2 MP2 (3678,3774)
  • matrix protein 1 MP1 (3775,4026)
  • hemagglutinin HA (4027,4614) (HEM was used in the network topology model)
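Because the mutual information scores are reported against global alignment positions, it helps to map a position back to its protein. The sketch below does that with the offsets listed above. It assumes the offsets are 0-based and inclusive at both ends, which the list suggests (each range starts one past the previous end) but does not state. The names `PROTEINS` and `locate` are illustrative only.

```python
from bisect import bisect_right

# (short name, first position, last position), copied from the list above.
# Assumed to be 0-based and inclusive.
PROTEINS = [
    ("PB2", 0, 760),
    ("PB1-F2", 761, 861),
    ("PB1", 862, 1619),
    ("PA", 1620, 2335),
    ("NS2", 2336, 2456),
    ("NS1", 2457, 2693),
    ("CAP", 2694, 3191),
    ("NA", 3192, 3677),
    ("MP2", 3678, 3774),
    ("MP1", 3775, 4026),
    ("HA", 4027, 4614),
]
_STARTS = [start for _, start, _ in PROTEINS]


def locate(position):
    """Return (protein, offset_within_protein) for a global alignment position."""
    if position < 0 or position > PROTEINS[-1][2]:
        raise ValueError(f"position {position} is outside the alignment")
    # Index of the last protein whose start is <= position.
    name, start, _ = PROTEINS[bisect_right(_STARTS, position) - 1]
    return name, position - start
```

For example, `locate(3192)` gives `("NA", 0)` and `locate(4614)` gives `("HA", 587)`. The offsets within a protein are alignment columns, so they include gap columns and need not match residue numbering in a PDB file.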

The RPE algorithm was used to calculate mutual information for the 1818 sequences. This took approximately 48 hours on an AMD dual core 1800 processor. The algorithm has no notion of gene boundaries: it does a pairwise comparison of all sequence positions and then determines the high-scoring mutual information amino acid pairs. High mutual information between two sequence positions indicates that when one position mutates, the other position tends to mutate as well to compensate. The reason for the compensating mutation cannot be determined from the score; it simply indicates that a relationship exists between the two sequence positions that may have value.
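The details of the RPE algorithm are not given here, so the following is only a sketch of the textbook Shannon mutual information between two alignment columns, plus a brute-force ranking over all column pairs. It is not the RPE implementation; the function names are made up for illustration, and gap characters are simply treated as another symbol, which is an assumption.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_a, col_b):
    """Shannon mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    # sum over observed pairs of p(a,b) * log2(p(a,b) / (p(a) * p(b)))
    return sum(
        (c / n) * log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


def top_pairs(alignment, k):
    """Return the k highest-scoring (i, j, score) column pairs.

    alignment: list of equal-length aligned sequences (one string per genome).
    """
    columns = list(zip(*alignment))
    scores = [
        (i, j, mutual_information(columns[i], columns[j]))
        for i, j in combinations(range(len(columns)), 2)
    ]
    return sorted(scores, key=lambda t: t[2], reverse=True)[:k]
```

If the global alignment has 4615 columns, as the offsets above suggest, the pairwise loop visits 4615 × 4614 / 2 = 10,646,805 column pairs, each scanning all 1818 sequences, which is consistent with a long run time on a single machine.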

In prior analyses of mutual information in HCV, the HIV genome and Dengue, the pairs expressing the most information occurred between proteins. With no prior knowledge of the protein topology, the assumption for Influenza A was that it would behave the same way. This was not the case: the following figure shows that the top-scoring 50 mutual information pairs all occur within Neuraminidase and Hemagglutinin, and no interaction between proteins is detected among them. The first reaction was to double-check the alignment and the code to make sure a mistake had not been made.

Top scoring 50 MI pairs Influenza A

To better understand at what point other protein-protein interactions appear, a network topology model was built for the top 200 scoring mutual information pairs. The picture remained roughly the same: all co-evolving amino acid pairs were found in Neuraminidase and Hemagglutinin. The same algorithm was used for HCV, HIV and Dengue, where interactions occur between proteins, but in the case of Influenza A all interactions are isolated to sequence positions within Neuraminidase and Hemagglutinin.
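One simple way to read a network topology model of this kind is as a graph whose nodes are sequence positions and whose edges are high-scoring pairs. The sketch below groups positions into clusters (connected components) and lists any pairs that cross a protein boundary. It is an assumed reading of the model, not the tool used to draw the figures; `clusters`, `cross_protein_pairs` and the `protein_of` callback are hypothetical names.

```python
from collections import defaultdict


def clusters(pairs):
    """Group positions into connected components of the pair graph."""
    neighbors = defaultdict(set)
    for i, j in pairs:
        neighbors[i].add(j)
        neighbors[j].add(i)

    seen = set()
    components = []
    for node in sorted(neighbors):
        if node in seen:
            continue
        # Depth-first walk collecting everything reachable from this node.
        component = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbors[current] - component)
        seen |= component
        components.append(sorted(component))
    return components


def cross_protein_pairs(pairs, protein_of):
    """Return the pairs whose two positions fall in different proteins.

    protein_of: any function mapping a global position to a protein name.
    """
    return [(i, j) for i, j in pairs if protein_of(i) != protein_of(j)]
```

For the result described above, `cross_protein_pairs` would return an empty list for the top 200 pairs, and every cluster would sit entirely inside Neuraminidase or entirely inside Hemagglutinin.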

Top scoring 200 MI pairs Influenza A

An illustration of Influenza with a short summary of protein interactions can be found at Visual

A team of researchers led by Alasdair Steven of NIAMS, working with an H3N2 strain, was able to image the virus using electron tomography (ET). Details

Influenza A H3N2 virus: green is Hemagglutinin and yellow is Neuraminidase

Influenza A virus

Looking at the above 3D model, it is clear that Hemagglutinin proteins group with other Hemagglutinin proteins, with some amount of space required between each Hemagglutinin structure. The mutual information relationships in Hemagglutinin may help preserve secondary/tertiary structure. If the sequence positions are found on the protein surface, it is also possible that they prevent Hemagglutinin structures from interacting or help maintain an even spacing.

For Neuraminidase the structures are clustered in tight groups. Depending on their location on the 3D structure, sequence positions with high mutual information relationships may form protein-protein interactions with the complementary sequence position in a neighboring Neuraminidase structure, or may help with the overall orientation of each structure.

PDB structures for Neuraminidase (2HU4) and Hemagglutinin (21BX) are available as complex quaternary structures. For Neuraminidase a point of interest is sequence position 45, which was not part of the solved PDB structure. Each PDB structure was rendered in UCSF Chimera as a surface and, when possible, the sequence positions for each mutual information relationship were highlighted in a unique color and labeled. Color annotations were applied to each structure by cluster of sequence positions, following the mutual information network topology model. In all cases except one, Q:293 in Neuraminidase, the sequence positions were found on the surface. For alignment with the sequence found in the Hemagglutinin PDB model 21BX, the mutual information network topology model uses the sequence of Influenza A virus (A/duck/Viet Nam/18/2005(H5N1)). The Neuraminidase PDB model 2HU4 was recently released, and its amino acid sequence position offsets showed good agreement with the A/duck/Viet Nam/18/2005(H5N1) sequence.

The following 3D structures show, where possible, the top 50 mutual information relationships in the Influenza A genome data.

Three Hemagglutinin structures 21BX Cluster 1

Three Hemagglutinin structures 21BX Cluster 2

Three Hemagglutinin structures 21BX Cluster 3

Four Neuraminidase structures 2HU4 Cluster 1

Four Neuraminidase structures 2HU4 Cluster 1 Opposite Side

Four Neuraminidase structures 2HU4 Cluster 2

Four Neuraminidase structures 2HU4 Cluster 2 Opposite Side


This page last changed on 19-May-2007 23:40:48 EDT by root.


For questions or comments please contact willishf@ufl.edu