Mutual Information


ProteinCorreLogo


September 11, 2006

I have added numerous informational elements to the models. There are so many data elements that the model is very busy, but so far it works well. Numerous sources of data with different alignment values are represented. A PDB sequence serves as the base of the structure and as the reference for the amino acids. The representative Pfam sequence for that PDB is then mapped onto the PDB sequence. The overall Pfam alignment has many inserts to accommodate alignment across the family, and the actual PDB sequence may also differ from the corresponding sequence in Pfam. A detailed description of the visual elements appears at the end of this posting.

PF00025.10 an example of MI Z=3 with distance > 16 angstroms

The following pictures are for Pfam PF00025.10, which has no Z=4 MI pairs but a large number of Z=3 MI pairs. Many of the MI pairs are very distant, so I wanted to see how this would look in another representation. The Pfam web site gives this functional description of the PF00025.10 family:

"Structural studies of Arf1 and Arf6 have revealed that although these proteins feature the switch 1 and 2 conformational changes, they depart from other small GTP-binding proteins in that they use an additional, unique switch to propagate structural information from one side of the protein to the other."

If you look at the image of the PDB structure, many of the MI pairs connect positions on opposite sides of the structure.

PF00025.10 1mr3 A-Many ligands for this structure
Overhead Walk X3D CorreLogo model-Use Octaga

PF00025.10 1mr3 A-Many ligands for this structure
X3D PDB model

PF00027.18-A good example with strong Mutual Information Z=4

This is also a good indicator of the predicted MI pairs and the corresponding ligand for Pfam PF00027.18, PDB 1ne4 A.

CorreLogo PF00027.18 1ne4 A
PF00027.18 1ne4 A

X3D CorreLogo model - Use Octaga

X3D PDB model

Binding Pocket

A binding pocket for this model is defined by any amino acid that has at least one atom within 5 angstroms of a ligand atom, where the ligand is represented as coordinate data in the PDB data file. PDB structures from the same Pfam vary widely in the ligands each one includes. I don't have a feel for how correct a PDB structure with only one ligand defined is compared with another PDB structure in the same family with 10 ligands defined. This is further complicated by the fact that multiple tertiary structures make up a quaternary structure, where each sub-structure can form a pocket around a ligand.
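The 5 angstrom rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the models: the function name `pocket_residues`, the dictionary layout, and the choice to treat exactly 5 angstroms as inside the pocket are all assumptions, and parsing the PDB file into coordinates is left out.

```python
import math

POCKET_CUTOFF = 5.0  # angstroms, the threshold stated in the text


def pocket_residues(residue_atoms, ligand_atoms, cutoff=POCKET_CUTOFF):
    """Return the ids of residues with at least one atom near a ligand atom.

    residue_atoms: dict mapping a residue id to a list of (x, y, z) tuples
    ligand_atoms:  list of (x, y, z) tuples for all ligand atoms
    Both layouts are hypothetical; any PDB parser could fill them in.
    """
    marked = set()
    for res_id, atoms in residue_atoms.items():
        # One atom within the cutoff is enough to mark the whole residue.
        if any(math.dist(atom, lig) <= cutoff
               for atom in atoms for lig in ligand_atoms):
            marked.add(res_id)
    return marked


# Toy coordinates, invented for illustration only.
residues = {"A10": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
            "A55": [(20.0, 0.0, 0.0)]}
ligand = [(4.0, 0.0, 0.0)]
marked = pocket_residues(residues, ligand)  # only "A10" is within 5 angstroms
```

A structure with no ligand coordinates yields an empty set, which matches the concern above that a pocket can only be found where the PDB file actually defines a ligand.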

The PDB structure is processed and each amino acid closer than 5 angstroms to a ligand atom is marked as being in a binding pocket. A pairwise distance comparison of the marked amino acids in a tertiary structure is then used to determine binding pocket pairs. A pair closer than 8 angstroms is indicated by a green square, a pair between 8 and 12 angstroms by a yellow square, and a pair between 12 and 16 angstroms by a red square. Pairs more than 16 angstroms apart are not represented in the model.

Squares along the center diagonal line indicate amino acids that are sequence neighbors, so the diagonal is expected to be dominated by green. Data points away from the center line represent distant sequence positions and indicate different secondary structures that are close in 3D space.

As a quick observation, MI pairs near a binding pocket cluster and away from the center diagonal could represent co-evolving pairs that play a coordinated role in the binding pocket. An MI pair in which one amino acid is part of the binding pocket but the other is distant from it could represent a mutation that preserves secondary structure. For example, one amino acid of the pair could be in the binding pocket while the other is located in a turn. The amino acid in the turn could be compensating or adjusting for the mutation in the helix to help maintain overall secondary structure. Getting feedback from a structural biologist will be critical in defining guidelines for interpreting data relationships in the model.
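The distance bands for binding pocket pairs can be written down as a small sketch. The names `pair_color` and `pocket_pairs` are hypothetical, each residue is reduced to a single representative coordinate (the text does not say which atoms are compared), and a distance that falls exactly on 8, 12, or 16 angstroms is assigned to the farther band, which is also an assumption.

```python
import math
from itertools import combinations


def pair_color(distance):
    """Map a pair distance in angstroms to the square color used in the model."""
    if distance < 8.0:
        return "green"
    if distance < 12.0:
        return "yellow"
    if distance < 16.0:
        return "red"
    return None  # pairs 16 angstroms or more apart are not drawn


def pocket_pairs(coords):
    """Color every pair of marked residues that is close enough to draw.

    coords: dict mapping a residue position to one (x, y, z) tuple.
    """
    colored = {}
    for (a, pos_a), (b, pos_b) in combinations(sorted(coords.items()), 2):
        color = pair_color(math.dist(pos_a, pos_b))
        if color is not None:
            colored[(a, b)] = color
    return colored


# Toy coordinates, invented for illustration only.
example = pocket_pairs({10: (0.0, 0.0, 0.0),
                        42: (5.0, 0.0, 0.0),
                        90: (30.0, 0.0, 0.0)})
```

In this toy input positions 10 and 42 are far apart in sequence but 5 angstroms apart in space, so they would appear as a green square away from the center diagonal.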

Surface Accessibility

The surface accessibility is represented as a blue bar graph where no bar represents no accessibility to water and a full bar represents 100% accessibility to water.

Gap Indicator

The model is based on the PDB sequence, which is a member of the particular Pfam for which Mutual Information is calculated. For each continuous insert in the aligned sequences in Pfam, the height of a gray bar is incremented by one unit. I am not sure it is important to know the scale or the actual number of gaps aligned to this particular PDB sequence; the bar simply indicates regions of inserts and the overall relative size of the inserts. More feedback is needed on the best way to represent this data element.

Amino Acid Sequence

The entire amino acid sequence is represented by text along the base of the gray graph that indicates Pfam inserts. In addition, at the base of each MI pair the specific amino acid for each column position is listed (green box with black letter). The blue box with white numbers indicates the amino acid positions in the PDB structure. Mapping alignment columns to the structure creates a problem: an MI pair may not have a corresponding amino acid in a particular sequence because of an insert. In that case the nearest aligned amino acid is used, which should represent the correct region on the 3D PDB structure.
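The fallback to the nearest aligned amino acid could look like the sketch below. The function name `nearest_residue`, the gap characters, the 0-based indexing, and the tie-break toward the lower column are all assumptions made for illustration.

```python
GAP_CHARS = "-."  # assumed gap symbols in the aligned sequence


def nearest_residue(aligned_seq, column):
    """Return (index, residue) for the residue nearest to an alignment column.

    index is the 0-based position in the ungapped sequence. If the column
    itself holds a residue, that residue is returned. Ties between a left
    and a right neighbor go to the lower column.
    """
    col_to_index = {}
    index = 0
    for col, ch in enumerate(aligned_seq):
        if ch not in GAP_CHARS:
            col_to_index[col] = index
            index += 1
    if not col_to_index:
        raise ValueError("aligned sequence contains no residues")
    best = min(col_to_index, key=lambda c: (abs(c - column), c))
    return col_to_index[best], aligned_seq[best]


# Column 2 is a gap in this toy row, so the residue in column 1 is used.
hit = nearest_residue("AC--GT", 2)
```

A real model would probably also cap how far the search may travel, since a residue many columns away no longer represents the same region of the structure.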

Secondary Structure

The white box with green letters indicates the secondary structure to which the specific amino acid belongs in the PDB structure: H=Helix, B=Sheet, and T=Turn.

Surface Accessibility

The green box with black numbers represents the surface accessibility for each amino acid of the MI pair.

PDB Distance

The red box at the base of the MI column is the average distance between the MI pair across all PDB structures with 90% sequence alignment for the particular Pfam. On the perpendicular face of the red box is the standard deviation of the distance. The model still needs to indicate the number of PDB models used to calculate the standard deviation. A minimum count could also be required before the standard deviation is displayed.
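A small sketch of the distance summary, including the structure count and the minimum-count idea raised above. The name `distance_summary`, the default of three structures, and the use of the sample standard deviation are assumptions; the text does not say which estimator is used.

```python
import statistics


def distance_summary(distances, min_count=3):
    """Summarize one MI pair's distances over several PDB structures.

    distances: pair distance in angstroms, one value per PDB structure.
    The standard deviation is reported only when at least min_count
    structures contributed; otherwise it is None.
    """
    count = len(distances)
    if count == 0:
        return None
    mean = statistics.mean(distances)
    # stdev needs at least two values regardless of min_count.
    if count >= max(min_count, 2):
        spread = statistics.stdev(distances)
    else:
        spread = None
    return {"count": count, "mean": mean, "stdev": spread}


# Toy distances, invented for illustration only.
summary = distance_summary([10.0, 12.0, 14.0])
```

Returning the count alongside the mean lets the model print it next to the red box, so a standard deviation based on three structures is not read the same way as one based on thirty.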

Entropy

On the outside of the main model is the entropy for each column position, using the same color assignments as listed below. After each amino acid is converted to its Physio-Chemical group, the percentage contribution of each group is summed and then drawn as a bar graph. You can use page down or cycle through the viewpoints to get an aerial view of this data element.
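The per-column calculation could be sketched as follows. The grouping follows the color groups listed in the July 24 entry, but the function names are hypothetical, gaps are simply ignored, and Shannon entropy in bits is an assumption, since the posting does not give the entropy formula.

```python
import math
from collections import Counter

# Groups as listed in the July 24 entry of this page.
GROUPS = {
    "positive": "HKR",
    "hydrophobic": "ACFGILMTVWY",
    "negative": "DE",
    "polar": "NQS",
    "small": "P",
}
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def group_fractions(column):
    """Fraction of each Physio-Chemical group in one alignment column."""
    counts = Counter(AA_TO_GROUP[aa] for aa in column if aa in AA_TO_GROUP)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def shannon_entropy(fractions):
    """Shannon entropy in bits over the group fractions (assumed measure)."""
    return -sum(p * math.log2(p) for p in fractions.values() if p > 0)


# Toy column: half positive, half negative, with two gap characters ignored.
fractions = group_fractions("KR-DE.")
entropy = shannon_entropy(fractions)
```

Each fraction would become one colored segment of the bar for that column, and a column made up of a single group gives an entropy of zero.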

Color

An attempt was made to adjust the colors used to map the Physio-Chemical properties. So far brown, which I am using to represent hydrophobic, is the dominating color in Mutual Information pairs. Black has been used to represent hydrophobic, but against the black background the information gets lost. It could be possible to change the background or to work on different color schemes that suit both 2D pictures and 3D models that you walk through. One nice relationship is that the plus amino acids, currently represented by red, are also hydrophobic, and red and brown are similar colors. Black is used for negative, white for polar, and yellow for small amino acids (P). I need to spend more time on color combinations and give the user the option to choose.

3D ribbon models

The 3D ribbon models now include the ligands for a tertiary structure, and amino acids that are considered to be in the binding pocket are colored yellow. The MI pairs are connected by a line. The combination of these data elements provides strong evidence that the identified MI pairs can be used to indicate important regions in the sequence.

Future Work

Need to collect feedback from others for the new punch list.

July 24, 2006

Made progress on generating a prototype of CorreLogo for proteins. See the screen shots for examples. I have also included the X3D file. I am using the latest and greatest from the X3D standard, which means only a limited number of viewers can display the file. Blaxxun is not handling the orientation of the text properly, so please use the free Octaga player on Windows. Xj3D 2.0 beta also has problems with text but works on non-Windows platforms. For an overview of X3D viewers in general, please see X3DPDB.

One of the main challenges is mapping color in an informative way onto the Mutual Information bars. The following is from the Venn diagram found at http://www.russell.embl.de/aas/

  • alcohol o S,T
  • aliphatic l I,L,V
  • any . A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
  • aromatic a F,H,W,Y
  • charged c D,E,H,K,R -- covered by +/-
  • hydrophobic h A,C,F,G,H,I,K,L,M,R,T,V,W,Y
  • negative - D,E
  • polar p C,D,E,H,K,N,Q,R,S,T -- N Q S
  • positive + H,K,R
  • small s A,C,D,G,N,P,S,T,V
  • tiny u A,G,S
  • turnlike t A,C,D,E,G,H,K,N,Q,R,S,T

Color was assigned to the following groups, which cover the 20 amino acids. Colors can change based on standards, common color assignments, or user preference. The groups can also be easily changed.

  • H,K,R - positive - Red
  • A,C,F,G,I,L,M,T,V,W,Y - hydrophobic - Brown, for an oil-like color. This also helps with the grouping of H,K,R, which are also hydrophobic and displayed as red. The goal is to let the user visually blend the red and brown if needed
  • D,E - negative - Black. Probably shouldn't use black as it doesn't stand out, but it is easy to remember that black is negative
  • N,Q,S - polar - White, for the polar ice caps
  • P - small - Yellow
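The color assignment above can be expressed as a lookup table, which also makes it easy to check that the five groups cover all 20 amino acids exactly once. The names `GROUP_COLORS`, `AA_COLOR`, and `pair_colors` are hypothetical, and the colors are kept as plain names because the posting does not give RGB values.

```python
# Group name -> (member amino acids, color), taken from the list above.
GROUP_COLORS = {
    "positive": ("HKR", "red"),
    "hydrophobic": ("ACFGILMTVWY", "brown"),
    "negative": ("DE", "black"),
    "polar": ("NQS", "white"),
    "small": ("P", "yellow"),
}

# Flatten to a per-residue lookup.
AA_COLOR = {aa: color
            for members, color in GROUP_COLORS.values()
            for aa in members}


def pair_colors(aa_first, aa_second):
    """Return the two colors used to draw one MI pair."""
    return AA_COLOR[aa_first], AA_COLOR[aa_second]


# Sanity check: every standard amino acid is assigned to exactly one group.
total_members = sum(len(members) for members, _ in GROUP_COLORS.values())
covers_all = set(AA_COLOR) == set("ACDEFGHIKLMNPQRSTVWY") and total_members == 20

example = pair_colors("K", "L")  # a positive residue paired with a hydrophobic one
```

Keeping the table in one place means a user preference or a different standard only has to change `GROUP_COLORS`. With five groups, an ordered pair has 25 possible color combinations instead of 400.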

At the base of each MI bar the secondary structure for each position is given. On top of that bar is the sequence position for reference.

Pfam PF04055.9 was used to calculate Mutual Information, and a PDB sequence from the family was selected to illustrate it. In this case PDB 1olt was used. The Pfam sequence is a sub-sequence of the PDB sequence, and from the overhead view this is illustrated by a white grid located on a larger grid. This conveys that only a portion of the overall sequence is being used.

Todo list for visual improvements

  • Add gap fraction via side bar
  • Add 2D sequence logo via side bar
  • Add overall secondary structure information via side bar
  • Add surface accessibility parallel to secondary structure opposite 2D sequence logo
  • Add structure distance above each MI bar either as text or bar
  • Rework and improve all of the above

Other interesting options

  • Allow visual annotations to be added to the 3D model by users who are researching a particular sequence or protein family for group collaboration
  • Allow the user to select a specific MI pair and load in the corresponding PDB model and highlight the region on the tertiary structure

References to older images removed

July 21, 2006

First attempt to take pre-calculated Mutual Information for a particular Pfam and generate a CorreLogo style X3D file. I don't really have a good approach for representing the 400 possible combinations in color. Insert gaps for the family also create wasted space on the graph, which limits the ability to display visual information.

For Pfam with known PDB sequences, it probably makes sense to graph a representative sequence, which would eliminate gaps and allow distance information between MI pairs to be represented.

The following is the first attempt for PF04055.9, a Pfam with a high number of MI pairs, which makes it a good test case. Only MI pairs with an average sequence distance > 10 after subtracting the number of inserts and a Z score >= 2 are shown. With inserts, it has a length of 580 positions.
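One way to read the filter above is sketched below. The interpretation is an assumption: for each sequence, the distance between two alignment columns is taken as the number of non-gap characters between them, and these per-sequence distances are averaged over the family. The function names and the tuple layout for MI pairs are hypothetical.

```python
GAP_CHARS = "-."  # assumed gap symbols in the alignment


def mean_sequence_distance(alignment, col_i, col_j):
    """Average gap-corrected distance between two columns over all rows.

    For one row the distance is the count of residues in columns
    lo .. hi-1, which equals hi - lo when the row has no gaps there.
    """
    lo, hi = sorted((col_i, col_j))
    per_row = [sum(ch not in GAP_CHARS for ch in row[lo:hi])
               for row in alignment]
    return sum(per_row) / len(per_row)


def filter_mi_pairs(pairs, alignment, min_distance=10, min_z=2.0):
    """Keep (col_i, col_j, z) pairs with z >= min_z and distance > min_distance."""
    return [(i, j, z) for i, j, z in pairs
            if z >= min_z
            and mean_sequence_distance(alignment, i, j) > min_distance]


# Toy alignment, invented for illustration: the second row has a 10-column gap.
alignment = ["A" * 30,
             "A" * 5 + "-" * 10 + "A" * 15]
pairs = [(0, 20, 3.1), (0, 12, 2.5), (0, 25, 1.5)]
kept = filter_mi_pairs(pairs, alignment)
```

In the toy data the pair at columns 0 and 12 spans 12 columns, but its averaged distance is only 8.5 once the gap is subtracted, so it is dropped even though its Z score passes.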




This page last changed on 11-Sep-2006 22:55:00 EDT by 68.233.52.209.


For questions or comments please contact willishf@ufl.edu