Mutual Information



Main


If you have any questions or suggestions, please email me at willishf < at > ufl.edu

To search for a particular Pfam, use the search menu item in the toolbar, not the Search Wiki box. You can enter any part of the Pfam name and it will return matches. All Pfam data and Mutual Information data are organized in gzip-compressed XML files and are available for download.
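
As a rough illustration of working with the downloadable files, the sketch below streams a gzip-compressed XML file and counts its elements by tag name. It makes no assumption about the schema of the Pfam or Mutual Information files; the function name `count_elements` and the file name in the comment are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(path):
    """Stream a gzip-compressed XML file and count elements by tag name.

    Streaming with iterparse avoids loading a large file into memory at once.
    """
    counts = Counter()
    with gzip.open(path, "rb") as handle:
        for _event, elem in ET.iterparse(handle, events=("end",)):
            counts[elem.tag] += 1
            elem.clear()  # drop the element's children and text once counted
    return counts


# Hypothetical usage; the file name is an assumption, not a real download name:
# print(count_elements("PF04055.xml.gz"))
```

Counting tags first is a cheap way to learn the layout of an unfamiliar file before writing a parser for the fields of interest.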

Currently, the mutual information data is limited to the 700+ Pfam families used to validate the different approaches to calculating mutual information.

For a short paper on how Mutual Information is calculated using the RPE Method, see willis-MI-RPE

For an overview of viewing X3D models of the PDB structure, see X3D-PDB

See PF04055.9 for a good example data set.

Other MI Z>=4 Examples from the test set


May 19, 2007

Using Mutual Information to detect protein-protein interactions in the Influenza A virus.

May 15, 2007

Using Mutual Information to detect protein-protein interactions in the Dengue virus.

November 29, 2006

Looks like I had someone send a wiki bot to trash my site. Restored from backup but lost a couple of the latest posted items.

MI models for HIV

October 5, 2006

Working on MI and models for HCV

October 1, 2006

Working on an abstract submission for CASP and just finished the paper for RECOMB 2007 on ProteinCorreLogo. Part of that paper found a good example of an MI pair that is very distant in the tertiary structure but very close in the dimer configuration. As part of the CASP analysis I put together the average distance of MI pairs across all PDB models for Z=4, Z=3, and Z=2. It turns out I may have a few more examples where what appear to be bad MI models may actually be showing that the MI pairs are neighbors in the quaternary structure. Will start a new page on this topic: MIQuaternary

September 11, 2006

It has been a busy couple of months. I presented my work with X3D at SIGGRAPH 2006 as part of an X3D tech talk forum. From August 3-10, I attended the ISMB 2006 conference in Fortaleza, Brazil. I presented my research on Mutual Information in Pfam at the 3DSig conference, which is held every two years at the beginning of ISMB.

The big change was moving from Gainesville (home for the last five years) to West Palm Beach.

Life is beginning to return to normal so I have had time to do more work on CorreLogo for protein sequences. For a review of the latest please see ProteinCorreLogo

July 21, 2006

Working on implementing CorreLogo for protein sequences. CorreLogo works well with RNA and DNA alignments where color provides information related to the MI pairs. A little bit of a challenge for amino acids as this represents 20x20 or 400 color combinations. Starting a new wiki page for details ProteinCorreLogo

June 8, 2006

Submitted first two target submissions to CASP7

June 1, 2006

Rosetta requires gcc 3.3 to compile, which makes deploying on recent Linux distributions on clusters difficult. I had success using the latest build of Debian, which did not migrate the OS code base to require gcc 3.4. A VMware virtual machine was set up to allow it to run as an application appliance on a host operating system. More time needs to be spent testing the best way to deploy the VMware player as an application on a cluster. It is a fairly new concept, so initial conversations with system admins have been less than positive. I should be able to deploy as a local user, but then I need to figure out how to launch it as a PBS job and somehow pass the config file or work to be done to the application appliance. It may be possible to define a process where the application running in the VMware player checks back with an external web server for work to be done and then submits the results to the web server. VMware really opens up the possibility of supporting legacy applications that require a specific OS while still running on large clusters that run a modern OS.

May 26, 2006

The CASP7 competition is underway, so for the next two months the focus will be on developing solutions to the released sequences. I am using Rosetta from Dr. Baker's lab, which was the runaway winner of CASP6. I will be using the mutual information signature for a protein family to develop a set of constraints for Rosetta as it develops possible solutions for a sequence. I will then select the best PDB model using a scoring algorithm that matches the Mutual Information signature.
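
The scoring algorithm itself is not described here, so the following is only one plausible sketch of the idea: score each candidate model by the fraction of high-MI pairs whose residues lie within a distance cutoff, then keep the highest-scoring model. The function names, the coordinate layout (residue index mapped to an x, y, z tuple), and the 8 angstrom default cutoff are all assumptions made for illustration.

```python
import math


def signature_score(coords, mi_pairs, cutoff=8.0):
    """Fraction of MI pairs whose residues are within `cutoff` angstroms.

    coords:   dict mapping residue index -> (x, y, z); layout is an assumption.
    mi_pairs: iterable of (i, j) residue index pairs with high MI.
    Pairs with a residue missing from the model are skipped.
    """
    scored = [(i, j) for i, j in mi_pairs if i in coords and j in coords]
    if not scored:
        return 0.0
    close = sum(1 for i, j in scored
                if math.dist(coords[i], coords[j]) <= cutoff)
    return close / len(scored)


def best_model(models, mi_pairs, cutoff=8.0):
    """Return the name of the model with the highest signature score."""
    return max(models,
               key=lambda name: signature_score(models[name], mi_pairs, cutoff))


# Toy example with two hypothetical models of a three-residue fragment.
models = {
    "compact": {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (5.0, 3.0, 0.0)},
    "extended": {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (30.0, 0.0, 0.0)},
}
pairs = [(1, 3), (2, 3)]
print(best_model(models, pairs))
```

A real pipeline would read the coordinates from the PDB files Rosetta produces and would probably weight pairs by their Z score rather than counting them equally.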

March 20, 2006

Working on variable base log n scaling when calculating entropy. The problem is associated with a sequence position that does not have a high number of distinct elements because, for example, only a charged amino acid can be in that position to preserve structure or function. With a fixed log base 2, the maximum entropy for that position is small because only 4 possible amino acids can occupy it. Dividing by log 4 (equivalently, taking logs to base 4) when 4 distinct amino acids are observed scales the entropy score to the range 0 to 1. Initial results using this method on limited data sets to improve the prediction of co-evolving pairs were poor.
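
The scaling described above can be sketched as follows, assuming a column is given as a string of one-letter amino acid codes with gaps already removed. The function names are hypothetical, and this shows only the entropy scaling, not the full MI calculation.

```python
import math
from collections import Counter


def column_entropy(column, base=2):
    """Shannon entropy of one alignment column, using logs to `base`."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log(n / total, base)
                for n in counts.values())


def normalized_entropy(column):
    """Entropy with the log base set to the number of distinct residues
    observed in the column, so the score falls between 0 and 1."""
    distinct = len(set(column))
    if distinct <= 1:
        return 0.0  # a fully conserved (or empty) column has no entropy
    return column_entropy(column, base=distinct)


# A position restricted to four charged residues, each equally frequent.
column = "DEKR" * 5
print(column_entropy(column))      # entropy in bits (log base 2)
print(normalized_entropy(column))  # rescaled using log base 4
```

For the uniform four-residue column the base-2 entropy is 2 bits, while the rescaled score reaches its maximum of 1, which is the effect the entry describes.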

March 15, 2006

Working on reducing the number of cycles between three amino acids with high mutual information. After looking at the data, it became very clear that a majority of the cycles form when an amino acid needs to mutate to preserve local secondary structure. These local compensating mutations then appear to share mutual information, through an associative property, with a co-evolving pair. The first method tried for eliminating a cycle was to remove the amino acid pair with the lowest MI score, assuming it was the most distant in 3D space. This was the case around 60% of the time. This was not accurate enough to improve the prediction quality of co-evolving pairs, as the two vertices in the graph tend to be less than 12 angstroms apart.
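
A minimal sketch of that first method, assuming the high-MI pairs are held in a dict keyed by sorted position pairs. The function name and data layout are hypothetical, and the weakest edges are chosen against the original graph in a single pass rather than iteratively.

```python
from itertools import combinations


def drop_weakest_edge_in_triangles(mi_scores):
    """Remove the lowest-MI pair from every 3-cycle of high-MI pairs.

    mi_scores: dict mapping (i, j) with i < j -> MI score.
    Returns a new dict; the input is left unchanged.
    """
    positions = sorted({p for pair in mi_scores for p in pair})
    to_remove = set()
    # Brute-force triple enumeration: fine for a sketch, cubic in positions.
    for a, b, c in combinations(positions, 3):
        edges = [(a, b), (a, c), (b, c)]
        if all(edge in mi_scores for edge in edges):
            # The triple forms a cycle; mark its weakest edge for removal.
            to_remove.add(min(edges, key=lambda edge: mi_scores[edge]))
    return {pair: score for pair, score in mi_scores.items()
            if pair not in to_remove}


# Toy example: positions 1, 2, 3 form a cycle; 4 and 5 are an isolated pair.
scores = {(1, 2): 0.9, (2, 3): 0.8, (1, 3): 0.2, (4, 5): 0.7}
print(drop_weakest_edge_in_triangles(scores))
```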

March 13, 2006

Finished processing MI calculations for the 2,700+ Pfam families. If a referenced PDB for a sequence in a Pfam family does not have at least 90% alignment, then no distance scores are given for that family.

March 6, 2006

Finished putting together the summary information for each Pfam. If the Pfam has mutual information data available the Z>=4 and Z=3 co-evolving pairs are listed. This is a work in progress where most of the time is spent putting together the VIEW of the data. This is then applied to the Pfam families.

The next thing on the list is to get the 3D X3D ribbon models for each family listed. This will take place over the next couple of days.

March 5, 2006 Todo List

  • Upload the remaining data files and make them available for download
  • Put together a couple examples of protein families with HTML display of the important data and 3D ribbon models
  • Make a non-gzip XML version of the mutual information available as a link so results will display in the browser. Some files are large, which will create a problem for a web browser trying to display them
  • Generate X3D ribbon models for the 700+ protein families and make them available for download
  • Put together tutorials on X3D and how to use the XML X3D version of the ribbon PDB models.
  • Need to reprocess all 2700+ protein families for mutual information with a couple more data elements added in. This will take 3 days on a 24 node cluster so I will need to schedule that for next weekend as I am using a cluster at UF as a favor and don't want to lose that privilege

March 4, 2006 Wiki has been installed

My first action was to check out the latest and greatest HTML editors that support JSP pages. Then it occurred to me that I really didn't want to do that and that using a Wiki makes more sense. One of the original features was to set up a discussion forum for each protein family, so I did some Google searching and found JSPWiki.

It helps to understand the concepts of JSP pages and war file deployment, but the bottom line is they made it really easy to set up. It took about 10 minutes for the standard Wiki demo. Once I understood how everything worked, I wanted to Wiki-enable the www.proteinx3d.com web site with Wiki-like features but not have everything controlled by the Wiki. I auto-generate web pages for the protein families with a search interface that understands the data in the protein families. Another 4-5 hours of testing and tracking down the files that needed to be edited, and the web site is now Wiki-enabled.

March 3, 2006 www.proteinx3d.com is up

The work has just begun to get a fully functioning web site with a large amount of protein related data organized and displayed.




This page last changed on 19-May-2007 22:07:02 EDT by root.

