Notice: The Wilke Lab page has moved to http://wilkelab.org.
The page you are looking at is kept for archival purposes and will not be further updated.
THE WILKE LAB

The Basics

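Each HyPhy analysis must include several essential components: sequence data, a substitution model (a rate matrix plus equilibrium frequencies), a tree, and a likelihood function to optimize.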

HyPhy Batch File

Here is a basic HyPhy script.

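/* Read the alignment and keep every site and sequence as single nucleotides. */
DataSet myData = ReadDataFile ("aln.fasta");
DataSetFilter myFilter = CreateFilter (myData,1,"","","");

/* F81 rate matrix; each * asks HyPhy to fill in the diagonal entry. */
F81RateMatrix =
        {{* ,mu,mu,mu}
         {mu,* ,mu,mu}
         {mu,mu,* ,mu}
         {mu,mu,mu,* }};

/* Observed nucleotide frequencies serve as the equilibrium frequencies. */
HarvestFrequencies (obsFreqs, myFilter, 1, 1, 1);
Model F81 = (F81RateMatrix, obsFreqs);
UseModel (F81);

/* Declare the tree after the model so that its branches use F81. */
Tree myTree = ((a,b),c,d);

/* Build the likelihood function, maximize it, and print the fit. */
LikelihoodFunction theLikFun = (myFilter, myTree);
Optimize (MLEs, theLikFun);
fprintf (stdout, theLikFun);

Now, let's go line by line through the script above.

DataSet myData = ReadDataFile ("aln.fasta");

Reads the sequence alignment in aln.fasta into a DataSet object called myData.

DataSetFilter myFilter = CreateFilter (myData,1,"","","");

Creates a filter on myData; the filter, not the raw data set, is what the likelihood function works with. The second argument, 1, treats each nucleotide as one unit of evolution. The three empty strings select all sites, select all sequences, and exclude no character states.

F81RateMatrix = etc.

Defines the 4x4 instantaneous rate matrix, with rows and columns in the order A, C, G, T. Every off-diagonal rate is the same parameter, mu, and each * tells HyPhy to set the diagonal entry so that the row sums to zero.

HarvestFrequencies (obsFreqs, myFilter, 1, 1, 1);

Counts the nucleotide frequencies in myFilter and stores them in the vector obsFreqs. The unit and atom sizes of 1 ask for frequencies of single nucleotides.

Model F81 = (F81RateMatrix, obsFreqs);

Combines the rate matrix with the equilibrium frequencies. By default HyPhy multiplies each rate by the frequency of the target nucleotide, which is what turns the equal-rates matrix above into the F81 model.

UseModel (F81);

Makes F81 the active model for any tree declared afterwards.

Tree myTree = ((a,b),c,d);

Defines the tree topology in Newick format. The leaf names (a, b, c, d) must match the sequence names in the alignment. HyPhy attaches the active model to the branches at the moment the tree is declared, which is why this statement comes after Model and UseModel.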
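To make the roles of HarvestFrequencies and the rate matrix more concrete, here is a minimal Python sketch of the same two steps. It is not HyPhy code and not how HyPhy is implemented; the function names and the toy alignment are hypothetical, chosen only for illustration. It assumes the F81 form in which the rate from nucleotide i to nucleotide j is mu times the frequency of j.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"  # same A, C, G, T order as the rate matrix


def harvest_frequencies(sequences):
    """Observed frequency of each nucleotide, pooled over all sequences."""
    counts = Counter(
        base for seq in sequences for base in seq if base in NUCLEOTIDES
    )
    total = sum(counts.values())
    return [counts[base] / total for base in NUCLEOTIDES]


def f81_rate_matrix(mu, freqs):
    """Entry (i, j) is mu * freqs[j]; each diagonal makes its row sum to zero."""
    n = len(freqs)
    q = [[mu * freqs[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])
    return q


# Hypothetical toy alignment, for illustration only.
alignment = ["ACGTAC", "ACGTTC", "AAGTAC", "ACGAAC"]
obs_freqs = harvest_frequencies(alignment)
rate_matrix = f81_rate_matrix(mu=1.0, freqs=obs_freqs)
for row in rate_matrix:
    print(["%.3f" % x for x in row])
```

In the HyPhy script, mu is left free and estimated during optimization; the sketch fixes it at 1.0 only so that the matrix can be printed.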


Finally, you can define and maximize the likelihood function and then print its output.

LikelihoodFunction theLikFun = (myFilter, myTree);
Optimize (MLEs, theLikFun);
fprintf (stdout, theLikFun);
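For readers who want to see what kind of quantity is being maximized, the following Python sketch evaluates the log-likelihood of a few alignment columns on the tree ((a,b),c,d) under F81, using Felsenstein's pruning algorithm. It is an illustration under stated assumptions, not HyPhy's implementation: the branch lengths, frequencies, and alignment columns are hypothetical, mu is fixed rather than optimized, and all function names are invented for this example. It relies on the closed-form F81 transition probability P(i to j) = exp(-mu*t) * [i = j] + (1 - exp(-mu*t)) * freq_j.

```python
import math

NUCLEOTIDES = "ACGT"


def f81_transition_prob(i, j, mu, t, freqs):
    """P(i -> j) along a branch of length t (closed form, valid for F81)."""
    decay = math.exp(-mu * t)
    return (decay if i == j else 0.0) + (1.0 - decay) * freqs[j]


def partial_likelihoods(node, site, mu, freqs):
    """Pruning step: likelihood of the data below `node` for each state at `node`."""
    if isinstance(node, str):  # leaf: the observed nucleotide is certain
        return [1.0 if base == site[node] else 0.0 for base in NUCLEOTIDES]
    result = [1.0] * 4
    for child, t in node:  # internal node: list of (child, branch length)
        below = partial_likelihoods(child, site, mu, freqs)
        for i in range(4):
            result[i] *= sum(
                f81_transition_prob(i, j, mu, t, freqs) * below[j]
                for j in range(4)
            )
    return result


def site_likelihood(tree, site, mu, freqs):
    """Average the root partial likelihoods over the equilibrium frequencies."""
    root = partial_likelihoods(tree, site, mu, freqs)
    return sum(freqs[i] * root[i] for i in range(4))


def log_likelihood(tree, columns, mu, freqs):
    """Sites are treated as independent, so their log-likelihoods add up."""
    return sum(math.log(site_likelihood(tree, col, mu, freqs)) for col in columns)


# Tree ((a,b),c,d) with hypothetical branch lengths.
tree = [([("a", 0.1), ("b", 0.2)], 0.05), ("c", 0.3), ("d", 0.1)]
freqs = [0.375, 0.25, 0.125, 0.25]  # hypothetical A, C, G, T frequencies
columns = [
    {"a": "A", "b": "A", "c": "A", "d": "A"},
    {"a": "A", "b": "C", "c": "A", "d": "A"},
]
print(log_likelihood(tree, columns, mu=1.0, freqs=freqs))
```

Optimize does the additional work of searching for the parameter values that make this log-likelihood as large as possible.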

Using Codon Models

The previous example ran an analysis using a nucleotide model of substitution. When analyzing protein-coding data, however, it is often more informative to use a codon model of substitution. Such models also use nucleotide data, but they treat each codon as a unit and take into account the amino acid it encodes. The rate matrix of a codon model therefore describes how each three-nucleotide codon evolves into a different codon, rather than how individual nucleotides are substituted.

Here is an example script that estimates "omega," or dN/dS, which provides information about the direction and strength of selection on a coding sequence. It uses the Goldman-Yang codon model GY94, which has four parameters: omega, kappa (the ratio of the rate of transitions to the rate of transversions), time, and "pi" (the equilibrium codon frequencies). The original paper describing this model is Goldman and Yang (1994). Values for the first three parameters (represented as w, k, and t, respectively) are estimated by optimizing the likelihood function, whereas the codon frequencies are estimated directly from the provided data.
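As a rough illustration of how omega and kappa enter a codon rate matrix, the Python sketch below computes the relative rate between two sense codons in the commonly used simplified GY94 form. The actual matrix used by the script lives in ratematrixfile.txt, which is not shown here, so treat this as an assumption about its general shape rather than a copy of it. The function name is hypothetical, the standard genetic code is assumed, and the time parameter t (which would scale every entry) is left out.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; "*" marks the stop codons TAA, TAG, and TGA.
GENETIC_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def gy94_rate(codon_from, codon_to, kappa, omega, freq_to):
    """Relative rate between two different sense codons (time scaling omitted)."""
    diffs = [(x, y) for x, y in zip(codon_from, codon_to) if x != y]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes happen instantaneously
    rate = freq_to  # proportional to the frequency of the target codon
    if diffs[0] in TRANSITIONS:
        rate *= kappa  # transitions are scaled by kappa
    if GENETIC_CODE[codon_from] != GENETIC_CODE[codon_to]:
        rate *= omega  # nonsynonymous changes are scaled by omega
    return rate


# Hypothetical parameter values, for illustration only.
print(gy94_rate("CTT", "CTC", kappa=2.0, omega=0.5, freq_to=0.1))  # synonymous transition
print(gy94_rate("CTT", "ATT", kappa=2.0, omega=0.5, freq_to=0.1))  # nonsynonymous transversion
```

An omega below 1 slows down amino-acid-changing substitutions relative to silent ones, which is why its estimate is read as a measure of selection.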


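/* GY94RateMatrix and BuildCodonFrequencies are defined in these files. */
#include "ratematrixfile.txt";
#include "functions.txt";

/* Global parameters are shared by all branches of the tree. */
global k;
global t;
global w;

DataSet myData = ReadDataFile ("aln.fasta");
/* Read the data as codons (unit size 3) and exclude the stop codons. */
DataSetFilter myFilter = CreateFilter (myData,3,"","","TAA,TAG,TGA");
/* Nucleotide frequencies at each codon position, then codon frequencies. */
HarvestFrequencies (obsFreqs, myFilter, 3, 1, 1);
codonFreqs = BuildCodonFrequencies (obsFreqs);
Model GY94 = (GY94RateMatrix, codonFreqs);
UseModel (GY94);
/* Declare the tree after the model so that its branches use GY94. */
Tree myTree = ((a,b),c,d);
LikelihoodFunction theLikFun = (myFilter, myTree);
Optimize (MLEs, theLikFun);
fprintf (stdout, theLikFun);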

Now, let's go through the aspects of this script which differ from the one discussed in the previous section.

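#include "ratematrixfile.txt" and #include "functions.txt"

These statements load code from two separate files. The script relies on them for GY94RateMatrix and BuildCodonFrequencies, neither of which is defined in the script itself.

global k; global t; global w;

Declares k, t, and w as global variables, so that a single value of each is estimated and shared by all branches of the tree.

DataSetFilter myFilter = CreateFilter (myData,3,"", "", "TAA,TAG,TGA" );

The unit size is now 3, so the data are read as codons rather than as single nucleotides. The last argument excludes the three stop codons, leaving the 61 sense codons as possible states.

HarvestFrequencies (obsFreqs, myFilter, 3, 1, 1);

With a unit size of 3, an atom size of 1, and position-specific counting turned on, this tabulates nucleotide frequencies separately for each of the three codon positions and stores them in obsFreqs as a 4x3 matrix.

codonFreqs=BuildCodonFrequencies(obsFreqs);

Converts the position-specific nucleotide frequencies into frequencies for the sense codons, using the function loaded from functions.txt.

Model GY94 = (GY94RateMatrix,codonFreqs);

Combines the codon rate matrix with the codon frequencies, just as the F81 model combined a nucleotide rate matrix with nucleotide frequencies.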
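The conversion from position-specific nucleotide frequencies to codon frequencies can be sketched in Python as follows. The contents of functions.txt are not shown on this page, so this assumes that BuildCodonFrequencies implements the common "F3x4" estimator: multiply the three position-specific frequencies, drop the stop codons, and renormalize. The function names and the toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
NUCLEOTIDES = "ACGT"


def position_frequencies(sequences):
    """Nucleotide frequencies per codon position: 4 rows (A, C, G, T) x 3 columns."""
    counts = [[0] * 3 for _ in NUCLEOTIDES]
    for seq in sequences:
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for start in range(0, usable, 3):
            for pos, base in enumerate(seq[start:start + 3]):
                if base in NUCLEOTIDES:
                    counts[NUCLEOTIDES.index(base)][pos] += 1
    totals = [sum(counts[row][pos] for row in range(4)) for pos in range(3)]
    return [[counts[row][pos] / totals[pos] for pos in range(3)] for row in range(4)]


def build_codon_frequencies(pos_freqs):
    """F3x4 estimate: product of position frequencies, renormalized without stops."""
    raw = {}
    for i, a in enumerate(NUCLEOTIDES):
        for j, b in enumerate(NUCLEOTIDES):
            for k, c in enumerate(NUCLEOTIDES):
                codon = a + b + c
                if codon not in STOP_CODONS:
                    raw[codon] = pos_freqs[i][0] * pos_freqs[j][1] * pos_freqs[k][2]
    total = sum(raw.values())
    return {codon: value / total for codon, value in raw.items()}


# Hypothetical in-frame toy sequences, for illustration only.
sequences = ["ATGGCTAAA", "ATGGCCAAG", "ATGTCTAAA"]
pos_freqs = position_frequencies(sequences)
codon_freqs = build_codon_frequencies(pos_freqs)
print(len(codon_freqs), round(codon_freqs["ATG"], 4))
```

The appeal of this estimator is that it needs only nine free frequencies (three per codon position) instead of one for each of the 61 sense codons.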


The script estimates values for k, t, and w. The estimate of w (omega) in particular gives you useful information about the selective regime acting on your protein-coding sequence!

Good Resources

The HyPhy website may be found here: HyPhy. It has a very extensive user forum, which we highly recommend looking at.

An excellent overview of writing more advanced HyPhy scripts, with tons of great examples, may be found in chapter six of the book Statistical Methods in Molecular Evolution, edited by Rasmus Nielsen, which may be accessed via GoogleBooks here: Book

This site is hosted on OpenWetWare