About EPA
RAxML - Evolutionary Placement Algorithm
The Evolutionary Placement Algorithm (EPA) forms part of the standard RAxML release, version 7.2.6.
The EPA algorithm works as follows:
Input:
A multiple sequence alignment (MSA) in standard RAxML relaxed PHYLIP format that contains the full-length reference sequences together with the short reads, which have been aligned against the MSA of the full-length reference sequences.
A reference tree topology that contains all full-length reference sequences of the MSA.
Algorithm:
The EPA algorithm will initially read in the alignment and the reference tree. Those taxa of the alignment that are contained in this tree will be marked as reference sequences, whereas the taxa not contained in the tree will be marked as query sequences.
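The reference/query split described above can be illustrated with a minimal Python sketch. This is not RAxML's own code: the function names (phylip_taxa, newick_leaves, split_taxa) are hypothetical, the alignment is assumed to be in sequential relaxed PHYLIP with one taxon per line, and the Newick parser only handles unquoted leaf names.

```python
import re

def phylip_taxa(text):
    """Return taxon names from a sequential relaxed PHYLIP alignment string."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_taxa = int(lines[0].split()[0])          # header: "<n_taxa> <n_sites>"
    # In relaxed PHYLIP the name is the first whitespace-delimited token.
    return [ln.split()[0] for ln in lines[1:1 + n_taxa]]

def newick_leaves(newick):
    """Return the set of leaf names in a Newick string (unquoted names only)."""
    # A leaf name directly follows '(' or ',' and ends at ':', ',', ')' or ';'.
    return set(re.findall(r"[(,]\s*([^():,;\s]+)", newick))

def split_taxa(alignment_text, tree_text):
    """Taxa present in the tree are references; all others are queries."""
    leaves = newick_leaves(tree_text)
    taxa = phylip_taxa(alignment_text)
    reference = [t for t in taxa if t in leaves]
    query = [t for t in taxa if t not in leaves]
    return reference, query

# Toy data, invented for illustration only.
alignment = """5 8
A ACGTACGT
B ACGTACGA
C ACGTTCGT
D ACGAACGT
read1 --GTAC--
"""
tree = "((A:0.1,B:0.2):0.05,C:0.3,D:0.4);"

reference, query = split_taxa(alignment, tree)
print("reference:", reference)
print("query:", query)
```

In this toy input, the four taxa that appear as leaves of the tree end up in the reference list and the remaining read ends up in the query list.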
Next, RAxML will optimize the ML model parameters and branch lengths on the reference tree topology.
Thereafter, RAxML computes the optimal insertion branch (under Maximum Likelihood) in the reference tree for each query sequence (short read). That is, it assigns reads to branches of the tree; if more than one read is assigned to the same branch of the reference tree, those reads can be viewed as a multi-furcation on that branch. Essentially, we are calculating the distribution of reads over the branches of the tree.
In addition to the straightforward assignment of each read (query sequence) to one specific branch via ML, the EPA algorithm can also quantify placement uncertainty via likelihood weights (recommended) or standard phylogenetic bootstrapping.
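To make the idea of likelihood weights concrete, here is a minimal Python sketch. It assumes the common definition in which the weight of a placement is its likelihood divided by the sum of the likelihoods over all candidate branches; that definition, the function name likelihood_weights, the branch labels, and the log-likelihood values are assumptions made for illustration and are not taken from this page or from RAxML's source.

```python
import math

def likelihood_weights(log_likelihoods):
    """Map {branch: lnL} to {branch: weight}, with the weights summing to 1."""
    best = max(log_likelihoods.values())
    # Subtract the best lnL before exponentiating to avoid numerical underflow;
    # the common factor cancels in the ratio.
    scaled = {b: math.exp(lnl - best) for b, lnl in log_likelihoods.items()}
    total = sum(scaled.values())
    return {b: s / total for b, s in scaled.items()}

# Hypothetical log-likelihoods of one read inserted into three branches.
lnl_per_branch = {"I1": -1000.0, "I2": -1001.0, "I3": -1005.0}

weights = likelihood_weights(lnl_per_branch)
for branch, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{branch}\t{w:.4f}")
```

A read whose weight is concentrated on one branch is placed with high confidence, whereas weights spread over several branches indicate an uncertain placement.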
Output:
The EPA algorithm returns several plain text files that can then be used for analyzing the results (appropriate diversity metrics are currently under development).
It will return the reference tree for the full length sequences with branch lengths and branch labels. In addition, the program returns a classification file that contains the assignment of query sequences (reads) to the branch labels of the reference tree topology. If bootstrapping or likelihood weights are used, this file will contain multiple branch labels for each query sequence including the likelihood weight or bootstrap support values for each placement.
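As a rough illustration of how such placements might be summarized, the Python sketch below works on in-memory records of the form (query, branch label, weight). The record layout, the branch labels, and the function names are hypothetical and do not reproduce the exact layout of the classification file; reading the real file is left out on purpose.

```python
from collections import defaultdict

def best_placements(records):
    """Return {query: branch} using the highest-weight placement of each query."""
    best = {}
    for query, branch, weight in records:
        if query not in best or weight > best[query][1]:
            best[query] = (branch, weight)
    return {q: b for q, (b, _) in best.items()}

def reads_per_branch(records):
    """Sum placement weights per branch (expected number of reads per branch)."""
    totals = defaultdict(float)
    for _, branch, weight in records:
        totals[branch] += weight
    return dict(totals)

# Hypothetical placements: read1 is split between two branches.
records = [
    ("read1", "I3", 0.75),
    ("read1", "I4", 0.25),
    ("read2", "I3", 1.0),
]

print(best_placements(records))
print(reads_per_branch(records))
```

Summing the weights per branch gives one simple view of the distribution of reads over the reference tree that also takes placement uncertainty into account.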
Documentation:
The original EPA algorithm is described in: Exelixis-RRDR-2009-3.pdf
The parallelization of the EPA algorithm is described in: Exelixis-RRDR-2009-2.pdf
A video of a talk about the EPA by Alexis Stamatakis is available here: http://www.scivee.tv/node/16283
Additional Capabilities of the Web-Server:
The Web-Server offers some additional capabilities:
- The web-server can align the short reads to a given reference alignment of full-length sequences via hmmalign.
- Prior to the hmmalign step, the web-server can cluster the reads, and thereby reduce their number, using UCLUST, a method developed by R. C. Edgar: http://www.drive5.com/usearch/index.html
- The web-server can visualize EPA run results for you.
- If you intend to use multi-gene alignments, the server currently provides a separate submission form for them: Submit multi-gene alignment.