For most phylogenetic databases, the TimeTree database was used to convert phylogenetic ages, which are defined relative to a species tree, into age estimates in millions of years.
Protein Family Databases
The Princeton Protein Orthology Database (PPOD) provides several sets of protein family predictions for all proteins in the 12 genomes of the GO Reference Genome Project. This species tree shows the evolutionary relationships and clade names used by ProteinHistorian for this set of species.
The OrthoMCL, MultiParanoid, and Naive Ensemble (Nens) databases contain families of predicted orthologs, while the Jaccard database contains larger families of more distantly related protein sequences.
See the PPOD help page for more details on the creation of these databases and appropriate references.
This protein family database is similar to the other PPOD databases described above, but it is based on an OrthoMCL clustering of all proteins in the 48 species present in v7.0 of the PANTHER classification system. Ages are provided for all proteins in the 32 eukaryotic species included in the PANTHER database. This tree shows the evolutionary relationships and clade names used by ProteinHistorian for this set of species.
Several recent analyses assigned ages to proteins based on the phylogenetic distribution of the functional subdomains that they contain. (See the ProteinHistorian paper for more information and references.) To enable this sort of analysis in ProteinHistorian, we analyzed the phylogenetic distribution of all Pfam domains across all species in v7.0 of the PANTHER database (see the species tree). We then created two different age databases: one in which each protein is assigned the age of its youngest domain, and one in which each protein is assigned the age of its oldest domain. Proteins with no predicted domains are considered specific to the species in which they occur.
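The youngest-domain and oldest-domain assignments described above can be sketched as follows. This is a minimal illustration, not ProteinHistorian's actual code: the function name, the dictionary layout, the domain and protein identifiers, and the convention that a larger number means an older age (with 0 standing in for "species-specific") are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def protein_ages_from_domains(
    protein_domains: Dict[str, List[str]],
    domain_age: Dict[str, float],
    species_specific_age: float = 0.0,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (youngest, oldest) age maps for each protein.

    Assumption: ages are numeric and larger means older, so the youngest
    domain has the minimum age and the oldest domain has the maximum.
    """
    youngest: Dict[str, float] = {}
    oldest: Dict[str, float] = {}
    for protein, domains in protein_domains.items():
        ages = [domain_age[d] for d in domains if d in domain_age]
        if not ages:
            # No predicted domains: treat the protein as species-specific.
            youngest[protein] = species_specific_age
            oldest[protein] = species_specific_age
        else:
            youngest[protein] = min(ages)
            oldest[protein] = max(ages)
    return youngest, oldest


# Hypothetical toy data (not real Pfam families or real ages).
domain_age = {"domA": 1500.0, "domB": 400.0}
protein_domains = {"P1": ["domA", "domB"], "P2": ["domB"], "P3": []}

youngest, oldest = protein_ages_from_domains(protein_domains, domain_age)
```

In this toy input, P1 contains one old and one young domain, so the two databases would disagree about its age, while P3 has no domains and falls back to the species-specific age in both.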
This database provides age predictions for individual protein domains, as defined by Pfam families. Thus it can be used to analyze the age of the functional units that make up proteins.
These age predictions for human were taken from the paper:
Domazet-Loso, T and Tautz, D (2008). An Ancient Evolutionary Origin of Genes Associated with Human Genetic Diseases. Mol Biol Evol, 25(12): 2699-2707. [Paper]
These age predictions for baker's yeast are based on the analysis of fungal orthogroups downloaded from the Fungal Orthogroups Repository on Sept. 10, 2009. The creation of the orthogroups is described in:
Wapinski, I, Pfeffer, A, Friedman, N, and Regev, A (2007). Natural history and evolutionary principles of gene duplication in fungi. Nature, 449: 54-61. [Paper]
- Your Own Database
If you have a database of evolutionary relationships that you would like to use in your protein age analysis, download the command line version of ProteinHistorian and see the README for information about formatting the database. If you would like to make the database publicly available as a part of ProteinHistorian, please contact us.
Ancestral Reconstruction Algorithms
- Dollo parsimony
Dollo parsimony is based on the assumption that gaining a complex structure is much rarer than losing one. It therefore assumes a single gain event for each family, potentially followed by many losses in specific lineages. In other words, under Dollo parsimony, a family's origin is the most recent common ancestor (MRCA) of all species in which it is observed.
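The MRCA rule can be illustrated with a short sketch. The child-to-parent dictionary, the function names, and the four-species toy tree below are assumptions for this example; they are not the species trees or the code used by ProteinHistorian.

```python
from typing import Dict, Iterable, List


def lineage(node: str, parent: Dict[str, str]) -> List[str]:
    """Return the path from a node up to the root, starting with the node."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def dollo_origin(species: Iterable[str], parent: Dict[str, str]) -> str:
    """Return the MRCA of all species in which a family is observed."""
    species = list(species)
    if not species:
        raise ValueError("family must be observed in at least one species")
    # Keep the ancestors of the first species that every other species shares.
    common = lineage(species[0], parent)
    for s in species[1:]:
        ancestors = set(lineage(s, parent))
        common = [n for n in common if n in ancestors]
    # The path is ordered from leaf to root, so the first shared node is the MRCA.
    return common[0]


# Toy species tree as a child -> parent map (illustrative only).
parent = {
    "human": "Mammalia",
    "mouse": "Mammalia",
    "Mammalia": "Vertebrata",
    "zebrafish": "Vertebrata",
    "Vertebrata": "Opisthokonta",
    "yeast": "Opisthokonta",
}

origin_mammal = dollo_origin(["human", "mouse"], parent)
origin_patchy = dollo_origin(["human", "yeast"], parent)
origin_single = dollo_origin(["yeast"], parent)
```

A family seen only in human and yeast is placed at the root of this toy tree, which implies losses in mouse and zebrafish. A family seen in a single species is assigned to that species.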
- Wagner parsimony
Wagner parsimony allows multiple gain and loss events in an ancestral family reconstruction, and it lets the user weight the relative likelihood of these events. By default, ProteinHistorian uses a relative gain penalty of 1. Since we focus on eukaryotic species, in which horizontal gene transfer is rare, this largely serves to prevent false positives in the protein family databases from biasing age distributions. ProteinHistorian uses the implementation of Wagner parsimony from the Count package.
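To show how the gain penalty changes a reconstruction, here is a simplified sketch of weighted parsimony on presence/absence data. It is not the Count implementation: it reduces each family to present or absent in a species, uses a standard two-pass dynamic program over the tree, charges one gain if the family is present at the root, and breaks ties in favor of absence. The function name, parameters, and toy tree are all assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

INF = float("inf")


def wagner_presence(
    children: Dict[str, List[str]],
    root: str,
    present: Set[str],
    gain_penalty: float = 1.0,
    loss_penalty: float = 1.0,
) -> Tuple[float, Dict[str, int]]:
    """Return (minimum cost, node -> 0/1 state) for one family."""
    cost: Dict[str, List[float]] = {}

    def step(a: int, b: int) -> float:
        # Cost of changing from parent state a to child state b.
        if a == b:
            return 0.0
        return gain_penalty if b == 1 else loss_penalty

    def up(node: str) -> None:
        kids = children.get(node, [])
        if not kids:
            # Leaves are fixed to their observed state.
            cost[node] = [INF, 0.0] if node in present else [0.0, INF]
            return
        for k in kids:
            up(k)
        cost[node] = [
            sum(min(step(s, t) + cost[k][t] for t in (0, 1)) for k in kids)
            for s in (0, 1)
        ]

    states: Dict[str, int] = {}

    def down(node: str, s: int) -> None:
        states[node] = s
        for k in children.get(node, []):
            t = min((0, 1), key=lambda t: step(s, t) + cost[k][t])
            down(k, t)

    up(root)
    # Assumption: the family is absent above the root, so presence at the
    # root itself costs one gain.
    root_cost = [cost[root][0], cost[root][1] + gain_penalty]
    root_state = 0 if root_cost[0] <= root_cost[1] else 1
    down(root, root_state)
    return root_cost[root_state], states


# Toy species tree as a parent -> children map (illustrative only).
children = {
    "Opisthokonta": ["Vertebrata", "yeast"],
    "Vertebrata": ["Mammalia", "zebrafish"],
    "Mammalia": ["human", "mouse"],
}

# Gain penalty 1: a family seen in human and yeast is explained by two gains.
cost_low, states_low = wagner_presence(children, "Opisthokonta", {"human", "yeast"})

# Higher gain penalty: a single gain at the root plus two losses is cheaper.
cost_high, states_high = wagner_presence(
    children, "Opisthokonta", {"human", "yeast"}, gain_penalty=3.0
)
```

In this toy case, raising the gain penalty moves the inferred origin from the individual species to the root, which is the same answer Dollo parsimony would give.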