Protocols
Protocols for building, analyzing, and using the trees and other resources in this project.
Building
- Genome retrieval: Download all bacterial and archaeal genomes available from NCBI GenBank and RefSeq, using RepoPhlAn.
- Genome sampling: Select n genomes from a genome pool such that they maximize included biodiversity as measured by the k-mer signatures of genomes.
- Marker identification: Identify and extract amino acid sequences of 400 global marker genes from genomes, using PhyloPhlAn.
- Tree building: Build phylogenetic trees of genes and species using various approaches.
- Tree manipulation: Manipulate phylogenetic trees using the Python scripts developed by our team.
- Taxon subsampling: Select n taxa from a larger phylogenetic tree such that they maximize representation of deep-branching, large clades.
- Taxonomy curation: Evaluate, modify and extend existing taxonomic assignments based on a phylogenetic tree.
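
To make the genome sampling idea concrete, here is a minimal sketch of one way to pick n genomes that are spread out in k-mer signature space. It is not the project's actual procedure: the greedy farthest-point heuristic, the Euclidean distance, the default k=2, and the function names `kmer_signature` and `sample_genomes` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math


def kmer_signature(seq, k=2):
    """Return the normalized k-mer frequency vector of a DNA sequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1  # avoid division by zero
    return [counts[km] / total for km in kmers]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_genomes(signatures, n):
    """Greedy farthest-point sampling (an assumed heuristic).

    signatures: dict mapping genome name -> signature vector.
    Repeatedly adds the genome whose distance to its nearest
    already-chosen genome is largest.
    """
    names = sorted(signatures)
    if n <= 0:
        return []
    if n >= len(names):
        return names
    # Seed with the genome that has the largest total distance to all others.
    seed = max(names, key=lambda g: sum(
        euclidean(signatures[g], signatures[h]) for h in names))
    chosen = [seed]
    # Distance from every genome to its nearest chosen genome.
    nearest = {g: euclidean(signatures[g], signatures[seed]) for g in names}
    while len(chosen) < n:
        nxt = max((g for g in names if g not in chosen),
                  key=lambda g: nearest[g])
        chosen.append(nxt)
        for g in names:
            nearest[g] = min(nearest[g],
                             euclidean(signatures[g], signatures[nxt]))
    return chosen


# Toy pool: two A-rich and two G-rich sequences (hypothetical data).
pool = {
    "g1": "AAAAAAAAAA",
    "g2": "AAAAAAAAAT",
    "g3": "GGGGGGGGGG",
    "g4": "GGGGGGGGGC",
}
sigs = {name: kmer_signature(seq) for name, seq in pool.items()}
picked = sample_genomes(sigs, 2)
```

With this toy pool, asking for two genomes yields one from each compositional group rather than two near-duplicates. That is the behavior the sampling step aims for.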
Analysis
- Tree comparison: Compare the phylogenetic relationships and distances indicated by individual species trees.
- Tree comparison by depth: Compare the topologies of two trees with consideration of phylogenetic depth.
- Major clade dimension: Calculate and compare the dimensions of major clades (e.g., Archaea vs. Bacteria), including distances between crown groups and distances between leaves.
- Shared clades: Collapse two very large trees to a shared set of large clades to enable back-to-back comparison via tanglegram.
- Gene tree discordance: Analyze evolutionary discrepancy reflected by individual gene trees.
- Saturation test: Analyze potential amino acid substitution saturation and how it impacts estimated phylogenetic distances.
- GTDB translation: Process GTDB taxonomy and trees to enable cross-translation with our work.
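
As an illustration of what comparing the topologies of two trees can involve, the sketch below computes the Robinson-Foulds distance: the number of non-trivial bipartitions found in one unrooted tree but not in the other. This is a standard metric chosen here as an example and may differ from what the project's own protocols use. The small Newick parser and the names `parse_newick`, `bipartitions`, and `rf_distance` are hypothetical helpers that ignore branch lengths and internal labels.

```python
import re


def parse_newick(newick):
    """Parse a Newick string into nested tuples of tip names.

    Branch lengths and internal node labels are discarded.
    """
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
            # Skip an optional internal label and/or branch length.
            if pos < len(tokens) and tokens[pos] not in {"(", ")", ",", ";"}:
                pos += 1
            return tuple(children)
        label = tokens[pos]
        pos += 1
        return label.split(":")[0].strip()

    return node()


def _clades(tree, out):
    """Collect the tip set below every internal node; return all tips."""
    if isinstance(tree, str):
        return frozenset([tree])
    below = frozenset().union(*(_clades(child, out) for child in tree))
    out.append(below)
    return below


def bipartitions(tree):
    """Return the non-trivial splits of a tree, treated as unrooted."""
    found = []
    all_tips = _clades(tree, found)
    ref = min(all_tips)  # reference tip used to orient each split
    splits = set()
    for clade in found:
        # Always store the side that does not contain the reference tip.
        side = all_tips - clade if ref in clade else clade
        if 1 < len(side) < len(all_tips) - 1:
            splits.add(side)
    return splits


def rf_distance(newick_a, newick_b):
    """Robinson-Foulds distance between two trees on the same tip set."""
    return len(bipartitions(parse_newick(newick_a))
               ^ bipartitions(parse_newick(newick_b)))


t1 = "((A:0.1,B:0.2):0.3,(C:0.1,D:0.1):0.2,E:0.5);"
t2 = "((A,C),(B,D),E);"
t3 = "(((A,B),E),(C,D));"  # same unrooted topology as t1, rooted differently
```

Here `rf_distance(t1, t3)` is 0 because rerooting does not change the splits, while `rf_distance(t1, t2)` is 4 because the two trees share no non-trivial split.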
Observation
- Tree rendering: Collapse tree at given rank(s) and generate files ready for iTOL and FigTree rendering.
Application
- Genome database: Build a reference genome database with phylogeny-curated taxonomy to improve an existing metagenomic sequence classification workflow.
- Community ecology: Convert WGS sequence alignments into an “OGU table” and perform microbial community ecology analyses with the reference phylogeny.
- Tree profiling: Modify an existing metagenomic profiling workflow to allow sequences to be directly assigned to tips and internal nodes of the reference phylogeny.
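
The conversion of alignments into an OGU table can be pictured with the following minimal sketch. It assumes the alignments are already reduced to (sample, read, genome) records and that a read hitting several genomes is split evenly among them. Both the input layout and the even-split rule are illustrative assumptions, and `build_ogu_table` is a hypothetical name rather than part of any existing tool.

```python
from collections import defaultdict


def build_ogu_table(alignments):
    """Build a genome-by-sample count table from alignment records.

    alignments: iterable of (sample, read, genome) tuples.
    A read aligned to k genomes contributes 1/k to each (assumed rule).
    Returns {genome: {sample: count}}.
    """
    # Group the distinct genomes hit by each read within each sample.
    hits = defaultdict(set)
    for sample, read, genome in alignments:
        hits[(sample, read)].add(genome)

    table = defaultdict(lambda: defaultdict(float))
    for (sample, _read), genomes in hits.items():
        share = 1.0 / len(genomes)
        for genome in genomes:
            table[genome][sample] += share
    return {genome: dict(counts) for genome, counts in table.items()}


# Hypothetical records: read r2 in sample S1 maps to two genomes.
records = [
    ("S1", "r1", "G1"),
    ("S1", "r2", "G1"),
    ("S1", "r2", "G2"),
    ("S2", "r1", "G2"),
]
ogu = build_ogu_table(records)
```

Because each row of the resulting table corresponds to a reference genome, and therefore to a tip of the reference phylogeny, the table can feed phylogeny-aware community analyses directly.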