Python scripts

Go to GitHub directory.

These scripts may be generally or specifically useful in research. All scripts have a simple command-line interface indicating its purpose and usage. All scripts depend on Python 3.5+ with scikit-bio 0.5.1+, unless otherwise stated.

Basic tree operations

count_nodes.py: Count tips and internal nodes in a tree.

bifurcate_tree.py: Bifurcate a tree.

trifurcate_tree.py: Remove arbitrary rooting from a tree.

assign_node_ids.py: Assign incremental node IDs to a tree in level order.

remove_supports.py: Remove node support values from a tree.

decrease_node_order.py: Re-order nodes of a tree in decreasing order.

export_supports.py: Export node labels (e.g., branch support values) to a table.

unpack_low_support.py: Unpack (collapse) internal nodes with branch support value lower than given cutoff.

append_taxa.py: Append extra taxa to a tree as polytomies based on a tip-to-taxa map.

match_label_support.py: Generate a nodel label to branch support value table.

round_lengths.py: Reduced the number of digits in branch lengths.

Specialized tree operations

trianglize_tree.py: Re-order nodes of tree in a way such that the two basal clades are in increasing and decreasing order, respectively. If the input tree is already midpoint-rooted, the output tree will shape like a triangle.

root_by_outgroup.py: Re-root a tree with a given set of taxa as the outgroup.

restore_rooting.py: Restore rooting scenario of a tree based on another.

subsample_tree.py: Shrink a tree to a given number of taxa which maximize the sum of phylogenetic distances.

make_rfd_matrix.py: Generate a matrix of Robinson-Foulds distances among all trees.

calc_length_metrics.py: Calculate branch length-related metrics, including height, depths and relative evolutionary divergence (RED) for all nodes.

calc_split_metrics.py: Calculate split-related metrics, including number of descendants, number of splits from tip or from root.

calc_bidi_metrics.py: Calculate bidirectional levels and depths for nodes in a tree.

Advanced analyses

align_distmat.py: Generate a matrix of pairwise Hamming distances among sequences in an alignment, excluding gaps.

phylo_distmat.py: Generate a matrix of phylogenetic distance (sum of branch lengths) among taxa in a tree.

sample_ab_dists.py: Sample pairwise phylogenetic and sequential distances inter- and intra-domains (Archaea and Bacteria).

Tool-specific utilities

phylophlan_summarize.py: Generate marker map and summarize genome to marker matches.

phylophlan_extract.py: Extract marker gene sequences based on PhyloPhlAn result.

phylosift_summarize.py: Summarize the number of hits per marker per genome.

phylosift_extract.py: Extract marker gene sequences from search result.

dm_to_phylip.py: Convert a distance matrix into the Phylip format (lower triangle) which can then be parsed by ClearCut.

r8s_summarize_result.py: Summarize r8s divergence time estimation results.

raxml_duplicate_map.py: Generate a core-to-duplicate map for an MSA filtered by RAxML.

Taxonomy utilities

shrink_taxdump.py: Shrink the standard NCBI taxdump files nodes.dmp and names.dmp so that they only contain given TaxIDs and their ancestors.

recursive_shear.py: Shear a tree recursively so that eventually all tips match a given taxon set.

map_taxa_in_tree.py: Convert a taxonomy tree into a genome tree based on a TaxID-to-genome(s) map.

taxdump_to_ranks.py: Extract parental taxa at given ranks from NCBI taxonomy for genomes.

taxdump_to_tree.py: Build a tree based on NCBI taxdump.

tree_to_taxonomy.py: Generate pseudo taxonomic hierarchies based on a tree.

ranks_to_tree.py: Convert a genome-to-ranks table into a tree.

New gtdb_to_taxdump.py: Convert GTDB taxonomy into NCBI taxdump style.

Community analysis utilities

New genomes_for_db.py: Linearize, filter and concatenate multiple genome sequences into a single Fasta file for subsequent database building.

ogu_from_maps.py: Generate an “OGU table” from WGS sequence alignment results.

normalize_to_cpm.py: Normalize a BIOM table to copies per million sequences (cpm).

filter_otus_per_sample.py: Filter out low-abundance OTUs within each sample in a BIOM table.