gneiss.cluster.rank_linkage

gneiss.cluster.rank_linkage(r, method='average')

Hierarchical clustering on feature ranks.

The hierarchy is built from the rank values of the features, given an input vector \(r\) of ranks. The distance between two features \(x\) and \(y\) is defined as

\[d(x, y) = (r(x) - r(y))^2\]

where \(r(x)\) is the rank of feature \(x\). Hierarchical clustering is then performed using \(d(x, y)\) as the distance metric.

This can be useful for constructing principal balances.
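As an illustration of the distance and clustering steps described above, here is a minimal sketch using NumPy, pandas, and SciPy directly. It is not the gneiss implementation. It assumes that SciPy's `pdist` with `metric="sqeuclidean"` on one-dimensional points matches \(d(x, y) = (r(x) - r(y))^2\), and that the clustering step is an ordinary agglomerative linkage (`scipy.cluster.hierarchy.linkage`). The helper name `rank_distance` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform


def rank_distance(r: pd.Series) -> np.ndarray:
    """Hypothetical helper: condensed distances d(x, y) = (r(x) - r(y))**2."""
    # pdist expects a 2-D (n_observations, n_dimensions) array, so each
    # feature becomes a single one-dimensional point (its rank).
    values = r.to_numpy(dtype=float).reshape(-1, 1)
    # For 1-D points the squared Euclidean distance is exactly the squared
    # difference of ranks.
    return pdist(values, metric="sqeuclidean")


ranks = pd.Series([1, 2, 4, 5], index=["o1", "o2", "o3", "o4"])
d = rank_distance(ranks)

# Show the full symmetric distance matrix, labelled by feature name.
print(pd.DataFrame(squareform(d), index=ranks.index, columns=ranks.index))

# Agglomerative clustering on the condensed distances (assumed SciPy linkage).
Z = linkage(d, method="average")
print(Z)
```

With these ranks, o1 and o2 are merged first and o3 and o4 next (both at distance 1), and the two pairs are joined at the root, which is the same topology as the tree in the example at the end of this page.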

Parameters:
  • r (pd.Series) – Continuous vector representing some ordering of the features; its index supplies the feature names used as the tips of the tree.
  • method (str) – Linkage method used for the hierarchical clustering (default='average').
Returns:

Tree for constructing principal balances.

Return type:

skbio.TreeNode
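To get a feel for what the `method` parameter changes, the sketch below compares three linkage criteria on the ranks from the example. It assumes that `method` takes a SciPy linkage method name such as `'single'`, `'average'`, or `'complete'`. That is an assumption, not something this page states. The sketch calls SciPy directly rather than gneiss.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Ranks from the example below, as 1-D points; squared differences as distances.
ranks = np.array([1, 2, 4, 5], dtype=float).reshape(-1, 1)
d = pdist(ranks, metric="sqeuclidean")

heights = {}
for method in ("single", "average", "complete"):
    Z = linkage(d, method=method)
    # The last row of the linkage matrix is the root merge;
    # column 2 holds the distance at which it happens.
    heights[method] = float(Z[-1, 2])

print(heights)  # {'single': 4.0, 'average': 9.5, 'complete': 16.0}
```

For these four ranks the tree topology is the same under all three criteria. Only the height of the root merge differs: it is the minimum (4), the mean (9.5), or the maximum (16) of the squared rank differences between {o1, o2} and {o3, o4}. On less regular rankings, different methods can also produce different topologies.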

Examples

>>> import pandas as pd
>>> from gneiss.cluster import rank_linkage
>>> ranks = pd.Series([1, 2, 4, 5],
...                   index=['o1', 'o2', 'o3', 'o4'])
>>> tree = rank_linkage(ranks)
>>> print(tree.ascii_art())
                    /-o1
          /y1------|
         |          \-o2
-y0------|
         |          /-o3
          \y2------|
                    \-o4