gneiss.cluster.correlation_linkage

gneiss.cluster.correlation_linkage(X, method='ward')

Hierarchical Clustering based on proportionality.

The hierarchy is built from the correlation between every pair of features. Specifically, the correlation between two features \(x\) and \(y\) is measured by the variance of their log-ratio:

\[p(x, y) = \mathrm{var}\left(\ln \frac{x}{y}\right)\]

If \(p(x, y)\) is very small, then \(x\) and \(y\) are said to be highly correlated, meaning that their ratio is nearly constant across samples. A hierarchical clustering is then performed using \(p(x, y)\) as the distance between features.

The resulting tree can be useful for constructing principal balances [1].
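The measure above can be sketched with NumPy and SciPy alone. This is a minimal illustration, not the gneiss implementation: the helper name `log_ratio_variance`, the use of the sample variance (`ddof=1`), and the direct hand-off to `scipy.cluster.hierarchy.linkage` are assumptions made for the example. It shows that the diagonal of the resulting matrix is zero, that the matrix is symmetric, and that it can be passed to a linkage routine in condensed form.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def log_ratio_variance(table: pd.DataFrame) -> pd.DataFrame:
    """Pairwise var(ln(x / y)) between the columns (features) of `table`.

    Hypothetical helper for illustration only. Rows are samples, columns
    are features, and every entry must be strictly positive.
    """
    logs = np.log(table.to_numpy(dtype=float))
    # diffs[s, i, j] = ln(x_i / x_j) in sample s
    diffs = logs[:, :, None] - logs[:, None, :]
    # Variance across samples; ddof=1 is an assumption of this sketch.
    p = diffs.var(axis=0, ddof=1)
    return pd.DataFrame(p, index=table.columns, columns=table.columns)


# Same toy table as in the Examples section, with a pseudocount of 0.1
# so that the logarithms are defined.
table = pd.DataFrame([[1, 1, 0, 0, 0],
                      [0, 1, 1, 0, 0],
                      [0, 0, 1, 1, 0],
                      [0, 0, 0, 1, 1]],
                     columns=['s1', 's2', 's3', 's4', 's5'],
                     index=['o1', 'o2', 'o3', 'o4']).T + 0.1

p = log_ratio_variance(table)

# SciPy's linkage expects a condensed (upper-triangle) distance vector.
condensed = squareform(p.to_numpy(), checks=False)
Z = linkage(condensed, method='ward')
```

Here `Z` is SciPy's linkage matrix with one row per merge. Converting it into a `skbio.TreeNode`, as `correlation_linkage` returns, is left out of the sketch.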

Parameters:
  • X (pd.DataFrame) – Contingency table where the samples are rows and the features are columns.
  • method (str) – Clustering method. (default='ward')
Returns:

Tree for constructing principal balances.

Return type:

skbio.TreeNode

References

[1] Pawlowsky-Glahn V, Egozcue JJ, and Tolosana-Delgado R. Principal Balances (2011).

Examples

>>> import pandas as pd
>>> from gneiss.cluster import correlation_linkage
>>> table = pd.DataFrame([[1, 1, 0, 0, 0],
...                       [0, 1, 1, 0, 0],
...                       [0, 0, 1, 1, 0],
...                       [0, 0, 0, 1, 1]],
...                      columns=['s1', 's2', 's3', 's4', 's5'],
...                      index=['o1', 'o2', 'o3', 'o4']).T
>>> # Add a pseudocount so the log-ratios are defined for zero counts.
>>> tree = correlation_linkage(table + 0.1)
>>> print(tree.ascii_art())
                    /-o1
          /y1------|
         |          \-o2
-y0------|
         |          /-o3
          \y2------|
                    \-o4