Equivalent Input Produces Different Output in the UniFrac Significance Test

BMC Bioinformatics 15:278, 2014

Supplementary information

In the above paper, we describe how UniFrac reports different results when given isomorphically equivalent forms of the same input tree: one form includes abundance counts explicitly, and the other includes them implicitly. On this page, we provide a script that converts UniFrac input files from the explicit (or "compressed") format, for which we believe UniFrac produces results inconsistent with expectation, to the implicit (or "expanded") format, for which UniFrac produces the expected results.

Downloading the script

Download the script here. After downloading this file, please rename it from "UniFrac_to_expanded.txt" to "UniFrac_to_expanded.pl".

Using the script

To use this script, you must have Perl installed. Perl is normally preinstalled on UNIX, Linux, and Mac OS X systems; on Windows, you may need to download and install it yourself.

UniFrac_to_expanded.pl takes two input files, a Newick-formatted tree file and a file giving the abundance of each OTU in each sample, and writes a new tree file and a new abundance file in the expanded format. OTU names in the input files must not contain underscores, because the expanded copies of each OTU are named with an underscore suffix (for example, A_1).
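Before converting, it can help to check an abundance file against these requirements. The sketch below is a hypothetical Python helper, not part of UniFrac_to_expanded.pl; it assumes the three-column, tab-separated layout (OTU name, sample name, count) used in the example that follows. It rejects OTU names containing underscores and sums each OTU's count over all samples, which is the number of copies that OTU receives in the expanded files.

```python
from collections import defaultdict


def read_abundances(lines):
    """Parse 'OTU<TAB>sample<TAB>count' lines into (otu, sample, count) tuples.

    Hypothetical helper: assumes exactly three tab-separated columns per line.
    """
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        otu, sample, count = line.split("\t")
        if "_" in otu:
            # Underscores are reserved for naming expanded copies (A_1, A_2, ...).
            raise ValueError(f"line {lineno}: OTU name {otu!r} contains an underscore")
        records.append((otu, sample, int(count)))
    return records


def otu_totals(records):
    """Sum each OTU's count over all samples (= copies needed in the expanded tree)."""
    totals = defaultdict(int)
    for otu, _sample, count in records:
        totals[otu] += count
    return dict(totals)


example = ["A\tSample1\t4", "A\tSample2\t0", "B\tSample1\t0", "B\tSample2\t4"]
print(otu_totals(read_abundances(example)))  # {'A': 4, 'B': 4}
```

With a real file, `read_abundances(open("abundances.txt"))` would work the same way, since iterating over a file yields its lines.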

For example, suppose that your Newick-formatted tree file is called tree.newick (download sample file) and contains the following:

(A:0.1,B:0.1);

Further, suppose that your abundance file is called abundances.txt (download sample file) and looks like this:

A	Sample1	4
A	Sample2	0
B	Sample1	0
B	Sample2	4

Here, the three tab-separated columns give the OTU name, the sample name, and the OTU's abundance in that sample. To run the script, place it and both input files in the same directory, change your current working directory to that directory, and type:

perl UniFrac_to_expanded.pl tree.newick abundances.txt

This will generate two new files. The first, tree.newick.expanded, will look like this:

((A_1:0,A_2:0,A_3:0,A_4:0):0.1,(B_1:0,B_2:0,B_3:0,B_4:0):0.1);
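In this expanded tree, each leaf has been replaced by a group of zero-length branches, one per read of that OTU, attached where the original leaf was. The following is a minimal Python sketch of that transformation, not the Perl script's actual code. The function name `expand_tree` is hypothetical, and it assumes every leaf has a branch length, labels are unquoted, and each OTU's total count across samples is known and positive.

```python
import re

# A leaf label follows "(" or "," and precedes ":" (its branch length).
# Internal-node labels follow ")" and are therefore left untouched.
LEAF = re.compile(r"(?<=[(,])([^(),:;]+)(?=:)")


def expand_tree(newick, totals):
    """Replace each leaf X with (X_1:0,...,X_n:0), where n = totals[X]."""
    def replace(match):
        name = match.group(1)
        n = totals[name]
        if n < 1:
            raise ValueError(f"OTU {name!r} has no reads to expand")
        copies = ",".join(f"{name}_{i}:0" for i in range(1, n + 1))
        return f"({copies})"  # the original branch length stays after the new ")"
    return LEAF.sub(replace, newick)


print(expand_tree("(A:0.1,B:0.1);", {"A": 4, "B": 4}))
# → ((A_1:0,A_2:0,A_3:0,A_4:0):0.1,(B_1:0,B_2:0,B_3:0,B_4:0):0.1);
```

A regular expression is enough for simple trees like this one; files with quoted labels or comments would need a real Newick parser.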

The second, abundances.txt.expanded, will look like this:

A_1	Sample1	1
A_2	Sample1	1
A_3	Sample1	1
A_4	Sample1	1
B_1	Sample2	1
B_2	Sample2	1
B_3	Sample2	1
B_4	Sample2	1

These files can then be used as input to UniFrac.
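The expanded abundance file lists every copy with a count of 1 in the sample it came from. The Python sketch below reproduces the example output; it is an illustration, not the script's own logic. The function name `expand_abundances` is hypothetical, and the rule for OTUs present in several samples (copies numbered consecutively across samples, in input order) is an assumption, since the example only shows OTUs that occur in a single sample.

```python
def expand_abundances(records):
    """Turn each (otu, sample, count) record into `count` rows 'otu_i<TAB>sample<TAB>1'.

    Assumption: copy numbers for an OTU continue across samples in input order,
    so A with 2 reads in S1 and 1 in S2 becomes A_1, A_2 (S1) and A_3 (S2).
    """
    next_index = {}  # next unused copy number for each OTU
    rows = []
    for otu, sample, count in records:
        start = next_index.get(otu, 1)
        for i in range(start, start + count):
            rows.append(f"{otu}_{i}\t{sample}\t1")
        next_index[otu] = start + count  # zero counts add no rows
    return rows


records = [("A", "Sample1", 4), ("A", "Sample2", 0),
           ("B", "Sample1", 0), ("B", "Sample2", 4)]
print("\n".join(expand_abundances(records)))
```

Run on the example records, this prints the eight tab-separated lines of abundances.txt.expanded shown above.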