Equivalent Input Produces Different Output in the UniFrac Significance Test

BMC Bioinformatics 15:278, 2014

Supplementary information

In the above paper, we describe how UniFrac reports different results when given isomorphically equivalent forms of the same input tree, where one form includes abundance counts explicitly and the other includes them implicitly. On this page, we provide a script for converting UniFrac input files from the explicit (or "compressed") format, for which we believe UniFrac produces results that are inconsistent with expectation, to the implicit (or "expanded") format, for which UniFrac produces the expected results.

Downloading the script

Download the script here. After downloading this file, please rename it from "UniFrac_to_expanded.txt" to "UniFrac_to_expanded.pl".

Using the script

To use this script, you must have Perl installed. If you are running UNIX, Linux, or Mac OS X, Perl is most likely already installed. If you are running Windows, you may have to download and install it manually.

UniFrac_to_expanded.pl takes two input files (a Newick-formatted tree file and a file containing the abundance of each OTU in each sample) and outputs a new tree file and a new abundance file, both in the expanded format. The names of the OTUs in your input files must not contain underscores.
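Because of the underscore restriction, it may be worth checking your abundance file before running the script. The following Python sketch is not part of UniFrac_to_expanded.pl. It assumes a tab-separated abundance file with the OTU name in the first column, and the function name find_underscore_otus is hypothetical.

```python
def find_underscore_otus(lines):
    """Return OTU names (first tab-separated column) that contain '_'."""
    offenders = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        otu = line.split("\t")[0]
        if "_" in otu and otu not in offenders:
            offenders.append(otu)
    return offenders


# Illustrative data only: the second OTU name would need to be renamed.
example = ["A\tSample1\t4\n", "B_x\tSample2\t4\n"]
print(find_underscore_otus(example))  # ['B_x']
```

An empty list means that no OTU name in the file contains an underscore.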

For example, suppose that you have called your Newick-formatted tree file tree.newick (download sample file) and that it has the following contents:

(A:0.1,B:0.1);

Further, suppose that you have called your abundances file abundances.txt (download sample file), and that it looks like the following:



A Sample1 4
A Sample2 0
B Sample1 0
B Sample2 4

Here, the columns are separated by tabs. To run the script, place the script and both input files in the same directory, change your current working directory to that directory, and type:

perl UniFrac_to_expanded.pl tree.newick abundances.txt

This will generate two new files. The first, tree.newick.expanded, will look like this:

((A_1:0,A_2:0,A_3:0,A_4:0):0.1,(B_1:0,B_2:0,B_3:0,B_4:0):0.1);

The second, abundances.txt.expanded, will look like this:



A_1 Sample1 1
A_2 Sample1 1
A_3 Sample1 1
A_4 Sample1 1
B_1 Sample2 1
B_2 Sample2 1
B_3 Sample2 1
B_4 Sample2 1

These files can then be used as input to UniFrac.
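To make the transformation concrete, here is a minimal Python sketch of the expansion as it can be inferred from the example above. It is not the Perl script itself, and several details are assumptions: the function name expand is hypothetical, copies of an OTU are numbered in the order its rows appear in the abundance file, rows with a count of zero are dropped, and the Newick string contains no whitespace or quoted labels.

```python
import re


def expand(newick, abundance_rows):
    """Expand a tree and (otu, sample, count) rows into one leaf per individual.

    Assumed behavior, inferred from the worked example: each OTU with a
    total count of n becomes a subtree of n zero-length leaves named
    OTU_1 ... OTU_n, and each leaf gets an abundance of 1 in one sample.
    """
    copies_so_far = {}  # otu -> number of copies emitted so far
    expanded_rows = []
    for otu, sample, count in abundance_rows:
        for _ in range(int(count)):
            copies_so_far[otu] = copies_so_far.get(otu, 0) + 1
            expanded_rows.append((f"{otu}_{copies_so_far[otu]}", sample, 1))

    expanded_tree = newick
    for otu, total in copies_so_far.items():
        leaves = ",".join(f"{otu}_{i}:0" for i in range(1, total + 1))
        # Match the OTU label only where it is a complete leaf name:
        # right after "(" or "," and right before its ":" branch length.
        pattern = r"(?<=[(,])" + re.escape(otu) + r"(?=:)"
        expanded_tree = re.sub(pattern, lambda m: f"({leaves})", expanded_tree)
    return expanded_tree, expanded_rows


tree = "(A:0.1,B:0.1);"
rows = [("A", "Sample1", 4), ("A", "Sample2", 0),
        ("B", "Sample1", 0), ("B", "Sample2", 4)]
new_tree, new_rows = expand(tree, rows)
print(new_tree)
for otu, sample, count in new_rows:
    print(f"{otu}\t{sample}\t{count}")
```

For the sample files, this sketch reproduces the expanded tree and abundance table shown above. The sketch also suggests one reason underscores in OTU names could be a problem: the expanded leaf names use an underscore to separate the OTU name from the copy number.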
