In the above paper, we describe how UniFrac reports different results when given isomorphically equivalent forms of the same input tree, where one form includes abundance counts explicitly and the other includes them implicitly. On this page, we provide a script for converting UniFrac input files from the explicit (or "compressed") format, for which we believe UniFrac produces results that are inconsistent with expectation, to the implicit (or "expanded") format, for which UniFrac produces the expected results.
To use this script, you must have Perl installed. If you are running UNIX, Linux, or Mac OS X, Perl is usually already installed. If you are running Windows, you may have to download and install it manually.
UniFrac_to_expanded.pl takes two input files (a Newick-formatted tree file and a file containing the abundance of each OTU for each sample) and outputs a new tree file and a new abundance file, both in the expanded format. The OTU names in your input files must not contain underscores.
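Because of the restriction on underscores, it can help to check the abundance file before running the script. The following is a minimal Python sketch of such a check, not part of UniFrac_to_expanded.pl. It assumes the three tab-separated columns (OTU, sample, count) used in the example on this page, and the function name `find_bad_otu_names` is hypothetical.

```python
import csv
import io


def find_bad_otu_names(abundance_text):
    """Return OTU names containing an underscore, in order of first appearance.

    Assumes tab-separated rows whose first column is the OTU name.
    """
    bad = []
    reader = csv.reader(io.StringIO(abundance_text), delimiter="\t")
    for row in reader:
        if not row:
            continue  # skip blank lines
        otu = row[0]
        if "_" in otu and otu not in bad:
            bad.append(otu)
    return bad


# Hypothetical input: the second OTU name would need to be renamed.
sample = "A\tSample1\t4\nB_x\tSample2\t4\n"
print(find_bad_otu_names(sample))  # prints ['B_x']
```

An empty list means that no OTU name in the abundance file contains an underscore. The same names should also be checked in the tree file.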
For example, suppose that you have called your Newick-formatted tree file tree.newick (download sample file) and that it has the following contents:
(A:0.1,B:0.1);
Further, suppose that you have called your abundances file abundances.txt (download sample file), and that it looks like the following:
A Sample1 4
A Sample2 0
B Sample1 0
B Sample2 4
Here, the columns are separated by tabs. To run the script, place the script and both input files in the same directory, change your current working directory to that directory, and type:
perl UniFrac_to_expanded.pl tree.newick abundances.txt
This will generate two new files. The first, tree.newick.expanded, will look like this:
((A_1:0,A_2:0,A_3:0,A_4:0):0.1,(B_1:0,B_2:0,B_3:0,B_4:0):0.1);
The second, abundances.txt.expanded, will look like this:
A_1 Sample1 1
A_2 Sample1 1
A_3 Sample1 1
A_4 Sample1 1
B_1 Sample2 1
B_2 Sample2 1
B_3 Sample2 1
B_4 Sample2 1
These files can then be used as input to UniFrac.
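To make the transformation concrete, here is a minimal Python sketch that reproduces the example above. It is an illustration only, not the logic of UniFrac_to_expanded.pl itself. The function name `expand` is hypothetical, and the sketch assumes that each counted individual becomes its own zero-length leaf, that copies of an OTU are numbered consecutively across samples, and that rows with a count of 0 are dropped. How the script treats cases the example does not cover (such as an OTU present in more than one sample) is not stated on this page.

```python
import re
from collections import defaultdict


def expand(tree, abundance_text):
    """Expand a compressed (tree, abundances) pair: one leaf per individual."""
    rows = []                   # expanded abundance rows
    copies = defaultdict(int)   # OTU name -> number of copies created so far
    for line in abundance_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        otu, sample, count = line.split("\t")
        for _ in range(int(count)):
            copies[otu] += 1
            rows.append(f"{otu}_{copies[otu]}\t{sample}\t1")

    def replace_leaf(match):
        name, length = match.group(1), match.group(2)
        n = copies.get(name, 0)
        if n == 0:
            return match.group(0)  # assumption: leaves with no counts stay as-is
        children = ",".join(f"{name}_{i}:0" for i in range(1, n + 1))
        return f"({children}):{length}"

    # A leaf is a name that follows "(" or "," and is followed by ":length".
    new_tree = re.sub(r"(?<=[(,])([^(),:;]+):([^(),:;]+)", replace_leaf, tree)
    return new_tree, "\n".join(rows)


tree = "(A:0.1,B:0.1);"
abundances = "A\tSample1\t4\nA\tSample2\t0\nB\tSample1\t0\nB\tSample2\t4\n"
new_tree, new_abundances = expand(tree, abundances)
print(new_tree)
print(new_abundances)
```

Run on the sample inputs, this prints the same expanded tree and abundance rows shown above. The regular expression handles only simple, unquoted Newick leaf names, so real input should still be converted with the Perl script.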