Convert distance matrix to phylip format

17 Dec 2011

I wrote a function to create a Euclidean distance matrix of some amino acid substitution matrices, and I wanted a built-in method to find the Spearman's rank correlation of two lists so I could create a distance matrix that way too. I found that BioPython actually has a method that builds distance matrices using several distance metrics, including Euclidean and Spearman's rank:

import Bio.Cluster
dm = Bio.Cluster.distancematrix(data, dist="s")

If you change the dist to "e", then it will calculate the Euclidean distance.
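
To make it a bit more concrete what the "s" option measures, here is a rough pure-Python sketch. It assumes the distance is one minus the Spearman rank correlation, with tied values given the average of their ranks. It is only an illustration and not how Bio.Cluster implements it; the function names are made up for this post, and a constant list would cause a division by zero.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_distance(x, y):
    """One minus the Pearson correlation of the ranks of x and y (assumed definition)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return 1.0 - cov / (sx * sy)

def lower_triangle(rows, dist):
    """Ragged lower-triangular matrix: row i holds the distances to rows 0..i-1."""
    return [[dist(rows[i], rows[j]) for j in range(i)] for i in range(len(rows))]
```

Two lists that rank their items identically come out at a distance of 0, and two lists with exactly reversed rankings come out at 2.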

I thought there might be a way to output this in phylip format so I could use quicktree, but if there is, I wasn't able to find it. So here's mine:

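with open(filename, 'w') as fout:
    # First line: the number of entries in the matrix
    fout.write('%d\n' % len(names))
    for name, row in zip(names, dm):
        # Each row starts with its name, followed by tab-separated distances
        fout.write(name)
        for value in row:
            fout.write('\t%s' % value)
        fout.write('\n')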

It assumes you have the distance matrix in the format created by the Bio.Cluster distancematrix function, and have a list of names for the sequences or matrices.

An example output would be:

B    1.2    0.8
C    3.2    1.6    2.0
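
If you want to try the writer without Biopython installed, here is a small sketch that wraps the same idea in a function and writes to an in-memory buffer. The name write_phylip and the values are made up for illustration. The matrix is assumed to be ragged lower-triangular (an empty first row, then one more value per row), which is how I understand the Bio.Cluster output to be shaped.

```python
import io

def write_phylip(names, dm, fout):
    """Write names and a lower-triangular distance matrix to an open file object."""
    # First line: the number of entries in the matrix
    fout.write('%d\n' % len(names))
    for name, row in zip(names, dm):
        # Each row starts with its name, followed by tab-separated distances
        fout.write(name)
        for value in row:
            fout.write('\t%s' % value)
        fout.write('\n')

# Made-up names and distances, shaped like the assumed Bio.Cluster output
names = ['A', 'B', 'C']
dm = [[], [1.2], [3.2, 1.6]]

buf = io.StringIO()
write_phylip(names, dm, buf)
phylip_text = buf.getvalue()
print(phylip_text)
```

Passing a file object rather than a filename means the same function works with a real file opened by open(filename, 'w').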