Saturday, 17th December 2011
Convert distance matrix to phylip format
I wrote a function to create a Euclidean distance matrix of some amino acid substitution matrices, and I wanted to find a built-in method to compute the Spearman's rank correlation of two lists so I could create a distance matrix that way. I found that Biopython actually has a method that builds distance matrices using various distance metrics, including Euclidean and Spearman's rank:
    import Bio.Cluster
    dm = Bio.Cluster.distancematrix(data, dist="s")
If you change the dist to "e", then it will calculate the Euclidean distance.
I thought there might be a way to output this in phylip format so I could use quicktree, but if there is, I wasn't able to find it. So here's mine:
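    # names: one label per row; dm: the result of Bio.Cluster.distancematrix
    with open(filename, 'w') as fout:  # the file is closed when the block ends
        fout.write('%d\n' % len(names))
        for name, row in zip(names, dm):
            fout.write(name)
            for value in row:
                fout.write('\t%s' % value)
            fout.write('\n')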
It assumes you have the distance matrix in the format created by the Bio.Cluster distancematrix function, and have a list of names for the sequences or matrices.
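To make the expected input shape concrete, here is a minimal sketch of the same writer wrapped in a function and run on an in-memory buffer instead of a file. The function name `write_phylip_lower` and the distances are made up for illustration. The only assumption about the Bio.Cluster result is that it is ragged: row i holds i values, so the first row is empty.

```python
import io

def write_phylip_lower(handle, names, dm):
    """Write names and a lower-triangular distance matrix to an open text handle."""
    if len(names) != len(dm):
        raise ValueError("need exactly one name per matrix row")
    handle.write('%d\n' % len(names))
    for name, row in zip(names, dm):
        handle.write(name)
        for value in row:
            handle.write('\t%s' % value)
        handle.write('\n')

# Made-up distances in the ragged shape: row i holds i values.
names = ['A', 'B', 'C']
dm = [[], [1.2], [0.8, 3.2]]

buffer = io.StringIO()
write_phylip_lower(buffer, names, dm)
text = buffer.getvalue()
print(text)
```

Passing an open handle rather than a filename keeps the function easy to check without touching the disk; for a real file, wrap the call in `with open(filename, 'w') as fout:`.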
An example output would be:
    3
    A
    B 1.2 0.8
    C 3.2 1.6 2.0
The first value is the number of sequences in the distance matrix and the following lines are the lower triangle of a distance matrix, not including the diagonal (for which all the values would be 0).
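If a tool wants the full square matrix rather than the lower triangle, the ragged rows can be expanded by mirroring each value across the diagonal and filling the diagonal with zeros. This is only a sketch: the helper name `expand_lower_triangle` is hypothetical and the distances are made up.

```python
def expand_lower_triangle(dm):
    """Turn ragged lower-triangle rows (row i has i values) into a square matrix."""
    n = len(dm)
    full = [[0.0] * n for _ in range(n)]  # diagonal stays at 0.0
    for i, row in enumerate(dm):
        if len(row) != i:
            raise ValueError("row %d should hold %d values" % (i, i))
        for j, value in enumerate(row):
            full[i][j] = float(value)
            full[j][i] = float(value)  # mirror across the diagonal
    return full

# Made-up lower triangle for three items.
lower = [[], [1.2], [0.8, 3.2]]
square = expand_lower_triangle(lower)
for row in square:
    print(row)
```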