Tuesday, 25th January 2011
Finding the reverse complement
I wrote this small function to get the reverse complement of a DNA sequence. It's probably not the most efficient way to do it, but I like it because it's compact. It will work for upper- or lower-case sequences, but it will always return a lowercase sequence. It removes any characters it doesn't recognise, which is useful if you have line numbers in the sequence.
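```python
def reverseComplement(sequence):
    # Map each lower-case base to its complement; 'n' (unknown base) maps to itself
    complement = {'a':'t','c':'g','g':'c','t':'a','n':'n'}
    # Walk the sequence backwards and complement each base; characters not in
    # the dictionary (digits, spaces, newlines) become '' and are dropped
    return "".join([complement.get(nt.lower(), '') for nt in sequence[::-1]])
```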
This function works by reading a string (or list) backwards and replacing each letter with its complement from a dictionary. Any character not in the dictionary, such as a space, line break or digit, is replaced by an empty string, so a single continuous sequence is returned.
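A quick check of the dictionary version, assuming Python 3 (the function itself also runs under Python 2). It is repeated here so the snippet runs on its own; the input is a made-up sequence with line numbers and mixed case, so you can see the digits, spaces and newlines being dropped and the result coming back in lower case.

```python
def reverseComplement(sequence):
    # Same dictionary approach as above: complement lower-case bases
    complement = {'a': 't', 'c': 'g', 'g': 'c', 't': 'a', 'n': 'n'}
    # Read backwards; unknown characters (digits, spaces, newlines) map to ''
    return "".join([complement.get(nt.lower(), '') for nt in sequence[::-1]])

# Hypothetical two-line sequence copied together with its line numbers
raw = "1 ATGCNN\n7 ggttaa\n"

print(reverseComplement(raw))  # prints ttaaccnngcat
```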
Another option is to use the str.translate() method:
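```python
# Python 3: str.maketrans builds the table (in Python 2 this was string.maketrans)
complement = str.maketrans('atcgn', 'tagcn')

def reverseComplement(sequence):
    # Lower-case so the table matches, complement each base, then reverse
    return sequence.lower().translate(complement)[::-1]
```

This function works by creating a translation table with str.maketrans() (string.maketrans() in Python 2) and using it to translate the lower-cased sequence before reversing it. Unlike the dictionary version, it does not remove unrecognised characters: anything not in the table, such as digits or line breaks, is kept as it is.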
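If you prefer the translate() approach but still want line numbers stripped, one option in Python 3 is the optional third argument of str.maketrans(), which lists characters to delete. This is a sketch rather than the method above: the names COMPLEMENT_TABLE and reverse_complement_clean are made up for illustration, and it assumes the only clutter in your input is digits and whitespace. Any other unrecognised character (a hyphen, say) still passes through unchanged.

```python
import string

# Hypothetical table: complement a/t/c/g/n and delete digits and whitespace
# (assumption: these are the only non-base characters in the input)
COMPLEMENT_TABLE = str.maketrans('atcgn', 'tagcn', string.digits + string.whitespace)

def reverse_complement_clean(sequence):
    # Lower-case, translate (deleting digits/whitespace), then reverse
    return sequence.lower().translate(COMPLEMENT_TABLE)[::-1]

print(reverse_complement_clean("1 ATGCNN\n7 ggttaa\n"))  # prints ttaaccnngcat
```

Deleting only a listed set of characters keeps the table small and explicit; if you need to drop every character outside a/c/g/t/n, the dictionary version is the simpler choice.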