Codon table

Here's a quick code snippet to generate a codon table in Python. The 'table' is actually a dictionary that takes a three-letter, lowercase codon as a key, and returns a single uppercase letter corresponding to the encoded amino acid (or '*' if it's a stop codon).

bases = ['t', 'c', 'a', 'g']
codons = [a+b+c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))

So if you type codon_table['atg'], you'll get M for methionine. If you prefer to use 'u' rather than 't', simply change the base in the first line.
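As a quick sketch of that 'u' variant: the names rna_bases, rna_codons and rna_codon_table below are only illustrative, not part of the original snippet. The amino acid string is unchanged, because only the spelling of the codons differs.

```python
# Same construction as the DNA table, but with 'u' in place of 't'.
# The rna_* names are hypothetical, chosen only for this example.
rna_bases = ['u', 'c', 'a', 'g']
rna_codons = [a + b + c for a in rna_bases for b in rna_bases for c in rna_bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
rna_codon_table = dict(zip(rna_codons, amino_acids))

print(rna_codon_table['aug'])  # M (methionine, the usual start codon)
print(rna_codon_table['uaa'])  # * (a stop codon)
print(len(rna_codon_table))    # 64
```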

It's now quite easy to make a function to translate a gene into an amino acid sequence.

def translate(seq):
    seq = seq.lower().replace('\n', '').replace(' ', '')
    peptide = ''
    
    for i in range(0, len(seq), 3):
        codon = seq[i: i+3]
        amino_acid = codon_table.get(codon, '*')
        if amino_acid != '*':
            peptide += amino_acid
        else:
            break
                
    return peptide

This function takes a DNA sequence, converts it to lowercase and removes any line breaks or spaces. It then loops through the sequence in chunks of three bases, i.e. codons, translating each one until it hits a stop codon or a codon that is not in the dictionary (for example, one containing an ambiguous base such as 'n', or an incomplete codon at the end of the sequence). It returns the amino acid sequence of the resulting peptide.
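Here is a small, self-contained usage sketch. The sequence in example_gene is made up purely for illustration, and the function uses range so that it runs on Python 3.

```python
# Rebuild the codon table as described in the post.
bases = ['t', 'c', 'a', 'g']
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))

def translate(seq):
    # Normalise the input: lowercase, no line breaks, no spaces.
    seq = seq.lower().replace('\n', '').replace(' ', '')
    peptide = ''
    for i in range(0, len(seq), 3):
        # Unknown or incomplete codons are treated like a stop codon.
        amino_acid = codon_table.get(seq[i:i + 3], '*')
        if amino_acid == '*':
            break
        peptide += amino_acid
    return peptide

# A made-up sequence with mixed case, spaces and a line break.
example_gene = 'ATGGCC ATT\nGTAATG GGCCGC TGA AAG'
print(translate(example_gene))  # MAIVMGR (stops at 'tga', so 'aag' is ignored)
print(translate('atggc'))       # M (the trailing 'gc' is not a full codon)
```

Note that translation always starts at the first base of the input; the function does not search for a start codon.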

Comments

Hi. Could you explain how the first part works to generate the codon table? It seems very useful and succinct but I just can't get my head round what's happening! How do the amino acids come to correspond with their respective codons in the dictionary? I understand zip works like this:

>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)

So I think it is probably this line that I don't understand:

codons = [a+b+c for a in bases for b in bases for c in bases]

Thanks!

The line codons = [a+b+c for a in bases for b in bases for c in bases] is indeed the key line. It is a list comprehension, which I describe at: http://www.petercollingridge.co.uk/python-tricks/list-comprehensions

Basically it is the equivalent of writing:

codons = []
for a in bases:
    for b in bases:
        for c in bases:
            codons.append(a+b+c)
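To see how each amino acid ends up matched with its codon, it may help to print the first few items. In this sketch the name pairs is just for illustration. The last base changes fastest, so the codons come out in the order 'ttt', 'ttc', 'tta', 'ttg', 'tct' and so on, and the amino acid string is written in that same order. zip then simply pairs the first codon with the first letter, the second with the second, and so on.

```python
bases = ['t', 'c', 'a', 'g']
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

# The list comprehension from the post.
codons = [a + b + c for a in bases for b in bases for c in bases]

# The equivalent nested loops, to check that both give the same list.
codons_from_loops = []
for a in bases:
    for b in bases:
        for c in bases:
            codons_from_loops.append(a + b + c)
print(codons == codons_from_loops)  # True

# The innermost loop (the third base) varies fastest.
print(codons[:6])  # ['ttt', 'ttc', 'tta', 'ttg', 'tct', 'tcc']

# zip pairs items by position: codon number n with amino acid number n.
pairs = list(zip(codons, amino_acids))
print(pairs[:4])  # [('ttt', 'F'), ('ttc', 'F'), ('tta', 'L'), ('ttg', 'L')]
```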

This is a very simple and useful excerpt of code.

I was studying biology and wanted to make sure I understood RNA translation by writing a program to compute it. Your code spared me the boring part (I was going to write out all the codons, one by one).

Thanks for sharing it.
