Codon table

8 Dec 2010

Here's a quick code snippet to generate a codon table in Python. The 'table' is actually a dictionary that maps a three-letter, lowercase codon to a single uppercase letter corresponding to the encoded amino acid (or '*' if it's a stop codon).

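bases = "tcag"
codons = [a + b + c for a in bases for b in bases for c in bases]
# Standard genetic code, one letter per codon, in the same t, c, a, g order as codons
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))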

So if you type codon_table['atg'], you'll get "M" for methionine. If you prefer to use 'u' rather than 't', simply change the base in the first line.
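As a quick check of the lookup described above, here is a minimal sketch that builds both the DNA and the RNA version of the table. It assumes the standard genetic code, with the amino acid letters listed in the same t, c, a, g order as the codons. The helper name make_codon_table is only for illustration and is not part of the original snippet.

```python
# Standard genetic code, one letter per codon, ordered t, c, a, g at each position
# (assumed here; swap in another string for a different genetic code).
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'


def make_codon_table(bases="tcag"):
    """Hypothetical helper: build the codon dictionary for a given base order."""
    codons = [a + b + c for a in bases for b in bases for c in bases]
    return dict(zip(codons, amino_acids))


dna_table = make_codon_table("tcag")
rna_table = make_codon_table("ucag")  # 'u' in place of 't'

print(dna_table['atg'])  # M (methionine)
print(rna_table['aug'])  # M (same codon written with 'u')
print(dna_table['taa'])  # * (stop codon)
print(len(dna_table))    # 64
```

Only the letter used for the first base changes between the two tables; the order of the amino acid string stays the same.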

It's now quite easy to make a function to translate a gene into an amino acid sequence.

def translate(seq):
    seq = seq.lower().replace('\n', '').replace(' ', '')
    peptide = ''
    for i in range(0, len(seq), 3):
        codon = seq[i: i+3]
        # Unknown or incomplete codons are treated like stop codons
        amino_acid = codon_table.get(codon, '*')
        if amino_acid != '*':
            peptide += amino_acid
        else:
            break
    return peptide

This function takes a DNA sequence, converts it to lowercase and removes any line breaks or spaces. Then it loops through it in chunks of 3, i.e. codons, translating them until it hits a stop codon or a codon not in the dictionary. It returns the amino acid sequence of the resulting peptide.
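To illustrate the behaviour described above, here is a small self-contained sketch with a few made-up test sequences. It assumes the standard genetic code for amino_acids and that translation stops at the first stop codon or unrecognised codon, as the description says.

```python
bases = "tcag"
codons = [a + b + c for a in bases for b in bases for c in bases]
# Standard genetic code in t, c, a, g order (an assumption of this sketch)
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))


def translate(seq):
    """Translate DNA to a peptide, stopping at the first stop or unknown codon."""
    seq = seq.lower().replace('\n', '').replace(' ', '')
    peptide = ''
    for i in range(0, len(seq), 3):
        # A trailing partial codon or an unknown base falls back to '*'
        amino_acid = codon_table.get(seq[i:i + 3], '*')
        if amino_acid == '*':
            break
        peptide += amino_acid
    return peptide


print(translate('ATG GCC AAA TAA GGG'))  # MAK (stops at the taa stop codon)
print(translate('atggc'))                # M (the trailing 'gc' is not a full codon)
print(translate('atgnnnaaa'))            # M ('nnn' is not in the table)
```

Note that anything after the first stop codon is ignored, so the ggg in the first example is never translated.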

Comments (4)

Jay on 24 Sep 2011, 2:36 a.m.

Hi. Could you explain how the first part works to generate the codon table? It seems very useful and succinct but I just can't get my head round what's happening! How do the amino acids come to correspond with their respective codons in the dictionary? I understand zip works like this:

>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)

Therefore I think it is probably this line:

codons = [a+b+c for a in bases for b in bases for c in bases]

that I don't understand.


Peter on 24 Sep 2011, 3:04 p.m.

The line codons = [a+b+c for a in bases for b in bases for c in bases] is indeed the key line. It is a list comprehension, which I describe at:

Basically it is the equivalent of writing:

codons = []
for a in bases:
    for b in bases:
        for c in bases:
            codons.append(a + b + c)
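To see how the amino acids end up next to the right codons, a small sketch like the following may help. It assumes the amino acid string is the standard genetic code written in the same order in which the loops generate the codons (last base changing fastest). zip then pairs the first codon with the first letter, the second with the second, and so on.

```python
bases = "tcag"
# Standard genetic code in t, c, a, g order (an assumption of this sketch)
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

# Nested loops: the innermost loop (third base) changes fastest
codons = []
for a in bases:
    for b in bases:
        for c in bases:
            codons.append(a + b + c)

# The list comprehension produces exactly the same list
assert codons == [a + b + c for a in bases for b in bases for c in bases]

print(codons[:5])
# ['ttt', 'ttc', 'tta', 'ttg', 'tct']

# zip pairs items by position; iterating over a string yields single letters
print(list(zip(codons, amino_acids))[:5])
# [('ttt', 'F'), ('ttc', 'F'), ('tta', 'L'), ('ttg', 'L'), ('tct', 'S')]
```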

Higa on 11 Dec 2013, 10:46 p.m.

This is a very simple and useful excerpt of code.

I was studying biology and wanted to make sure I understood RNA translation by making a program to compute it. Your code avoided the boring part (I was going to write all codons, one by one).

Thanks for sharing it.

Hilong on 6 Jul 2015, 5:29 p.m.

Another trick:

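import itertools
# product() yields tuples such as ('t', 't', 't'), so join each one into a string
codons = [''.join(c) for c in itertools.product('tcag', 'tcag', 'tcag')]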