Predicting pI

The pI (or isoelectric point) of a molecule is the pH at which it has neutral charge. The pI of a protein is determined by its amino acid composition so can be used to identify it (or at least narrow down its identity). Proteins can be separated on the basis of their pI on a polyacylamide gel by a technique called isoelectric focusing. I've written an introduction to pH, pI and pKa here.

The full code for my pI predecting function can be downloaded as a txt file from the bottom of this post.

pKa Values

First we need to know the pKa values of the individual amino acids and the N- and C-termini. I got the pKa values from Wikipedia, though there are various sources and they don't all agree with one another. It's not too important, since pI calculations are never that accurate. The pKa values only represent the association constants for the ions in an ideal solution and are highly dependent on the environment of the ion. In a protein, most amino acids will be surrounded by other amino acids, which will change the pKa values, though on average, the values should be roughly correct.

pKa = {'D': 3.9,
       'E': 4.3,
       'H': 6.1,
       'C': 8.3,
       'Y': 10.1,
       'K': 10.5,
       'R': 12,
       'N-terminus': 8,
       'C-terminus': 3.1}

We also need to know whether the ions are positively charged when protonated or negatively charged when deprotonated:

charges = {'D':-1, 'E':-1, 'H':1, 'C':-1, 'Y':-1, 'K':1, 'R':1, 'N-terminus':1, 'C-terminus':-1}

Calculating a protein's charge

Using the pKa and charge dictionaries we can create a function to find the charge of an amino acid at a given pH (derivation of the equation here).

def aminoAcidCharge(amino_acid, pH):
    ratio = 1 / (1 + 10**(pH - pKa[amino_acid]))
    
    if charges[amino_acid] == 1:
        return ratio
    else:
        return ratio - 1

We can then use this function to calculate the charge of an entire protein at a specific pH given its sequence (as a string):

def proteinCharge(sequence, pH):
    protein_charge = aminoAcidCharge('N-terminus', pH)
    protein_charge += aminoAcidCharge('C-terminus', pH)
    
    for aa in pKa.keys():
        protein_charge += sequence.count(aa) * aminoAcidCharge(aa, pH)

    return protein_charge

Calculating a protein's pI

We have a function that calculates the charge of a protein at a given pH, but there is no inverse function to calculate the pH at which a protein will have a given charge (such as 0). One way to calculate pI is to plot the charge of a protein over a range of pHs, connect the points, and read off the graph.

For example, below is a graph of the charge of one of the human keratin proteins over a range of pHs. You can see that its pI is around 5. The graph also illustrates how the charge of a protein is buffered over the range of pH5 - 9.

Plot of keratin's charge vs pH

To get a more exact value for pI we can use a binary search algorithm. This is an interative method that narrows in on the answer until you are as close to a pI of 0 as you wish.

We start by finding the charge of the protein at the midpoint of a range of pH values.  I use pH 3 - pH 13 since all amino acids will be >50% protonated below pH 3 and all will be >50% deprotonated above pH 13. If the charge at the midpoint (pH 8) is negative, then we know that the pI must be between pH 3 - pH 8, so we test the midpoint of that range. If the charge is positive, then we test the range pH 8 - pH 13. We keep testing the midpoint of the ranges in which the pI must lie until we are as close to a charge of 0 as we want.

def isoelectricPoint(sequence, threshold):   
    min_pH, max_pH = 3, 13 

    while True:
        mid_pH = 0.5 * (max_pH + min_pH)
        protein_charge = proteinCharge(sequence, mid_pH)
        
        if protein_charge > threshold:
            min_pH = mid_pH
        elif protein_charge < -threshold:
            max_pH = mid_pH
        else:
            return mid_pH

The function above the implements the binary search for a given protein sequence and threshold. It predicts that the pI of human keratin is 5.3 (with a threshold of 0.5). There is no point attempting to get a more precise pI than this as the pKa values are only given to one decimal place, and will not be so accurate for amino acids within a protein anyway.

AttachmentSize
predictPI.txt1.04 KB