Thursday, 1st October 2009
The mathematics of gel electrophoresis
I have spent some time recently coming up with an equation that can model the movement of DNA through an agarose gel. I've written a brief introduction to how I think agarose gel electrophoresis works, which will help explain my model for the simulation.
An agarose gel typically contains 0.7% – 2% agarose (a polymer of sugars). The agarose is dissolved in a heated buffered solution (typically TAE). As the solution cools, the agarose forms cross-links with itself, creating a networked structure like a sponge (which is why it forms a gel). DNA is added to slits at one end of the gel called wells, one at the head of each lane. Then an electric current is applied, which causes the negatively-charged DNA to move towards the positive electrode.
As the DNA moves through the gel, it is impeded by the agarose cross-links. The higher the concentration of agarose, the smaller the holes in the gel and so the more DNA is impeded. The shorter the DNA, the less likely it is to be impeded, which is why agarose gel electrophoresis can separate DNA of different lengths. Higher concentrations of agarose (say 2% agarose) are better for resolving shorter lengths of DNA, while lower concentrations are better for resolving longer lengths of DNA.
We can assume that DNA is spherical with a diameter proportional to its length (assuming things are spherical seems to be the basis of various mathematical models, though in the case of DNA, it seems a particularly bad assumption, but I’ll explain my reasons later). Then we can view its passage through the gel as multiple encounters with holes of different sizes. If the sphere of DNA is smaller than the hole then it moves forward, otherwise it is impeded. I’m not quite sure what happens in real gels, but I think the idea is that the longer the DNA is, the more likely smaller holes are to impede its flow, perhaps because the DNA is more likely to drag along the sides of the holes.
I decided to describe the distribution of hole sizes in a gel with the gamma distribution.
There was no particular statistical reason for choosing the gamma distribution (though maybe there is a good statistical reason for it working). Instead I chose it because it ranges from 0 to infinity (as opposed to, say, minus infinity to infinity in the case of the normal distribution), so it covers the range of possible lengths of DNA. In this model, the probability of a strand of DNA with length x moving through a random hole is the probability that the hole is larger than the DNA (which I’m imagining is spherical), which corresponds to 1 minus the value of the cumulative distribution at x. This value can also be considered the fraction of the maximum possible distance that the DNA is expected to travel. The maximum possible distance corresponds to the gel front and should be proportional to the time the gel is run multiplied by the voltage at which the gel is run.
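To make this concrete, here is a minimal sketch of "1 minus the cumulative distribution" multiplied by the gel front distance. It assumes an integer shape parameter, for which the gamma survival function has a simple closed form. The function names and the rate constant are hypothetical placeholders, not values from my simulation.

```python
import math


def fraction_of_max_distance(length_kb, shape_k, scale_theta):
    """Return 1 - CDF of a gamma distribution at length_kb.

    Only valid for integer shape_k, where the survival function is
    exp(-z) * sum(z**i / i!) for i in 0..shape_k-1, with z = x / theta.
    """
    if length_kb < 0:
        raise ValueError("DNA length cannot be negative")
    z = length_kb / scale_theta
    partial_sum = sum(z ** i / math.factorial(i) for i in range(shape_k))
    return math.exp(-z) * partial_sum


def gel_front_distance(minutes, volts, rate_constant):
    """Maximum possible distance, proportional to run time times voltage.

    rate_constant is an assumed fitting constant (distance per volt-minute).
    """
    return rate_constant * minutes * volts


def expected_distance(length_kb, shape_k, scale_theta, minutes, volts, rate_constant):
    """Expected migration distance: gel front times the fraction above."""
    front = gel_front_distance(minutes, volts, rate_constant)
    return front * fraction_of_max_distance(length_kb, shape_k, scale_theta)
```

An infinitely short strand gets a fraction of 1 and so travels with the gel front, while longer strands get progressively smaller fractions.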
All that remains is to pick values for the parameters of the gamma distribution. I chose a shape parameter (k) value of 2 because that seemed to give a nicely shaped graph and made the calculations easier. The scale parameter (θ) determines where the peak of the distribution is (for k = 2 the peak is at θ and the mean is 2θ), so it describes the typical size of holes. This parameter should therefore be a function of the concentration of agarose. I figured that since the size will depend on a molecule of agarose cross-linking with another in three-dimensional space, θ should be proportional to 1/(concentration of agarose)³.
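With k = 2 the formula collapses to something very short. The sketch below assumes θ = constant / concentration³. The constant of 2.0 is a made-up illustrative value, not the one I actually used, and the function names are hypothetical.

```python
import math

# Hypothetical proportionality constant (Kb x percent^3); tune to fit real gels.
HOLE_SCALE_CONSTANT = 2.0


def hole_scale(agarose_percent, constant=HOLE_SCALE_CONSTANT):
    """Scale parameter theta, proportional to 1 / concentration cubed."""
    return constant / agarose_percent ** 3


def relative_mobility(length_kb, agarose_percent, constant=HOLE_SCALE_CONSTANT):
    """Fraction of the gel front distance travelled, for shape k = 2.

    For k = 2 the gamma survival function is exp(-z) * (1 + z),
    where z = length / theta.
    """
    z = length_kb / hole_scale(agarose_percent, constant)
    return math.exp(-z) * (1.0 + z)
```

Because θ shrinks with the cube of the concentration, a modest increase in agarose slows a given strand considerably: `relative_mobility(1.0, 1.2)` comes out lower than `relative_mobility(1.0, 0.8)`.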
Using this system and some suitable constants, I got a mobility function that is shown in the graph below. The graph shows how smaller DNA moves faster (with infinitely small strands moving at the maximum rate). It also shows how in a 0.8 % agarose gel, very small lengths of DNA will move at about the same rate (since the curve is nearly flat), and so will be poorly resolved. By contrast, in a 1.2 % agarose gel, longer lengths of DNA (7 Kb+) will be poorly resolved. DNA of lengths ~0.5 Kb – ~3 Kb will be resolved better since the curve is steeper, so a small difference in length will result in a large difference in distance travelled.
Interestingly, another way to ask what length of DNA a given gel can best resolve is to ask what lengths of DNA are most differentiated. We can answer this by mathematical differentiation (which is where the word comes from). The graph below is the derivative of the first graph, which (ignoring the minus sign) is the gamma probability density function and so can also be read as the graph of hole sizes in different gels. The graph shows that a 1.2 % gel differentiates DNA of length ~1 Kb best, whereas a 0.8 % gel generally differentiates lengths of DNA less well, though its peak is ~4 Kb.
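For shape k = 2, differentiating exp(-x/θ)(1 + x/θ) with respect to x gives -(x/θ²)exp(-x/θ), which is minus the gamma density. Its magnitude peaks at x = θ with height 1/(eθ), so a gel with smaller holes has a taller, narrower peak. Here is a small sketch that checks this numerically. The θ values are passed in directly and the function names are hypothetical.

```python
import math


def resolving_power(length_kb, scale_theta):
    """Magnitude of the slope of the mobility curve for shape k = 2.

    This equals the gamma probability density x * exp(-x / theta) / theta**2.
    """
    return length_kb * math.exp(-length_kb / scale_theta) / scale_theta ** 2


def best_resolved_length(scale_theta, step_kb=0.01, max_kb=20.0):
    """Find the best-resolved length by a simple grid search (a sketch only)."""
    n_steps = int(max_kb / step_kb)
    lengths = [i * step_kb for i in range(1, n_steps + 1)]
    return max(lengths, key=lambda x: resolving_power(x, scale_theta))
```

The grid search lands on θ itself (to within one step), which agrees with the calculus, and the peak for θ = 1 is four times taller than the peak for θ = 4.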
Implementing this equation creates a pleasingly realistic result. I added in some unrealistic but functional diffusion to soften the edges and make the image look that bit more realistic. The first two gels were loaded with a solution containing lengths of DNA 0.5 – 10 Kb at 0.5 Kb intervals. DNA with a length that is a multiple of 2 Kb is at a higher concentration to make measuring easier. The third and fourth gels were loaded with a solution containing lengths of DNA 0.2 – 4 Kb at 0.2 Kb intervals, with multiples of 1 Kb highlighted.
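A rough sketch of how one lane could be rendered as a 1-D intensity profile is below. It assumes shape k = 2, a Gaussian smear standing in for the diffusion, and a doubled amount of DNA for the highlighted bands. None of the function names, the doubling, or the blur width come from my actual implementation.

```python
import math


def band_position(length_kb, scale_theta, gel_front):
    """Distance from the well, using the k = 2 survival function."""
    z = length_kb / scale_theta
    return gel_front * math.exp(-z) * (1.0 + z)


def make_ladder(start_kb, stop_kb, step_kb, highlight_every_kb):
    """Return (length, relative amount) pairs for an evenly spaced ladder.

    Highlighted lengths get an assumed double amount of DNA.
    """
    count = round((stop_kb - start_kb) / step_kb) + 1
    ladder = []
    for i in range(count):
        length = round(start_kb + i * step_kb, 6)
        ratio = length / highlight_every_kb
        is_highlight = abs(ratio - round(ratio)) < 1e-9
        ladder.append((length, 2.0 if is_highlight else 1.0))
    return ladder


def lane_profile(ladder, scale_theta, gel_front, blur_sd, n_pixels):
    """Sum one Gaussian per band; pixel 0 is the well end of the lane."""
    profile = [0.0] * n_pixels
    for length, amount in ladder:
        centre = band_position(length, scale_theta, gel_front)
        for pixel in range(n_pixels):
            offset = (pixel - centre) / blur_sd
            profile[pixel] += amount * math.exp(-0.5 * offset * offset)
    return profile


# Example: the 0.5 - 10 Kb ladder with 2 Kb multiples highlighted.
ladder = make_ladder(0.5, 10.0, 0.5, 2.0)
profile = lane_profile(ladder, scale_theta=2.0, gel_front=100.0,
                       blur_sd=1.5, n_pixels=101)
```

Here `scale_theta=2.0` and `gel_front=100.0` are arbitrary example values. Drawing each value of `profile` as a row of pixel brightness gives a single lane, with the long fragments crowding together near the well.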
The virtual gels show that, as expected (and as in real life, give or take):
- A 0.8 % agarose gel gives a good resolution of DNA between 2 – 4 Kb.
- A 1.0 % agarose gel gives a good resolution of DNA between 1 – 2 Kb, while DNA >4 Kb is poorly resolved and bunches up at the top of the gel in quite a realistic way.
- A 1.2 % agarose gel gives a good resolution of DNA 0.2 – 1 Kb, while DNA >2 Kb is poorly resolved.
In the simulation, the gels were run for 55 minutes and at 20 volts, though both are arbitrary. By tweaking various parameters and constants I think this model could accurately mimic real agarose gel electrophoresis results.