Cell Simulation

I think the reason I'm interested in artificial evolution is that I want to create a system that evolves while I record every stage of evolution and trace the changes. However, I've found that even with a complete record of every genome in a system, it can be quite a challenge to untangle evolutionary pathways. I hope that in trying to create programs to analyse the information from my artificial evolution, I might create something that can be used to analyse evolution in the real world.

Coding

I've started new cell simulations several times now and got quite far each time before hitting a wall. I've decided to start yet again, building up from the simplest building blocks and recording my progress on this blog (mainly so I can remember how everything works). I'll also be using Git and putting my code for this project on Gihub for all to see and edit.

To keep my program flexible and simple to use, I've tried to write a program module that contains only very simple, relatively self-explanatory commands; these commands call functions in imported modules where all the computations are hidden (essentially an API). For the most part, I'll explain only the commands in the top-level module; if you want to see more detail then you can look at the code on Github.

Aims

The overall aim of this simulation is to evolve networks: networks of reactions (i.e. metabolic pathways) and networks of transcription factors and repressors (i.e. regulatory pathways). In biology we look at these as complete systems, so I think it will be interesting to see how such complex interlocking networks can arise bit-by-bit through evolution. I also hope this simulation will be a more open simulation. Below are some of the hopes I have for evolution in my simulation and how they might be encouraged.

Natural selection

In contrast to my previous evolution simulations, I want this one to simulate evolution by natural selection. In all my previous simulations of evolution there has been a specific measue of fitness (a fitness function). For example, in my simulation of heterocysts, cells were selected based on the time it took them to produce a filament of a specific, arbitrary length. This simply led to the evolution of cells that could divide fastest, even if cells used up all their resources by the end of the simulation; once they'd reached the target length it didn't matter how healthy they were

In this simulation the cells that reproduce will reproduce and those that don't, won't. Evolution should therefore be more open ended and allow multiple strategies to be explored. For example, some cells may evolve to reproduce quickly, while others may evolve to divide more slower to ensure their daughter cells have larger reserves of energy.

Interaction and competition

This is really part of natural selection. Whereas my previous simulations have simulated individuals in isolation and measured their fitness at the end of a 'generation', in this simulation I want to simulate organisms simultaneously. This means that cells will be able to interact with one another directly or indirectly by influencing the environment (see below). I hope this will drive the evolution of ecosystems in which organisms become dependent upon each other's existence.

Variable environment

To further encourage diverse evolutionary strategies, and thus speciation, I aim to simulate an environment that is homogeneous both spatially and temporally. Spatial difference will allow cells to adapt to different niches, while temporal changes will force cells the respond to the environment. In fact, one of the main aims of the simulation is to evolve regulatory networks, and in order for a regulatory network to evolve, there must be changes in the environment to respond to.

The simulation of the pool of water in which the cells will live is effectively a separate project, which I am writing about here.

Unconstrained genetics

Whilst all genetics must be constrained to some degree (by which I mean not all phenotypes are possible; at the very least the laws of physics provide some limits), my previous simulations have been very limited. In all cases genes have represented relatively simple and well-defined characteristics, and there has been a set number of genes. in this simulation, I hope to create a sufficiently realistic and open-ended system by creating a function that converts a sequence of amino acids into proteins (i.e. cellular functions). I suspect that getting this mapping correct with be the most important part of building the simulation. Furthermore, there will be no limit to how long a genome can be other than my computer's memory and how much energy it takes for an organism to synthesise the genome.

Virtual chemistry

Evolution can create a huge diversity of organisms, but they are all constrained by physics and chemistry. Organisms can never evolve to synthesise novel elements, to generate energy from nothing or to travel faster than light. One of the (few) things that disappointed me about the game Creatures, was that because of the way that enzymes were defined, one of my creatures evolved to synthesise starch from nothing, so never had to eat.

In my simulation, I want cells to be constrained by a realistic chemistry. However, I don't intend to attempt to simulation real-life chemistry; that would be too computationally complex and not necessarily useful for a simulation of evolution. Instead I intend to invent my own simple chemistry, defining all the chemicals that can exist and all the reactions that can inter-convert between them.

With chemistry defined, organisms will have to evolve ways to exploit it to their own ends. I want to avoid defining specific chemicals as 'food' or sugars and instead allow organisms to metabolise whatever suits them. For example, if a large quantity of a waste product is produced by one organism, another might evolve to exploit that concentration gradient.

Elements

In my virtual universe there are eight elements, which I've given unimaginative one-letter names [N.B. I use different letters later in my blog and am in the process of updating everything] and arranged in a simple 4 by 2 periodic table.

The elements could represent atomic elements, but they could equally represent functional groups, such as an organic acid group, it's not really important. I arranged them in a table to indicate that elements that in the same column are chemically similar. The properties of elements, such as their stability is also related to their position in the table.

Compounds

As in our universe, elements can combine to form compounds, but not all combinations of chemicals are possible. In my universe, there are 12 two-element compounds formed by either combining elements from coloumn 1 with elements from column 4 or by combining elements from coloumn 2 with elements from column 2 or 3. The relative stability of compounds is a function of the properties of the elements.

Solutions

Since this is a biological simulation, all reactions will occur in water. Water is not defined in the same way as other chemicals in the simulation, but instead a Solution object is defined. This object has a volume property, which defines how many units of water it consists of, and a dictionary of the chemicals it contains. For example, we create a pool of 64 000 units of water like so:

from solutions import Solution
pool = Solution(volume=64000)

We can then add 1000 units of chemical AD:

pool.addChemicals({'AD': 1000.0})

The Cell object inherits from the Solution object because a cell is essentially just a bag of water containing some chemicals (plus DNA and proteins, which are defined separately in this simulation).

Reactions

Having defined all the possible chemicals in the universe (elements + compounds), I defined all the ways to convert one into another. There are three classes of reaction and they are all reversible.

1. Hydrolysis / Synthesis reactions

In these reactions a compound is split into its constituent elements (or two element combine to form a compound). For example, ADA + D

2. Transferase reactions

In these reactions an element reacts with a compound and replaces one of the constituent elements of that compound. For example, AD + HAH + D

3. Mutual transferase reactions

In these reactions two compounds reacts and swap constituent elements. For example, AD + EHAH + ED

Below is a diagram showing the four possible reactions that compound AD can undergo.

The reactions of chemical AD.

Rate constants

Each reaction (forward and backward) has a rate constant which is calculated based on the relatively stability of the chemicals involved. The rate constant determines how fast a reaction occurs and the ratio of the rate constants for forward and backward reactions implicitly determines where the equilibrium of a reaction lies. I'll explain more how this works in the section about enzymes since I've decided to only allow reactions to occur if an enzyme that catalyses it is present (i.e. no reaction occurs spontaneously).

Enzyme kinetics

As I mentioned in the section about reactions, in order for any reaction to occur in my simulation, an enzyme catalysing it is required. This potentially reduces the possibilities for interesting chemistry to evolve, but simplifies computation. Enzymes are added to Solution objects (including Cell objects) with the addProtein() function. For example, continuing from the code in the previous post:

pool.addProtein(amount=2.0, sequence='MLMMMM')

Here we've added two units of a protein with the amino acid sequence 'MLMMMM' to the pool. The 15 amino acids in my simulation are represented by the letters L to Z. The amino acid sequence 'MLMMMM' codes for a type of enzyme I call ADases because they catalyse the hydrolysis/synthesis of chemical AD. to/from chemicals A and D. In a later post I'll explain exactly how amino acid sequences define protein functions.

ADase

Once we have set up a pool of water, we can run the simulation for multiple ticks like so:

pool.update(ticks=1000)

The graph below shows how the concentrations change over 1000 ticks in a solution of 1% AD, 0.5% A and 0% D when 0.01% ADase is added.

The net reaction is that AD is hydrolysed until equilibrium is reached. Equilibrium is reached when the rate of AD hydrolysis occurs at the same rate of AD synthesis. In this case, equilibrium is reach when approximately 97% of the AD has been broken down. The relative rates of the forward and reverse reactions depends on the concentrations of the chemicals involved and two rate constants. These rate constants (k1 and k2) are defined by the relative stabilities of the chemicals on either side of the equation.

The reactions of chemical AD.

The hydrolysis of AD into A and D has a rate constant (k1) of 0.6. If the reaction were to occur spontaneously, this would result in 60% of AD in the pool breaking down in each tick of the simulation. However, since this reaction requires an enzyme to catalyse it, the rate is multiplied by the amount of AD bound the enzyme. I've defined the amount of AD bound to the enzyme at any given moment as:

The reactions of chemical AD.

Where [AD] is the concentration of AD, i.e. the amount of AD divided by the volume of the solution. The km is a constant associated with each enzyme, which determines its affinty for its substrates. The amount of AD bound is actually slightly less in the simulation as AD competes for the enzyme binding site with A and D.

The reverse reaction is more complicated as it depends on two substrates. The equation is essentially the same, but is limited by whichever substrate is the least bound to the enzyme. This is a bit of a cheat, but simplifies things and stops reaction rates becoming vanishingly small when many substrates are involved.

The effect of km

The km of an enzyme in my simulation, and in reality, determines how the concentration of a substrate effects the reaction rate. From the equation above, you can see that when the concentraion of a substrate is equal to the km, half the enzyme will be bound to the substrate. A small km therefore means that an enzyme binds a substrate tightly and saturates at a relatively low concentration. A large km means that an enzyme binds the substrate less tightly and saturates at a higher concentration. Enzymes with a low km will catalyse reactions faster but will be less responsive to changes in susbtrate concentration.

The graph below shows how substrate concentration effects the reaction rate of four ADase enzymes with different kms. As the susbtrate concentration increases, the reaction rate asymptotically reaches the maximum rate of 0.6. Only the initial reaction rate is measured so there is no reverse reaction or product inhibition.

Comparison of reaction rates

As I mentioned above, rate constants depend on the relative stability of the chemicals involved in a reaction. AD is relatively unstable so equilibrium is reached when most of it (~97%) has been broken down. On the other hand, chemical FG is relatively stable, so breaks down slowly and reaches equilibrium at a higher concentration (only ~20% broken down). EH is somewhere in between.

Getting work done

Previously I described how enzymes can catalyse a reaction. If this were all enzymes could do then cells would be very limited - they could only ever move subsets of reactions towards equilibrium. In fact enzymes are limited to moving reactions to equilibrium, however, they can expand the repertoire of reactions available by coupling two or more together, effectively, creating a new reaction. For example, real life cells couple the synthesis of DNA to the hydrolysis of GTP. The addition of a nucleotide to a strand of DNA would normally be energetically unfavourable (the equilibrium favours removing a nucleotide), but the hydrolysis of GTP is more favourable. The coupled reaction therefore favours DNA synthesis.

In order for the simulated cells to drive energetically unfavourable reactions, such as DNA synthesis, they therefore require enzymes that can couple reactions. For example, in a solution of 1% EH and 0.5% E and H, EH synthesis is unfavourable. Similarly, in a solution of 1% AD and 05% A and D, AD synthesis is unfavourable. Adding ADase and EHase to the pool causes both EH and AD to become hydrolysed.

(The concentrations of D and H are equal to the concentrations of A and E respectively, so are not shown.)

Pores and transporters

Because transporters and enzymes work in the same way in this simulation, transporters can also be coupled, which means several types of protein can now be defined:

  • tra-f-A: a pore that allows the passive diffusion of A across the membrane.
  • tra-f-A-tra-f-B: an A/B symporter
  • tra-f-A-tra-r-B: an A/B antiporter
  • tra-f-A-tra-f-CDase: an active transporter of A, powered by CD hydrolysis

In fact, if the balance of equilibria is right, the final protein can also act like ATP synthase, using the concentration gradient of A to synthesise CD.

Finding chemistry that works

Biology is the search for the chemistry works -- R. J. P. Williams.

I quickly got bored of having just three chemicals A, B and AB, so have defined a new chemistry, consisting of:

  • Eight elements (imaginatively called E, F, G, H, I, J, K and L)
  • Eight compounds (EH, EL, FG, FK, IH, IL, JG and JK)
  • Eight synthesis/hydrolysis reaction (one for each compound)
  • Two inter-conversion reactions (e.g. EH + IL ⇌ EL + IH)

I defined the compounds and reactions using a few simple rules, which can be considered the physics of the simulation. These rules also defined the stability of compounds, which I used to determine the equilibria of the reactions and the concentration of elements in the universe (the default concentration in the pool and cells).

I gave myself the aim of synthesising compound EH, the most energetically unfavourable reaction. I added some EL and FK into the pool, which cells could use to power synthesis reaction (using the concentration gradient of these compound as well as the chemical energy released in their hydrolysis). I imagine that EH is equivalent to ATP or DNA, which cells must synthesise in order to replicate; and EL and FK are equivalent to sugars that cells metabolise to power reactions.

The system that I created is shown below. Black arrows indicate the transport (active or passive) of chemicals into and out of the cell; red arrows indicate reactions.

Metabolism in an ancestral cell

This metabolism makes use of the concentration gradient and chemical energy of EL to drive EH formation. EL enters the cell by passive diffusion and is broken down to E and L. E can exit the cell, while L accumulates. L accumulates to a high concentration, so that it can drive EH formation as it leaves the cell. I later realised that the E pore is unnecessary, but I left it in the ancestral cell to see if evolution also realised it. In the future, I'll probably just start with a random genome.

Virtual genetics

In order get cells to evolve I needed to create a way to mutate the proteins in the cell. I did this by giving cells a genome which defined the proteins it contained. The genome is a linear sequence of nucleotides, which I represent with the letters A, B, C and D (equivalent to A, G, C and T in real cells). This sequence can be easily mutated as described below.

The genome is split into genes with the boundaries between genes indicated by the sequence DDAA. Genes are then split into two letter codons, which are interpreted thus:

  • BA-xx - a forward transporter of X
  • BB-xx - a reverse transporter of X
  • BC-xx - an enzyme for reaction X (forward)
  • BD-xx - an enzyme for reaction X (reverse)

The two letter codon xx defines a chemical, X, in the case of transporters and a reaction, X, in the case of enzymes. For example BA-AA encodes an E-transporter and BC-AA encodes an EH synthase/hydrolase. The forward and reverse direction of reactions is not important unless reactions are coupled. For example:

  • BA-AA-BA-AB is an E/F symporter (same direction)
  • BA-AA-BB-AB is an E/F antiporter (opposite direction)
  • BB-AA-BA-AB is an E/F antiporter (opposite direction)
  • BB-AA-BB-AB is an E/F symporter (same direction)

If a codon doesn't (yet) have an interpretation, it is ignored, so CA-BA-AA-CB-CC-BA-AB-CD is an E/F symporter as above. Later on I'd like to create a more efficient system whereby the arrangement of two binding sites determines how reactions are coupled.

Mutations

Each time a cell is created, its DNA is synthesised using a template. Bases are copied from the template one at a time with a 0.5% chance of mutation per base. 75% of mutation are single base changes while 25% cause the DNA to continue replicating from a random position on the template, thus creating insertions and deletions and potentially duplicating the entire genome.

Simulation set-up

I started a simulation with a pool with a volume of 1 000 000 units and the same concentration of chemicals as above (default concentrations of elements plus extra EL and FK). I then added 64 cells to the pool, each containing the default concentration of elements. The DNA in each cell was based on the genome of my ancestral cell, mutated as described above. The genomes were interpreted and an amount of each protein was added to cells, such that the total amount of protein in each cell was 16 units. This prevented cells simply producing more proteins by duplicating genes. It also meant that cells producing useless proteins were penalised by reducing the amount of other proteins.

Selection

Cells were run for 20 000 time units and ranked by the concentration of EH they accumulated during this time. The top 16 cells were selected and allowed to replicate. The selection method allowed more successful cells to have more offspring in a fairly arbitrary manner:

  • Cell 1 produces twelve daughters
  • Cells 2 - 4 produce eight daughters
  • Cells 5 - 8 produce four daughters
  • Cells 9 - 16 produce one daughter

In addition, four "wildcard" cells were picked at random and allowed to produce a daughter. This results in 64 daughter cells to form the next generation. Each generation was added to a fresh pool with the same distribution of chemicals as before, and run for 20 000 times units again.

Analysing evolution I

After setting up the simulation as described in the previous update, I left my cells to evolve for a couple of days, during which time 2800 generations passed. Each generation consisted of 20 000 units of time and the fitness of each cell was defined as the amount of chemical EH it could accumulate in this time. For each generation, I recorded the genome of the fittest cell and the amount of EH it had accumulated.

In the first generation, the top organism accumulated 6.2 units of EH; the highest amount of EH accumulated was 71.4 units in generation 263. Below shows the amount of EH in the top cell of the first 300 generations.

Cell interactions

The graph demonstrates an important difference between this simulation and my heterocyst simulation or my genetic algorithm - despite the cells with the highest fitness being selected each generation, the fitness does not go up in every generation. In fact, beyond generation 263, the maximum amount of EH shows no net increase, but bounces up and down between ~55 and ~65 units.

The reason is that the environment changes every generation. While the amount of each chemical in the pool at the beginning of each generation was the same, it was altered over time by the cells' metabolism. The concept of cell fitness therefore becomes dependent one context; what is advantageous one generation may not be successful in the next. This also means that cells can therefore interact with one another, albeit indirectly. This will create opportunities for evolutionary arms-races and co-evolution.

It may also be possible to have two populations of cells, one using say metabolite X, another using metabolite Y. If cells metabolising X start to dominate, then the amount of X in the pool will reduce, allowing cells metabolising Y to prosper, and the situation to reverse. However, I suspect that a population of 64 cells is too small for two separate populations to be stable. Sadly, I only recorded the genome of the top organism in this run of evolution so I can't analyse populations yet.

Genetic analysis

The ancestral cell in my simulation had six genes, which encoded five different proteins. Their functions are described in the previous update.

  • 1 E pore
  • 1 H pore
  • 2 EL pores (I think I duplicated this by mistake)
  • 1 EL hydrolase
  • 1 L-driven EH synthase

By the end of the 2800 generations, the fittest cell had 18 genes, which encoded four different proteins:

  • 3 EL pores
  • 3 EL hydrolases
  • 5 L/H antiporters
  • 7 L-driven EH synthases

Since the total amount of protein in all cells was fixed at 16 units, increasing the number of a particular gene increases the amount of protein encoded by that gene, but decreases the amount of protein encoded by all other genes. The ancestral cell therefore actually has the same amount of EL pore as the final cell (16/6 units).

Below is a graph showing the number of genes in the fittest cell and the amount of EH accumulated by that cell. The graph shows that the big jump around generation 250, in fitness is actually associated with a reduction in gene number (and a couple gene gains). Then between generation ~300 to generation ~2300, there is no change in gene number of the fittest organism. Interestingly, the gain in genes around generation 2300, appears to be associated with a reduction in the fitness of the cell.

While I didn't record the genomes of other cells other than the fittest in the population, the peaks in cell number suggest that there were two competing species the last 100 generations; the genome of the top organism flips between these two variations. One genome is the final genome shown above, the other contained 4 EL pores, 4 EL hydrolases, 8 L/H antiporters and 9 L-driven EH synthases. In an isolated pool, this cell is only slightly less efficient than the final cell. I would predict that under conditions of low H, the increased amount of L/H antiporters in this cell gives it an edge.

Proteomic analysis

As well as changes in gene copy number, two proteins are lost and a novel protein appears over the course of evolution. As I mentioned in the previous update, the E pore is effectively useless and was very quickly lost (in just generation 7, resulting in quite a big jump in fitness). The other big change is that rather than an H pore to allow chemical H to passively diffuse into the cell, the final cell uses an L/H antiporter, making use of the concentration gradient of L already present to drive H into the cell. The loss of the H-pore occurred (in generation 225) and the gain of an L/H antiporter (in generation 255) are resulted in the two the big jumps in fitness shown in the graph above. 

I was a bit surprised that the cell evolved to drive H into the cell, especially at the expense of its L gradient, which it needs to drive EH synthesis. However, I later realised that if a cell actively takes up H, then it will lower the concentration of H in the pool, causing H to flow out of competing cells that don't actively transport it. If the final evolved cell is put in a pool with the ancestral cell then the ancestral cell effectively die; it is unable to synthesise EH because H flows out of it.

Even in this simple simulation cells evolved aggressive tactics which I hadn't thought possible.

I think this explains the reduction in the maximum fitness around generation 2300; the reduction is associated with an increased in the number of L/H antiporters (and some other genes, which I guess are required to maintain metabolite balance). I suspect that all the cells in later generations are using their L gradient to take up more H, which results in no net increase in H uptake. When I ran evolution again for 1250 generation, again, the L/H antiporter was the major evolutionary breakthrough.

Future adaptations

Running this simulation has given me lots of ideas for changes:

  • The major evolutionary changes affected the amount of protein that cells expressed rather than changing which protein were expressed. The next change to my program will be to add transcription factors that allow expression to be altered more realistically. This will also require cells to use up a metabolite in protein synthesis to realistically limit protein synthesis rates.
  • Introducing a cost to expressing proteins should also limit the number of proteins that contain large chunks of useless sequence. I may also add a slight cost for every base in the genome (since the longer the genome, the more energy required to replicate it) to add a selection pressure for more efficient genomes.
  • I was a bit disappointed that cells didn't evolve to use other metabolites, but in retrospect it's not that surprising as there is little advantage in using them. I could address this by causing the availability of metabolites to fluctuate or by requiring other chemicals to be synthesised.
  • One possibility is to not reset the concentration of metabolites in the pool every generation. This would eventually cause everything to reach equilibrium, so I would need to add metabolites or energy, in the form of light say (and then allow organism to harness it). I think I should also make the concentration of metabolites in daughter equal to the concentration of metabolites in the parent at the end of the generation.
  • I would like to see whether different species can coexist, which will probably require increasing the population size and reducing the selectivity, which is currently very stringent.

Population analysis I

I wanted to re-run my simulation with a couple of changes. First, I wanted to see whether different 'species' could co-exist, so I made the output show the genomes of all organism, rather than just the fittest organism's genome. I also made it record the concentration of metabolites in the solution at the end of each generation, so I can see which metabolites are used up.

Second, I wanted to encourage different species to evolve. For this, I doubled the population size to 128 cells and simplified the way the next generation. Now, the fittest half of a population produces two daughter cells for the next generation. The result of this is that mutations now take longer to spread through a population.

Metabolism update

To further encourage different species to evolve and in preparation of later updates, I created a more diverse metabolic and challenging environment. The first change to create two potential sources energy, metabolites FG and FK. These are provide energy because there in the form of a concentration gradient (they are more concentrated outside the cell than in) and because they release energy when broken down (the reaction is far from equilibrium). I imagine FG and FK are the equivalent of two sugars (such as glucose and fructose). Since they both contain chemical F, this is likely to become a waste product.

While I decided to keep EH as an equivalent to ATP (because its hydrolysis reaction is the most favourable), I decided to make IH the equivalent to DNA, and so changed the measure of fitness: now cells must maximise the amount of IH they accumulate. In the ancestral cell, IH formation is driven by EH hydrolysis. I also decided to make KG the equivalent to amino acids, though in this simulation, this still has no effect.

Metabolic network of the ancestral cell

Evolution

With twice the population size and about twice the number of genes, in the ancestral cell, this simulation was a bit slower to run, but after a few days, it still managed to reach 1920 generations. Below is a graph showing the fitness (defined as the concentration of IH a cell accumulated by the end of a generation) and the number of genes in the fittest organism. As in the previous simulation, the fitness increases for first 200 or so generations before levelling off. The number of genes, on the other hand, is a lot more variable that before and shows a general trend of increasing. I suspect this is because the more genes a cell has, the better it can fine tune the relative expression levels of genes.

Since I recorded the genome of each organism in this run of evolution (1920 generation each with 128 cells, so nearly 250 000 genomes), I can also plot the fitness of, for example, the 64th fittest organism in each generation, which gives a measure of the median fitness. As you can see, the graph is very similar in general, only the position of fluctuations is different.

Population Graph

Below is my first attempt at representing a population of cells. It shows all the cells from the final generation in order of fitness (left to right, top to bottom). Cell fitness is also represented by the area of each circle. Cell colour represents the similarity of cell proteomes. The cells with the most different proteomes are coloured pure blue and pure green; the greenness and blueness of all other cells is determined by their distance from each of these cells.

Population graph

I suspect the later generations are more likely to be homogeneous. Below shows the fittest 16 cells of the first 16 generations, when there is more evolutionary changes. In the first generation there is one cell that is much more successful than the others, which are so small as to be almost invisible. This mutation spreads through the population, so by generation 7 all the cells in the top 16 are at least as fit. The next big evolutionary breakthrough is at generation 14, which then begins to spread. In this graph the colours do seem to highlight different species of cell reasonably well (e.g. the dark blue species that takes hold in generation 13, but loses its top spot the very next generation).

Population graph

A quick look at the proteomes suggest that the advantageous mutation in generation 1 is for an enzyme that catalyses the reaction EH + ILEL + IH. This reaction is a more efficient way of using the EH gradient to drive IH production and uses the small amout on IL in the cell. The improved fitness of a cell in generation 14 is likely to be due to a mutation that creates a G/IL antiporter, which allows the cell to take up IL using the G gradient.