Friday, 26th March 2010
Analysing evolution I
After setting up the simulation as described in the previous update, I left my cells to evolve for a couple of days, during which time 2800 generations passed. Each generation consisted of 20 000 units of time and the fitness of each cell was defined as the amount of chemical EH it could accumulate in this time. For each generation, I recorded the genome of the fittest cell and the amount of EH it had accumulated.
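The post doesn't spell out the selection scheme (that was covered in the previous update), so the loop below is only a minimal sketch of one generation-by-generation procedure. It assumes simple truncation selection, where a hypothetical number `n_parents` of the top cells are copied with mutation. `simulate_pool` and `mutate` are hypothetical placeholders. Because all cells share one pool, the whole population is simulated together rather than scoring each genome on its own.

```python
import random

POP_SIZE = 64   # population size mentioned later in this post
STEPS = 20_000  # units of time per generation

def evolve(population, simulate_pool, mutate, generations, rng, n_parents=8):
    """Run the generational loop, recording the fittest genome and its EH.

    simulate_pool(population, steps) -> list of EH values, one per cell
    (hypothetical; all cells share one pool, so they are simulated together).
    mutate(genome, rng) -> mutated copy of a genome (hypothetical).
    """
    history = []  # (fittest genome, EH accumulated) for each generation
    for _ in range(generations):
        eh = simulate_pool(population, STEPS)
        ranked = sorted(range(len(population)), key=lambda i: eh[i], reverse=True)
        best = ranked[0]
        history.append((population[best], eh[best]))
        # Truncation selection: only the top n_parents cells reproduce.
        parents = [population[i] for i in ranked[:n_parents]]
        population = [mutate(parents[i % len(parents)], rng)
                      for i in range(len(population))]
    return population, history

# Toy usage: a "genome" is a list of ints and its "EH" is simply their sum.
rng = random.Random(0)
toy_pop = [[0] for _ in range(POP_SIZE)]
toy_pop, history = evolve(
    toy_pop,
    simulate_pool=lambda pop, steps: [sum(g) for g in pop],
    mutate=lambda g, r: g + [r.randint(0, 1)],
    generations=5,
    rng=rng,
)
```

In the real simulation the fitness of a genome depends on who else is in the pool, which is why `simulate_pool` takes the whole population rather than one genome.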
In the first generation, the top organism accumulated 6.2 units of EH; the highest amount was 71.4 units, in generation 263. The graph below shows the amount of EH accumulated by the top cell in each of the first 300 generations.
The graph demonstrates an important difference between this simulation and my heterocyst simulation or my genetic algorithm: even though the cells with the highest fitness are selected in every generation, the maximum fitness does not rise in every generation. In fact, beyond generation 263 the maximum amount of EH shows no net increase; it bounces up and down between ~55 and ~65 units.
The reason is that the environment changes every generation. Although the amount of each chemical in the pool was the same at the beginning of each generation, it was altered over time by the cells' metabolism. Cell fitness therefore depends on context; what is advantageous in one generation may not be successful in the next. It also means that cells can interact with one another, albeit indirectly, which creates opportunities for evolutionary arms races and co-evolution.
It may also be possible to have two populations of cells, one using, say, metabolite X and another using metabolite Y. If cells metabolising X start to dominate, the amount of X in the pool will fall, allowing cells metabolising Y to prosper and the situation to reverse. However, I suspect that a population of 64 cells is too small for two separate populations to be stable. Sadly, I recorded only the genome of the top organism in this run, so I can't analyse populations yet.
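The X/Y balancing argument can be illustrated with a deliberately simple toy model. It assumes (hypothetically) that each cell's fitness is a baseline plus an equal share of the metabolite it uses, and that cells reproduce in proportion to fitness. It tracks the expected number of X-users, so it ignores the random drift that would actually threaten coexistence in a population of only 64.

```python
POP_SIZE = 64

def expected_x_users(n_x, supply_x=100.0, supply_y=100.0, base=1.0, pop_size=POP_SIZE):
    """Expected number of X-metabolising cells in the next generation.

    Hypothetical model: fitness = base + (metabolite supply / number of users),
    and each type's share of the next generation is proportional to total fitness.
    """
    n_y = pop_size - n_x
    w_x = base + supply_x / n_x if n_x > 0 else 0.0
    w_y = base + supply_y / n_y if n_y > 0 else 0.0
    mean_w = (n_x * w_x + n_y * w_y) / pop_size
    return n_x * w_x / mean_w

# Start with X-users rare, then with X-users dominant.
for start in (4, 60):
    n = float(start)
    for _ in range(50):
        n = expected_x_users(n)
    print(start, "->", round(n, 2))
```

With equal supplies, both starting points settle at an even 32/32 split: whichever type is rare gets a bigger share of its metabolite per cell and so grows.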
The ancestral cell in my simulation had six genes, which encoded five different proteins. Their functions are described in the previous update.
- 1 E pore
- 1 H pore
- 2 EL pores (I think I duplicated this by mistake)
- 1 EL hydrolase
- 1 L-driven EH synthase
By the end of the 2800 generations, the fittest cell had 18 genes, which encoded four different proteins:
- 3 EL pores
- 3 EL hydrolases
- 5 L/H antiporters
- 7 L-driven EH synthases
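Since the total amount of protein in each cell was fixed at 16 units, increasing the copy number of a particular gene increases the amount of protein encoded by that gene but decreases the amount encoded by every other gene. So despite having three copies of the EL hydrolase gene, the final cell makes the same amount of EL hydrolase as the ancestral cell with its single copy (16/6 units), and it actually makes less EL pore (16/6 units, compared with 32/6 units from the ancestral cell's two copies).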
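The fixed protein budget is easy to check with exact fractions. This sketch assumes, as the 16/6 figure implies, that the 16 units are split equally between every gene copy in the genome. The function name `protein_amounts` is just for illustration.

```python
from collections import Counter
from fractions import Fraction

TOTAL_PROTEIN = 16  # fixed protein budget per cell

def protein_amounts(genome):
    """Split the budget equally across gene copies, then total it per protein."""
    per_gene = Fraction(TOTAL_PROTEIN, len(genome))
    return {protein: copies * per_gene for protein, copies in Counter(genome).items()}

ancestral = ["E pore", "H pore", "EL pore", "EL pore",
             "EL hydrolase", "L-driven EH synthase"]
final = (["EL pore"] * 3 + ["EL hydrolase"] * 3
         + ["L/H antiporter"] * 5 + ["L-driven EH synthase"] * 7)

for name, genome in (("ancestral", ancestral), ("final", final)):
    print(name, {p: str(a) for p, a in protein_amounts(genome).items()})
```

The three EL hydrolase genes in the final cell come to 3 × 16/18 = 16/6 units, exactly what the ancestral cell's single copy produced.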
Below is a graph showing the number of genes in the fittest cell and the amount of EH accumulated by that cell. The graph shows that the big jump in fitness around generation 250 is actually associated with a reduction in gene number (along with a couple of gene gains). Then, from generation ~300 to generation ~2300, the gene number of the fittest organism does not change. Interestingly, the gain in genes around generation 2300 appears to be associated with a reduction in the fitness of the cell.
Although I recorded only the genome of the fittest cell in the population, the peaks in gene number suggest that there were two competing species in the last 100 generations; the genome of the top organism flips between these two variants. One genome is the final genome shown above; the other contained 4 EL pores, 4 EL hydrolases, 8 L/H antiporters and 9 L-driven EH synthases. In an isolated pool, this cell is only slightly less efficient than the final cell. I would predict that when H is scarce, its larger amount of L/H antiporter gives it an edge.
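Applying the same fixed 16-unit budget (again assuming an equal split per gene copy) shows what the two competing genomes actually trade off. The dictionaries below are just the copy numbers listed above.

```python
from fractions import Fraction

TOTAL_PROTEIN = 16  # fixed protein budget per cell

final = {"EL pore": 3, "EL hydrolase": 3,
         "L/H antiporter": 5, "L-driven EH synthase": 7}
alternative = {"EL pore": 4, "EL hydrolase": 4,
               "L/H antiporter": 8, "L-driven EH synthase": 9}

def amounts(copy_numbers):
    """Protein per type when the budget is split equally across all gene copies."""
    n_genes = sum(copy_numbers.values())
    return {p: Fraction(TOTAL_PROTEIN * n, n_genes) for p, n in copy_numbers.items()}

a_final, a_alt = amounts(final), amounts(alternative)
for protein in final:
    print(f"{protein:<22}{float(a_final[protein]):6.2f}{float(a_alt[protein]):6.2f}")
```

The 25-gene variant makes about 5.12 units of L/H antiporter against 4.44 in the final cell, paid for mainly with less EH synthase (5.76 versus 6.22 units), which fits the idea that it does better when H is scarce.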
As well as changes in gene copy number, two proteins are lost and a novel protein appears over the course of evolution. As I mentioned in the previous update, the E pore is effectively useless and was lost very quickly, in generation 7, resulting in quite a big jump in fitness. The other big change is that, rather than an H pore allowing chemical H to diffuse passively into the cell, the final cell uses an L/H antiporter, which harnesses the existing concentration gradient of L to drive H into the cell. The loss of the H pore (in generation 225) and the gain of the L/H antiporter (in generation 255) resulted in the two big jumps in fitness shown in the graph above.
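The difference between the two transporters can be sketched with simple mass-action rate laws. The actual kinetics in my program were described in the previous update, so these formulas and the rate constant `k` are assumptions: a pore lets H follow its own gradient, while a 1:1 antiporter swaps one L out for each H in, so a steep L gradient can push H in even against an H gradient.

```python
def pore_flux(h_out, h_in, k=1.0):
    """Passive H pore: net H influx follows the H concentration gradient only."""
    return k * (h_out - h_in)

def antiporter_flux(l_in, l_out, h_out, h_in, k=1.0):
    """1:1 L/H antiporter (hypothetical mass-action kinetics).

    Returns net H influx, which equals net L efflux: the forward direction
    pairs internal L with external H, the reverse pairs external L with internal H.
    """
    return k * (l_in * h_out - l_out * h_in)

# More H inside (2.0) than outside (1.0), but ten times more L inside than outside.
print(pore_flux(h_out=1.0, h_in=2.0))
print(antiporter_flux(l_in=10.0, l_out=1.0, h_out=1.0, h_in=2.0))
```

Under these assumptions the pore leaks H out (net influx -1.0), while the antiporter still pulls H in (+8.0), spending the L gradient to do it.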
I was a bit surprised that the cell evolved to drive H into the cell, especially at the expense of its L gradient, which it needs to drive EH synthesis. However, I later realised that if a cell actively takes up H, it lowers the concentration of H in the pool, causing H to flow out of competing cells that don't actively transport it. If the final evolved cell is put in a pool with the ancestral cell, the ancestral cell effectively dies: it cannot synthesise EH because H flows out of it.
Even in this simple simulation cells evolved aggressive tactics which I hadn't thought possible.
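The draining effect described above can be reproduced in a toy shared-pool model. This is a minimal sketch, not my simulation: it uses the mass-action transporter formulas assumed earlier, hypothetical volumes (a pool ten times the volume of each cell), simple Euler integration, and it holds the antiporter cell's L gradient fixed, which a real cell could not do indefinitely.

```python
def compete(steps=5000, dt=0.01, k=1.0, l_in=10.0, l_out=1.0,
            v_pool=10.0, v_cell=1.0):
    """Euler-integrate H exchange between one shared pool and two cells.

    Cell A (ancestral-like) has a passive H pore; cell B has an L/H antiporter
    whose L gradient (l_in / l_out) is held fixed for simplicity.
    Returns the final H concentrations (pool, cell A, cell B).
    """
    h_pool = h_a = h_b = 1.0  # start with no H gradient anywhere
    for _ in range(steps):
        flux_a = k * (h_pool - h_a)                 # pore: H follows its gradient
        flux_b = k * (l_in * h_pool - l_out * h_b)  # antiporter: driven by L
        h_pool -= (flux_a + flux_b) * dt / v_pool
        h_a += flux_a * dt / v_cell
        h_b += flux_b * dt / v_cell
    return h_pool, h_a, h_b

h_pool, h_a, h_b = compete()
print(f"pool {h_pool:.3f}, pore cell {h_a:.3f}, antiporter cell {h_b:.3f}")
```

Even though cell A starts in equilibrium with the pool, the antiporter cell soaks up H until the pool (and therefore cell A) falls to about 0.57, so H flows out of the pore cell without it doing anything different.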
I think this explains the reduction in maximum fitness around generation 2300: the reduction is associated with an increase in the number of L/H antiporters (and some other genes, which I guess are required to maintain metabolite balance). I suspect that all the cells in later generations are using their L gradient to take up more H, which results in no net increase in H uptake. When I ran evolution again for 1250 generations, the L/H antiporter was again the major evolutionary breakthrough.
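The arms-race explanation can be made concrete with a hypothetical model: a fixed H supply is shared among the cells in proportion to each cell's transport effort, and effort costs something (here a made-up `cost_per_effort`, standing in for the L gradient spent). These names and numbers are assumptions for illustration only.

```python
def h_uptake(efforts, supply=64.0):
    """Split a fixed H supply among cells in proportion to their transport effort."""
    total = sum(efforts)
    return [supply * e / total for e in efforts]

def net_payoff(efforts, supply=64.0, cost_per_effort=0.1):
    """H gained minus a hypothetical cost for the L gradient spent on transport."""
    return [h - cost_per_effort * e
            for h, e in zip(h_uptake(efforts, supply), efforts)]

resident = net_payoff([1.0] * 64)            # everyone transports modestly
mutant = net_payoff([5.0] + [1.0] * 63)      # one cell escalates
escalated = net_payoff([5.0] * 64)           # everyone escalates
print(resident[0], mutant[0], mutant[1], escalated[0])
```

A lone escalating cell does far better than its neighbours, so escalation spreads; once every cell escalates, each still gets the same share of H as before but pays more for it, which would show up as a drop in the maximum fitness.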
Running this simulation has given me lots of ideas for changes:
- The major evolutionary changes affected the amount of each protein that cells expressed rather than which proteins were expressed. The next change to my program will be to add transcription factors that allow expression to be altered more realistically. This will also require cells to use up a metabolite in protein synthesis, so that protein synthesis rates are realistically limited.
- Introducing a cost to expressing proteins should also limit the number of proteins that contain large chunks of useless sequence. I may also add a slight cost for every base in the genome (since the longer the genome, the more energy required to replicate it) to add a selection pressure for more efficient genomes.
- I was a bit disappointed that cells didn't evolve to use other metabolites, but in retrospect it's not that surprising as there is little advantage in using them. I could address this by causing the availability of metabolites to fluctuate or by requiring other chemicals to be synthesised.
- One possibility is not to reset the concentrations of metabolites in the pool every generation. This would eventually bring everything to equilibrium, so I would need to supply metabolites or energy, say in the form of light (and then allow organisms to harness it). I think I should also make the metabolite concentrations in each daughter cell equal to those in its parent at the end of the generation.
- I would like to see whether different species can coexist, which will probably require increasing the population size and relaxing selection, which is currently very stringent.
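Two of these ideas, charging for protein synthesis and genome length and making selection less stringent, could fit together roughly as below. This is a sketch under stated assumptions: the cost constants and the `Cell` fields are hypothetical, the costs are simply subtracted from EH rather than drawn from a metabolite, and tournament selection is a standard genetic-algorithm technique swapped in here, not the scheme my program currently uses. The tournament size `k` tunes stringency: `k=1` is random choice, and `k` equal to the population size always picks the single best cell.

```python
import random
from dataclasses import dataclass

@dataclass
class Cell:
    eh: float                # EH accumulated during the generation
    protein_residues: float  # residues synthesised (length x amount), hypothetical unit
    genome_length: int       # bases in the genome

def adjusted_fitness(cell, cost_per_residue=0.001, cost_per_base=0.0001):
    """EH minus hypothetical costs for protein synthesis and genome replication."""
    return (cell.eh
            - cost_per_residue * cell.protein_residues
            - cost_per_base * cell.genome_length)

def tournament_select(cells, n_parents, k, rng):
    """Choose each parent as the fittest of k cells sampled without replacement."""
    parents = []
    for _ in range(n_parents):
        contenders = rng.sample(cells, k)
        parents.append(max(contenders, key=adjusted_fitness))
    return parents

# Same EH everywhere, but later cells carry more useless sequence.
rng = random.Random(1)
cells = [Cell(eh=60.0, protein_residues=1600.0 + 50.0 * i, genome_length=600 + 10 * i)
         for i in range(64)]
strict = tournament_select(cells, 64, k=64, rng=rng)
gentle = tournament_select(cells, 64, k=2, rng=rng)
print(len({id(c) for c in strict}), len({id(c) for c in gentle}))
```

With `k=64` every parent is the leanest cell, while `k=2` still favours lean genomes but lets many different cells reproduce, which is the kind of relaxed selection that might allow two species to persist side by side.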