Population analysis I

I wanted to re-run my simulation with a couple of changes. First, I wanted to see whether different 'species' could co-exist, so I made the output show the genomes of all organisms rather than just the fittest organism's genome. I also made it record the concentration of each metabolite in the solution at the end of each generation, so I can see which metabolites are used up.

Second, I wanted to encourage different species to evolve. To do this, I doubled the population size to 128 cells and simplified the way the next generation is produced: now each cell in the fittest half of the population produces two daughter cells for the next generation. As a result, mutations now take longer to spread through a population.
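The selection step can be sketched in a few lines of Python. This is a minimal sketch, not the simulation's own code: `next_generation`, `fitness` and `mutate` are hypothetical names, and in the real simulation fitness comes from running each cell (the IH it accumulates) rather than from a simple function of the genome.

```python
import random


def next_generation(population, fitness, mutate):
    """Build the next generation from the fittest half of `population`.

    Each of the fittest len(population) // 2 cells contributes two daughters,
    each produced by `mutate`, so an even-sized population keeps its size.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[: len(ranked) // 2]
    return [mutate(parent) for parent in parents for _ in range(2)]


# Toy usage: a "genome" is just a number and its fitness is its value.
rng = random.Random(0)
population = [rng.random() for _ in range(128)]
children = next_generation(
    population,
    fitness=lambda genome: genome,
    mutate=lambda genome: genome + rng.gauss(0, 0.01),
)
print(len(children))  # 128
```

Because each parent leaves exactly two daughters, a lineage can at most double in number per generation, so a single new mutant needs at least seven generations to fill a population of 128 (2^7 = 128).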

Metabolism update

To further encourage different species to evolve, and in preparation for later updates, I created a more metabolically diverse and challenging environment. The first change was to create two potential sources of energy, metabolites FG and FK. These provide energy because they are present as a concentration gradient (they are more concentrated outside the cell than inside) and because they release energy when broken down (the reaction is far from equilibrium). I imagine FG and FK as the equivalent of two sugars (such as glucose and fructose). Since both contain chemical F, F is likely to become a waste product.
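The two sources of free energy described above can be illustrated with the standard textbook relations for an uncharged solute. This is a sketch under assumptions: the simulation's own energy model may differ, the temperature and the standard free energy below are made-up illustrative values, concentrations are taken relative to a standard state, and the breakdown of FG is treated as FG → F + G.

```python
import math

R = 8.314  # gas constant, J/(mol*K)
T = 298.0  # assumed temperature, K (illustrative only)


def transport_delta_g(c_out, c_in):
    """Free-energy change (J/mol) for moving an uncharged solute from outside to inside.

    Negative (favourable) whenever the solute is more concentrated outside.
    """
    return R * T * math.log(c_in / c_out)


def reaction_delta_g(delta_g0, product_concs, reactant_concs):
    """Free-energy change (J/mol): delta_g0 + RT ln(Q).

    Q is built from concentrations relative to the standard state,
    with one list entry per molecule taking part in the reaction.
    """
    q = math.prod(product_concs) / math.prod(reactant_concs)
    return delta_g0 + R * T * math.log(q)


# Importing FG down a tenfold gradient is favourable:
print(transport_delta_g(c_out=10.0, c_in=1.0) < 0)  # True
# Breaking FG into F + G with a hypothetical standard free energy of -20 kJ/mol;
# keeping the products scarce holds the reaction far from equilibrium:
print(reaction_delta_g(-20_000.0, product_concs=[0.01, 0.01], reactant_concs=[1.0]))
```

The further the product concentrations sit below their equilibrium values, the more negative the log term, which is what "far from equilibrium" buys the cell.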

While I decided to keep EH as the equivalent of ATP (because its hydrolysis reaction is the most favourable), I decided to make IH the equivalent of DNA, and so changed the measure of fitness: cells must now maximise the amount of IH they accumulate. In the ancestral cell, IH formation is driven by EH hydrolysis. I also decided to make KG the equivalent of amino acids, though in this simulation it still has no effect.

Metabolic network of the ancestral cell

Evolution

With twice the population size, and about twice as many genes in the ancestral cell, this simulation was a bit slower to run, but after a few days it still managed to reach 1920 generations. Below is a graph showing the fitness (defined as the concentration of IH a cell has accumulated by the end of a generation) and the number of genes in the fittest organism. As in the previous simulation, fitness increases for the first 200 or so generations before levelling off. The number of genes, on the other hand, is much more variable than before and shows a general upward trend. I suspect this is because the more genes a cell has, the better it can fine-tune the relative expression levels of its genes.

Since I recorded the genome of every organism in this run of evolution (1920 generations, each with 128 cells, so nearly 250 000 genomes), I can also plot the fitness of, for example, the 64th fittest organism in each generation, which gives a measure of the median fitness. As you can see, the graph is broadly similar; only the positions of the fluctuations differ.
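Pulling out the k-th fittest cell in each generation only needs one sort per generation. A minimal sketch, assuming the recorded fitness values are held as one list per generation (a hypothetical layout, not necessarily the simulation's output format); `kth_fittest` is a made-up helper name, and the random numbers stand in for real data:

```python
import random


def kth_fittest(fitness_by_generation, k):
    """Return, for each generation, the fitness of its k-th fittest cell (k=1 is the fittest)."""
    return [sorted(generation, reverse=True)[k - 1] for generation in fitness_by_generation]


# Toy data in place of the real run: 1920 generations of 128 random fitness values.
rng = random.Random(1)
fitness_by_generation = [[rng.random() for _ in range(128)] for _ in range(1920)]
best = kth_fittest(fitness_by_generation, k=1)
near_median = kth_fittest(fitness_by_generation, k=64)
print(len(best), len(near_median))  # 1920 1920
```

With an even population of 128, the exact median is the mean of the 64th and 65th fittest values (what `statistics.median` returns), so the 64th fittest is a close stand-in rather than the median itself.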

Population Graph

Below is my first attempt at representing a population of cells. It shows all the cells from the final generation in order of fitness (left to right, top to bottom). Cell fitness is also represented by the area of each circle. Cell colour represents the similarity of cell proteomes. The two cells with the most different proteomes are coloured pure blue and pure green; the greenness and blueness of every other cell is determined by its distance from each of these two cells.
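The colouring scheme can be sketched as follows. How proteomes are compared isn't specified here, so this assumes (hypothetically) that each proteome is a vector of numbers, such as one expression level per enzyme, compared by Euclidean distance. `proteome_colours` is a made-up name, and the linear mapping from distance to colour intensity is only one possible reading of "determined by their distance".

```python
import itertools
import math


def proteome_colours(proteomes):
    """Map each proteome (a sequence of numbers) to an (r, g, b) tuple in [0, 1]."""
    if len(proteomes) < 2:
        return [(0.0, 0.0, 1.0) for _ in proteomes]
    # The two most different proteomes become the pure-blue and pure-green references.
    i_blue, i_green = max(
        itertools.combinations(range(len(proteomes)), 2),
        key=lambda pair: math.dist(proteomes[pair[0]], proteomes[pair[1]]),
    )
    spread = math.dist(proteomes[i_blue], proteomes[i_green])
    if spread == 0:
        # All proteomes are identical: colour every cell the same.
        return [(0.0, 0.0, 1.0) for _ in proteomes]
    colours = []
    for p in proteomes:
        # spread is the largest pairwise distance, so both values lie in [0, 1].
        blue = 1 - math.dist(p, proteomes[i_blue]) / spread
        green = 1 - math.dist(p, proteomes[i_green]) / spread
        colours.append((0.0, green, blue))
    return colours


print(proteome_colours([[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]]))
# [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (0.0, 0.5, 0.5)]
```

One limitation of a two-reference scheme like this is that a cell far from both references comes out dark, even if it is quite distinct from the rest of the population.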

Population graph

I suspect the later generations are more likely to be homogeneous. The graph below shows the 16 fittest cells from each of the first 16 generations, when there is more evolutionary change. In the first generation there is one cell that is much more successful than the others, which are so small as to be almost invisible. This mutation spreads through the population, so that by generation 7 all the cells in the top 16 are at least as fit. The next big evolutionary breakthrough is at generation 14, and it then begins to spread. In this graph the colours do seem to highlight different species of cell reasonably well (e.g. the dark blue species that takes hold in generation 13 but loses its top spot the very next generation).

Population graph

A quick look at the proteomes suggests that the advantageous mutation in generation 1 is for an enzyme that catalyses the reaction EH + IL → EL + IH. This reaction is a more efficient way of using the EH gradient to drive IH production, and it uses the small amount of IL in the cell. The improved fitness of the cell in generation 14 is likely to be due to a mutation that creates a G/IL antiporter, which allows the cell to take up IL using the G gradient.