[This article is very old now, and probably very out of date. Hopefully the principles are basically sound.]

I recently got a multi-core laptop, so I was keen to try some parallel processing using Python. It's pretty simple, at least on Unix-like systems such as Linux and macOS (`os.fork()` isn't available on Windows). You just need to use:

os.fork()

However, the difficult part is working out what happens after the fork, and how to build a program around it. When the program reaches the `os.fork()` call, it splits into two identical copies. But generally you don't want two copies of a program doing exactly the same thing; you want two processes doing slightly different things. Even trying to create differences using random numbers is problematic, because the child starts as a copy of the parent's memory, which can include the state of the random number generator.
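
To see what "two identical copies" means in practice, here is a minimal sketch (Python 3, Unix-only) in which the child appends to a list after the fork. The child works on its own copy of the parent's memory, so the change never shows up in the parent. It uses the return value of `os.fork()` (explained in the next section) to tell the two processes apart, and an `os.pipe()` so the child can report what it saw; the function name `child_changes_are_private` is purely illustrative.

```python
import os

def child_changes_are_private():
    """Hypothetical helper: return (child's view, parent's view) of a list the child edits."""
    data = [1, 2, 3]
    read_fd, write_fd = os.pipe()  # one-way channel from child to parent

    pid = os.fork()
    if pid == 0:
        # Child: modify its own copy of `data` and report it through the pipe.
        os.close(read_fd)
        data.append(4)
        os.write(write_fd, repr(data).encode())
        os.close(write_fd)
        os._exit(0)  # exit immediately so the child never returns into the caller

    # Parent: read the child's report, then wait for the child to finish.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_view = reader.read()
    os.waitpid(pid, 0)
    return child_view, repr(data)

if __name__ == '__main__':
    child_view, parent_view = child_changes_are_private()
    print('child saw:', child_view)      # child saw: [1, 2, 3, 4]
    print('parent sees:', parent_view)   # parent sees: [1, 2, 3]
```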

Differentiating between processes

Naturally, there is a way to differentiate between the parent and child processes: `os.fork()` returns 0 in the child process and the process ID (PID) of the new child in the parent.

import os

pid = os.fork()
# Both processes run this line: the child prints 0, the parent prints the child's PID
print(pid)
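
Rather than just printing the value, a small sketch can check the claim directly. Here the child sends its own PID (`os.getpid()`) and its parent's PID (`os.getppid()`) back through a pipe, and the parent confirms that the value `os.fork()` returned to it is the child's PID. The function name `fork_and_compare` is hypothetical, and the pipe is only there to get the child's answer back to the parent.

```python
import os

def fork_and_compare():
    """Hypothetical helper: return (PID fork() gave the parent, child's PID, child's parent PID, parent's PID)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()

    if pid == 0:
        # Child: fork() returned 0, so ask the OS for our real PID and our parent's PID.
        os.close(read_fd)
        os.write(write_fd, ('%d %d' % (os.getpid(), os.getppid())).encode())
        os.close(write_fd)
        os._exit(0)

    # Parent: fork() returned the child's PID.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_pid, child_ppid = map(int, reader.read().split())
    os.waitpid(pid, 0)
    return pid, child_pid, child_ppid, os.getpid()

if __name__ == '__main__':
    returned, child_pid, child_ppid, parent_pid = fork_and_compare()
    print('fork() gave the parent', returned)
    print('the child reports its PID as', child_pid, 'and its parent as', child_ppid)
```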

As a result, it's possible to make the parent and child processes do different things. For example, the following will write two different files with different outputs:

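import os
pid = os.fork()

if pid == 0:
    # Child process: fork() returned 0, so use os.getpid() for the real PID
    fout = open('child.txt', 'w')
    fout.write('File created by child process %d' % os.getpid())
else:
    # Parent process: fork() returned the child's PID, not the parent's own
    fout = open('parent.txt', 'w')
    fout.write('File created by parent process %d' % os.getpid())

# Both processes run this, each writing to its own file
fout.write('\nEnd of file')
fout.close()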

Waiting for a child process

If you've created a child process, chances are you want the parent to wait for it to finish whatever it's doing before the parent continues. For this you can use `os.waitpid(pid, 0)`, which blocks until the child with that PID has terminated (the 0 means no special options). For example:

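import os, time

def timeConsumingFunction():
    # Busy-work: count to ten million
    x = 1
    for n in range(10000000):
        x += 1

pid = os.fork()

if pid > 0:
    # Parent: remember the child's PID
    child = pid
else:
    # Child: do the work, then exit immediately
    timeConsumingFunction()
    os._exit(0)

# Only the parent reaches this point
start_time = time.time()
os.waitpid(child, 0)
print(time.time() - start_time)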

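Here, the parent process splits off a child which counts to ten million while the parent waits. Once the child has finished calling `timeConsumingFunction`, it exits with `os._exit(0)`. Note that child processes use `os._exit(0)` rather than the usual `sys.exit(0)`. `sys.exit()` raises a `SystemExit` exception, so the child would unwind back through code it inherited from the parent, running cleanup handlers and flushing buffers that belong to the parent; `os._exit()` ends the process immediately without doing any of that. The 0 indicates that the process has exited without errors. Once the child has finished, the parent prints the time it spent waiting for it.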
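
The exit code passed to `os._exit()` isn't lost: `os.waitpid()` returns a `(pid, status)` tuple, and the parent can decode the status with `os.WIFEXITED()` and `os.WEXITSTATUS()`. The sketch below assumes a Unix system, and the helper name `run_child_with_exit_code` is hypothetical. On Python 3.9 and later, `os.waitstatus_to_exitcode()` does the same decoding in one call.

```python
import os

def run_child_with_exit_code(code):
    """Hypothetical helper: fork a child that exits with `code`; return the code the parent sees."""
    pid = os.fork()
    if pid == 0:
        # Child: a real program would do its work here before exiting.
        os._exit(code)

    # Parent: status is an encoded value, not the exit code itself.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None  # the child was killed by a signal rather than exiting normally

if __name__ == '__main__':
    print(run_child_with_exit_code(0))  # 0
    print(run_child_with_exit_code(3))  # 3
```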

Multiple forks

To create multiple forks, we can use a loop. In this case, using `os._exit(0)` is vital to ensure that the child processes don't continue the loop, forking off even more children.

import os, time

NUM_PROCESSES = 7

def timeConsumingFunction():
    # Busy-work: count to ten million
    x = 1
    for n in range(10000000):
        x += 1

children = []

start_time = time.time()
for process in range(NUM_PROCESSES):
    pid = os.fork()
    if pid:
        children.append(pid)
    else:
        timeConsumingFunction()
        os._exit(0)

# Wait for every child to finish
for child in children:
    os.waitpid(child, 0)

print(time.time() - start_time)
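
In the example above each child repeats the same ten million increments, so adding processes adds work rather than sharing it. To get a speed-up from a multi-core machine, the work usually has to be divided up instead. Below is a minimal sketch, assuming the task splits into independent chunks (here, summing a range of integers): each child sums its own slice and writes the partial result to a pipe for the parent to combine. The names `parallel_sum` and `num_processes` are hypothetical.

```python
import os

def parallel_sum(n, num_processes=4):
    """Hypothetical helper: sum 0..n-1 by giving each forked child one contiguous slice."""
    chunk = (n + num_processes - 1) // num_processes  # ceiling division
    children = []  # (pid, read end of that child's pipe)

    for i in range(num_processes):
        start, stop = i * chunk, min((i + 1) * chunk, n)
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: compute a partial sum, send it to the parent, and exit
            # immediately so it never continues the parent's loop.
            os.close(read_fd)
            os.write(write_fd, str(sum(range(start, stop))).encode())
            os.close(write_fd)
            os._exit(0)
        # Parent: keep only the read end for this child.
        os.close(write_fd)
        children.append((pid, read_fd))

    # Collect each partial sum, then reap the child that produced it.
    total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd) as reader:
            total += int(reader.read())
        os.waitpid(pid, 0)
    return total

if __name__ == '__main__':
    print(parallel_sum(10000000, num_processes=4))  # 49999995000000
```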