Data Visualisation

With an increasing amount of data becoming freely available (from the government or the Guardian, for example), I wanted to do something useful with the information and experiment with different ways to visualise it. This isn't really a cohesive project, but it made sense to lump all my various efforts here.

Mining the Guardian Data Store

After going to Open Tech in July, I was inspired to attempt to do something useful with the large volumes of data that various organisations are making available on the web. I decided that the Guardian Data Store was as good a place as any to begin. The Guardian Data Store contains numerous spreadsheets (as Google documents), which contain the raw data relating to a story in the paper. You can then do with this data what you will. For example, many people have created their own visualisations (i.e. graphs) of the data and uploaded them to a specific group on Flickr.

I find searching for ways to display large quantities of data in a clear and informative manner pretty interesting and have recently been honing my Adobe Illustrator skills as part of my thesis-writing. However, I thought it might be more fun to combine the data from various different stories to see if I could find something novel in the available information (then I could think of a pretty way to display it). Firstly, I needed to identify a common variable between a group of datasets so I could compare them. Countries seemed like a good starting place, as many of the spreadsheets contain information relating to some, if not all, countries. Other options would be counties of the UK, which turn up quite a lot, and time, generally in years. Time data can also be nicely plotted at Timetric, but though I have an account, I have yet to play about there.

Another reason for using country data was that at the Guardian Open Tech talk, one speaker mentioned that someone had combined data about drug use in various countries with the happiness of various countries and found a small positive correlation. I thought that looking for similar, unexpected correlations might be fun. Rather than thinking of different variables to compare between countries, I figured why not compare every variable with every other variable? There is of course a very good reason why not to, but I wasn’t going to let that stop me.

The reason not to make such blanket comparisons is that some variables will be correlated to one another purely by chance.

Although it’s quite unlikely that any two independent variables will appear correlated, if you start making hundreds of random comparisons, as I was intending, the chances are that you’ll find some correlations purely by chance. I think Richard Dawkins mentioned in one of his books that one study found that Israeli fighter pilots were significantly more likely to have daughters than sons. Since no one has provided a good scientific reason for this phenomenon, it is thought that the correlation is simply random, and the question is why anyone was comparing these two variables in the first place. So with that in mind, any correlations I find ought to be taken with a large grain of salt.
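To put a rough number on this, here is a minimal sketch of how many spurious 'significant' correlations to expect when every variable is compared with every other. The 30 variables and the 5% significance threshold are illustrative assumptions, not figures from my data, and the function name is made up for this example.

```python
def expected_chance_hits(n_variables, alpha=0.05):
    """Count pairwise comparisons and the expected number of false positives.

    Assumes none of the variables are truly related, so each comparison
    has a probability `alpha` of looking significant by chance alone.
    """
    n_pairs = n_variables * (n_variables - 1) // 2  # every variable vs every other
    return n_pairs, n_pairs * alpha


pairs, hits = expected_chance_hits(30)
print(pairs, "comparisons, about", round(hits), "expected by chance")
```

With 30 unrelated variables there are 435 pairs, so about 22 of them would be expected to pass a 5% threshold by luck.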

Since this blog entry is becoming a little large, I’ll split up what I planned to write. The next two entries will therefore be about what I learnt from this project, firstly in terms of what measurements are correlated to what other measurements. Secondly, but more importantly, I’ll write about what I learnt in attempting to make a program to deal with the data from the Guardian Data Store.

Some difficulties mining

The first problem I had in using the Guardian Data Store was in collecting together data from different stories. Perhaps naively, or perhaps because I was confused by the release of the Guardian API, which is actually unrelated, I thought getting all the data relating to countries would be fairly straightforward. However, as far as I can tell, there is no easy way of getting hold of multiple datasets; I had to open the spreadsheets one by one and copy the information into text files. It’s possible that the Google Docs API could be used to collect the information, but I don’t know how. Once I’d got several text files, each holding one Guardian spreadsheet with some data relating to several countries, I put them in a folder and made a Python script to collect all the information together in some useful format.

I came across the second problem when I noticed I’d ended up with a lot more countries than I thought existed. It turns out that the various spreadsheets at the Guardian Data Store do not use consistent names for countries, so the US, for example, might be referred to as US, USA, America, United States or United States of America. Another problem was the occasional misspelling, which I suppose I should have expected from the Guardian. On the upside, I learnt a bit more about the various countries. For example, I didn’t know that as well as there being a Democratic Republic of Congo (also called Congo, DRC), there is also a Republic of Congo (aka Congo, Rep.), right next door. Similarly, I didn’t know that South Korea (aka S. Korea) is officially called the Republic of Korea, while North Korea (aka N. Korea) is officially the Democratic People’s Republic of Korea. It’s quite surprising how many countries have the word democratic in their name, especially when they are conspicuously non-democratic.

I considered writing a Python program with a Tkinter interface to help combine the data from countries, but I haven’t yet managed to do that.

Instead, I found it quicker to output the combined data as an XML file, showing all the countries, each containing their respective measurements. Then I could spot where countries had been duplicated and copy the measurements across. I can’t be sure that I’ve spotted all the duplications, but I’ve managed to reduce the number of country names from over 320 to about 280. I don’t know why the Guardian can’t use a consistent nomenclature, but I suppose the spreadsheets are made by different people and collected from different sources. Still, a single list of countries with the various variables plugged into it would have made my life a lot easier and is surely possible – it would also save journalists from having to write the names of countries out.

[Update: it seems that some of the data now includes three letter codes for countries.]

I have yet to make a good way of dealing with new spreadsheets as they come in, but I suppose it would be good to have a program that highlights when an apparently new country has been discovered, and allows you to pick a country from the current list to combine it with if necessary. Then I can make an updated XML file. I’ll make the XML file that I have at the moment available once I work out how to and have made sure what I’ve written is actually proper XML. At the moment it seems to be broken by the country Curaçao – I don’t think it can deal with odd letters. One benefit I have got from this project is that I have learnt a bit about XML and have been working out how to parse it (I know there are parsers out there already, but I’d like to make my own).
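As a rough idea of what such a program might look like, here is a minimal sketch that flags names not yet in the current list and suggests similar known names using Python's standard difflib module. The function name, the similarity cutoff of 0.6 and the example lists are all assumptions for illustration; this is not the program I actually wrote.

```python
import difflib


def find_new_names(incoming_names, known_names, max_suggestions=3):
    """Return {new_name: [similar known names]} for names not already known."""
    known = set(known_names)
    suggestions = {}
    for name in incoming_names:
        if name in known:
            continue  # already in the current list, nothing to do
        # Fuzzy matching catches misspellings, but not aliases such as "USA".
        suggestions[name] = difflib.get_close_matches(
            name, known_names, n=max_suggestions, cutoff=0.6)
    return suggestions


known = ["United States", "Republic of Congo", "Democratic Republic of Congo"]
incoming = ["United States", "Untied States", "Lesotho"]
for name, matches in find_new_names(incoming, known).items():
    print(name, "->", matches or "no close match, possibly a new country")
```

String similarity only helps with misspellings; abbreviations and alternative names would still need a human to pick the right country from the list.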

I actually have two XML files: in one, the countries are the highest level data and contain a list of measurements with their values; in the other, the measurements are the highest level data and contain a list of countries. The latter is more similar to how the data is combined and is more useful to my program, but the former allows me to combine countries more easily. I often find that I don’t know which way to store this kind of data, and wonder whether it isn’t most efficient to store it both ways for quick comparisons. Maybe there is some way around this problem I’m unaware of.
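One way to avoid maintaining both files by hand is to keep one layout and derive the other when needed. Here is a minimal sketch using nested dictionaries in place of XML; the measurement names and values are made up for illustration.

```python
def invert(nested):
    """Turn {outer: {inner: value}} into {inner: {outer: value}}."""
    inverted = {}
    for outer_key, inner in nested.items():
        for inner_key, value in inner.items():
            inverted.setdefault(inner_key, {})[outer_key] = value
    return inverted


# Hypothetical data: countries at the top level, measurements inside.
by_country = {
    "Lesotho": {"measurement_a": 1.0, "measurement_b": 2.0},
    "Benin": {"measurement_a": 3.0},
}
by_measurement = invert(by_country)
print(by_measurement)
```

Inverting twice gives back the original layout, so either file could be treated as the master copy.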

The final difficulty (for now) that I had on this project was with R. R is a very powerful statistical program that works through the command line. As such it can be called by other programs quite easily. Part of the reason for my embarking on this project was to brush up on my R, and to see if I could get a Python program to call R. The idea was to use R to calculate correlations, covariances, clustering and whatever else took my fancy, and also to draw scatter plots of the data, so I wouldn’t have to draw the graphs myself.

I thought I was in luck when I found RPyWin32 for some odd Windows reason.

As a result I could use PythonWin, which is quite a nice IDE (integrated development environment), which actually keeps the graphs I draw using R, unlike running Python from the command line. All in all, it was a pain to have to download these bits and pieces, because it means if anyone else wants to use my program they’ll have to download them too. Also, Python is still not in my registry, despite my efforts to fix it, and every time I import R using Python, it says it can’t find R in the registry and has to find it some other way. Anyway, rant over. I may end up not using R though and instead calculate covariances and draw graphs myself.

My progress with the Guardian Data Analyser app has now halted due to a problem in how to store the data and the relationships between data points. I have hit a similar problem with my Chinese Reader App with how I should store words and their relationships. The Guardian app uses XML, which might be OK, but I think storing the data in a database might be more useful. Maybe the release of government data in a SPARQL database will encourage the Guardian to do the same. Either way, I think I need to learn how to use SPARQL, or maybe MySQL, soon.

Some basic clustering

In anticipation of having some data to analyse, and due to reading An Introduction to Bioinformatic Algorithms and Information Theory, Inference, and Learning Algorithms, I decided to create a program that can cluster data. The program uses either the hard or soft k-means method, because they are mentioned in both books, though the Information Theory book has more complex algorithms that I might try, if I can understand them.

To help me understand how the clustering algorithm works I’ve written a program (in Python and Tkinter, yet again) that creates a number of random clusters of two-dimensional (to make it easy to display) data, and shows how the algorithm clusters them. Each of the underlying clusters is given a random (x, y) coordinate, a random distance, and a user-defined number of associated data points. Each data point is then given a random (x, y) coordinate that is a random distance (uniformly distributed between 0 and the cluster’s distance value) from its cluster’s (x, y) coordinate. In a later version I would like to test clusters that are not uniformly distributed and are not circular. The job of the clustering algorithm is to determine which data point is associated with which cluster.
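The data generation described above can be sketched roughly as follows. This is not my actual program: the canvas size, the maximum cluster distance, the random direction for each point and all the names are assumptions made for the example.

```python
import math
import random


def make_clusters(n_clusters, points_per_cluster, size=400, max_radius=60, seed=None):
    """Create random circular clusters of 2D points.

    Each cluster gets a random centre and a random 'distance' (radius).
    Each point sits at a uniformly random distance (0 to radius) from the
    centre, in a uniformly random direction.
    """
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        cx, cy = rng.uniform(0, size), rng.uniform(0, size)
        radius = rng.uniform(0, max_radius)
        points = []
        for _ in range(points_per_cluster):
            distance = rng.uniform(0, radius)
            angle = rng.uniform(0, 2 * math.pi)
            points.append((cx + distance * math.cos(angle),
                           cy + distance * math.sin(angle)))
        clusters.append({"centre": (cx, cy), "radius": radius, "points": points})
    return clusters


clusters = make_clusters(n_clusters=3, points_per_cluster=20, seed=1)
print(len(clusters), "clusters of", len(clusters[0]["points"]), "points")
```

Note that choosing the distance uniformly makes points denser near the centre of each cluster than at its edge, rather than spreading them evenly over the circle.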

The app creates underlying clusters of points.

The hard k-means clustering algorithm works by predicting the (x, y) coordinates of the true clusters and assigning each data point to the nearest predicted cluster. The predicted clusters are first given random (x, y) coordinates, then each data point is assigned to the closest one, and the (x, y) coordinates of each predicted cluster are updated to be the mean of its data points’ (x, y) coordinates. If a cluster has no associated data points (that is, each data point is closer to another predicted cluster), then I give it another random (x, y) coordinate. The process iterates until no changes in the clusters occur. The soft k-means clustering algorithm works in a similar way, except that instead of assigning each data point to the nearest cluster, each cluster is given a ‘responsibility’ for each data point. The responsibility is a value between 0 and 1, and is a function of the predicted cluster’s distance from the data point.
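Here is a minimal sketch of the hard k-means steps described above, written from that description rather than copied from my program. The canvas size, the iteration cap and the function names are assumptions for the example.

```python
import random


def hard_k_means(points, k, size=400, seed=None, max_iterations=100):
    """Cluster 2D points; returns (centres, index of the centre for each point)."""
    rng = random.Random(seed)

    def random_centre():
        return (rng.uniform(0, size), rng.uniform(0, size))

    centres = [random_centre() for _ in range(k)]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest predicted centre.
        new_assignment = [
            min(range(k),
                key=lambda j: (x - centres[j][0]) ** 2 + (y - centres[j][1]) ** 2)
            for x, y in points
        ]
        stable = new_assignment == assignment
        assignment = new_assignment

        # Update step: move each centre to the mean of its points.
        reseeded = False
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centres[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
            else:
                centres[j] = random_centre()  # empty cluster: try somewhere new
                reseeded = True

        if stable and not reseeded:
            break  # nothing changed, so the clusters have settled
    return centres, assignment


points = [(10, 10), (12, 11), (11, 13), (300, 310), (305, 300), (298, 305)]
centres, labels = hard_k_means(points, k=2, seed=0)
print(labels)
```

The result depends on the random starting positions, which is why repeating the process with fresh random centres can give a different (sometimes better) answer.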

The app predicts where the clusters are.

The algorithms seem to work pretty well; however, as you can see above, they are far from perfect. The program draws a line between each data point and its associated cluster. The line is coloured depending on which of the underlying clusters the data actually comes from. In the example below, the black cluster is predicted perfectly, whereas the blue cluster is split in half. I think the algorithms fail particularly when the data is from long thin clusters. They can also sometimes end up stuck predicting two clusters where there is one if the data is very spread out. Repeating the process with randomly initialised clusters generally allows this phenomenon to be overcome (although repeatedly trying to cluster the data shown below always seems to end up with a cluster joining the yellow data to some of the blue data).

 

Choropleth maps of Africa

A few weeks ago I was delighted to see a guide on how to create a choropleth map. The article not only uses Python, but was also perfectly timed with my exploration into SVG files and how they work. The article was also timed with my re-reading of my thesis in preparation for my viva. As I was trying to find the latest statistics concerning rates of human African trypanosomiasis (sleeping sickness), I thought I’d use the data to colour a map of Africa.

New cases of rhodesiense in 2006

First I had to get hold of the data. The WHO website has the numbers of new infections reported for two strains of trypanosomes, Trypanosoma brucei rhodesiense and Trypanosoma brucei gambiense. However, the data is in tables in PDFs which makes it a pain to extract. One PDF has the data from 1990 to 2004, another has data from 1997 to 2006 and there are occasional contradictions where they overlap. There is older data too, but I haven’t yet copied that into a usable format.

Once I had some data in a text file, I had to get a blank SVG map of Africa. The map I used is from Wikimedia. I also experimented with a more detailed map that had all the districts within each country, but that was too detailed (though the file was organised in an easier-to-use way). I had to make quite a few changes to the map before I could colour it. The first problem was that each country was given a two letter code, which I needed to convert into a full name. Two letter codes avoid the problem of having multiple names for countries (e.g. do you use ‘the Congo’ or ‘the Republic of the Congo’? Neither of which is ‘the Democratic Republic of the Congo’). However, the data from the WHO (and nearly everywhere else) uses the full country names. Besides which, it’s much easier to use the image when the countries have names. Converting the codes to names improved my geography no end. For example, I hadn’t even heard of Benin or Lesotho before, let alone been able to place them on a map. (Lesotho is particularly interesting as it is an enclave, completely surrounded by South Africa).

I had to make further changes due to using Beautiful Soup, as the guide instructed, to analyse the XML. In retrospect, it may have been simpler to write my own simple XML parser or to simply colour the countries using CSS. However, I’m quite glad I tried, as I might be able to use Beautiful Soup for parsing some other XML. The problem with Beautiful Soup is that it can’t handle self-closing tags. It also appears not to handle tags within tags of the same name. This might be sensible for XML, but the image contains groups of groups, which I had to separate. One reason for having groups of groups was the islands which make up a single country, which I will probably remove as they can’t be seen anyway. Another reason for having nested groups is that the whole continent formed a group, which was transformed so as to be in the centre of the image. I’m not sure why the image was constructed this way and not just drawn in the centre in the first place; I will try to alter all the coordinates so that the transformation is not required. My final annoyance with Beautiful Soup is that it insists on rewriting viewBox as viewbox, which Chrome (which may be to blame here) is unable to understand. As a result, every time I create a map, I have to edit one letter in a text editor before I can view it.
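The manual viewBox fix could probably be automated by post-processing the output text before saving it. Here is a minimal sketch using plain string replacement; it assumes the attribute is written out as viewbox= preceded by a space, and the function name and example SVG are made up for illustration.

```python
def restore_viewbox(svg_text):
    """Restore the camelCase viewBox attribute after it has been lower-cased."""
    return svg_text.replace(" viewbox=", " viewBox=")


# Hypothetical output with the attribute name lower-cased.
broken = '<svg width="100" height="100" viewbox="0 0 100 100"></svg>'
print(restore_viewbox(broken))
```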

New cases of gambiense in 2006

I’m pretty pleased with how the maps have turned out (and I like the colour scheme), though there are still some points that need improving. The main problem concerns how the data is split into groups. In the first incarnation of my program, I chose the range of values for each group, but after switching between data sets repeatedly I wanted a way to determine the range automatically. As you can see, this leads to a slightly odd (though arguably more valid) split of one group every 49 cases for the first map, and one group every 1604.6 in the second. Rounding to an integer would be a start, but I might see if I can also round to a ‘sensible’ number, such as 50 for the first map and maybe 2000 for the second. The second map highlights another problem, which is that all the countries except the Democratic Republic of the Congo (DRC) have the same colour. This is because the DRC had over 7 times as many cases of T. b. gambiense infection as the country with the next highest number of cases. To solve this, I could either use a log scale or the final category could be, say, ‘2000+’, though the latter might underplay the seriousness of the problem in the DRC. It might also make sense to have a single category for zero cases, which in this case, I think, should be coloured grey, as the WHO only provided data for countries that had had at least one case in the years they were recording.
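One possible way to round a group width to a 'sensible' number is to round it up to 1, 2 or 5 times a power of ten. This is just a sketch of that idea, not something my program does yet, and the function name is made up.

```python
import math


def nice_width(raw_width):
    """Round a positive group width up to 1, 2 or 5 times a power of ten."""
    exponent = math.floor(math.log10(raw_width))
    base = 10 ** exponent
    # 10 is included so that a value always fits in one of the steps.
    for multiple in (1, 2, 5, 10):
        if multiple * base >= raw_width:
            return multiple * base


print(nice_width(49))      # width used for the first map
print(nice_width(1604.6))  # width used for the second map
```

For the two widths mentioned above this gives 50 and 2000, which are the values I would have picked by hand.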

Another option is to use a continuous scale to colour countries (or rather a scale with so many categories that it appears continuous). This is how a similar map is coloured on the Wikipedia page on trypanosomiasis. The map (which, it turns out, was made by someone I know), shows deaths per 100,000, which is perhaps a better metric (and one I now have the data for). Converting the numbers into a percentage of population is also a better way of illustrating the problem.

Now I have a reasonable way to colour maps, I can use any data about countries. As it happens, I have been collecting data about countries from the Guardian Data Store and other open sources. Below are a couple of examples of maps made using some of those data (cases of AIDS and hunger). I need to sort out the scales, so they're more sensible.

Maps of AIDS and undernourishment in Africa

Interactive SVG map

I've been learning how to animate SVGs (I've written a couple of tutorials of what I have since learnt here and here) and have updated my map-drawing program to create interactive maps.

Below is a map I made to show life expectancy (at birth, as of 2007) in Africa. You can mouse-over the map to see the name of a country and the life expectancy there. I'm impressed with how powerful SVGs can be, but slightly disappointed that transparency doesn't work in Chrome [update: it seems to work now - Hooray!]. The map can look a bit grainy, but looks a lot smoother if you zoom in a bit.

Life expectancy (2007)

Note, Internet Explorer before IE9 can't render SVG and Firefox doesn't support mouse-over animation. It definitely works with Chrome and apparently works in Safari (thanks Al).

The mouse-over effect works by adding a set element to each country's path, which changes the opacity of the path as the mouse moves over it:

<path d="lots of coordinates...">
  <set attributeName="opacity" from="1" to="0.5" begin="mouseover" end="mouseout"/>
</path>

The only country for which this appears not to work is Lesotho, the enclave in South Africa. The reason that it doesn't appear to lighten is that when you can see through it, you see South Africa below, which is the same colour. I will see if I can sort this out by cutting a Lesotho-shaped hole in South Africa.

To get the names of countries to appear, I created a text element for each country with its name at the same position and with visibility="hidden". Then for each text element I added a set element with the begin and end events pointing to that country. This works because the path of each country, or the group of paths of a country, has an id equal to its name.

For example, the Algeria text element contains:

<set attributeName="visibility" from="hidden" to="visible" begin="Algeria.mouseover" end="Algeria.mouseout" />
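Since the maps are generated by a Python program, the label for each country can be built in code. Below is a minimal sketch using the standard xml.etree.ElementTree module; the function name and the coordinates are made up, and it assumes each country's id contains no spaces.

```python
import xml.etree.ElementTree as ET


def make_label(country_id, x, y):
    """Build a hidden <text> label that appears while the country is moused over."""
    text = ET.Element("text", {"x": str(x), "y": str(y), "visibility": "hidden"})
    text.text = country_id
    ET.SubElement(text, "set", {
        "attributeName": "visibility",
        "from": "hidden",
        "to": "visible",
        # The events refer to the element whose id is country_id.
        "begin": country_id + ".mouseover",
        "end": country_id + ".mouseout",
    })
    return text


# Hypothetical position for the label.
label = make_label("Algeria", 120, 80)
print(ET.tostring(label, encoding="unicode"))
```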

[A more efficient way to achieve this effect is to use ECMAScript]

T. brucei rhodesiense (1990-2006)

The main reason for experimenting with this interactivity is to see whether I can display several years worth of data on the same map. Below is a map that can display data (new cases of T. brucei rhodesiense) from nine different years when you mouse-over the relevant year. This allows one to track the spread and decline of the disease.

I would like to prevent the colours from disappearing when the mouse leaves a year, but couldn't make the mouse-over work properly when I removed the end condition. I'd also like to stop the mouse icon from changing when it moves over the text. Finally, I'd like to display the number of cases of T. b. rhodesiense for a country when it is selected, but I think this would be very complex to implement.

Mouseover effects in SVGs

In this tutorial I'll describe five different methods to achieve a mouseover effect in an SVG. I'll start with the simplest and most limited approach (CSS), and work up to the most complex, but most flexible approach (Javascript/ECMAScript, which is described in more detail here). To view the full code for any of the examples in this post, right click on an image and choose View Frame Source or something similar, depending on your browser.

Example SVG

SVGs seem to be an increasingly popular way of adding high quality, interactive images to the web. Even IE9 now supports SVGs to a degree. Older IE users will have to use Chrome Frame or something similar. Note that I've only tested these effects in Chrome and Firefox. In order to demonstrate mouseover effects, we need a simple SVG. The code below draws two squares: one blue, one green.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN"
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
width="300" height="80">

  <text x="10" y="35">Two passive squares:</text>

  <rect id="rect1"  x="160" y="10" 
   width="60" height="60" fill="blue"/>

  <rect id="rect2" x="230" y="10"
   width="60" height="60" fill="green"/>

</svg>

CSS

The simplest way to get a mouseover effect is to use the hover effect with CSS styling. Note that the style element can be written within the SVG itself and works just as with HTML. For example, add the following code into the <svg> element:

  <style>
    rect:hover
    {
      opacity: 0.5;
    }
  </style>

CSS within SVG works just as with HTML, so we can get more specific effects by using different selectors. For example, using .some-class-name:hover allows us to selectively apply the mouseover effect to only those elements with the attribute class="some-class-name".

If you can get away with just using CSS, I'd recommend it, but there are some limitations:

  • We can only affect the properties of the element we have moused-over (although we can partially get around this using groups and more advanced CSS selectors).
  • We can only affect the style of an element, not its other attributes such as size or position.
  • We can only trigger events with a mouse hover, and not, for example, on a mouse click.

Onmouseover events

A more targeted approach is to add a function directly into the <rect> element. The function onmouseover is called when the mouse is moved over an element, and we can set this to change the element's opacity attribute (or any other attribute). To get the full hover effect we also need to change the opacity back to normal when the mouse leaves the <rect> element by setting the onmouseout function.

<rect id="rect1" x="160" y="10"
width="60" height="60"  fill="blue"
onmouseover="evt.target.setAttribute('opacity', '0.5');"
onmouseout="evt.target.setAttribute('opacity', '1');"/>

This method requires more code, especially if you want to add effects to multiple elements, but gives us a bit more flexibility. For example, we could choose not to return the opacity to 1 when we move the mouse out of the box, or we could trigger effects with onmouseup or onmousedown events. We also have more freedom in which attributes we change. For example, if we write:

<rect id="rect1" x="160" y="10"
width="60" height="60"  fill="blue"
onmousedown="evt.target.setAttribute('y', '5');"
  onmouseup="evt.target.setAttribute('y', '10');"/>

Then the box will 'jump' when we click on it. Finally, we are also not limited to just affecting the element we've moused over. If we change evt.target to evt.target.parentNode.getElementById('rect2'), we select the element with id='rect2', i.e. the green rectangle. This approach, however, is a little indirect, and I would suggest using ECMAScript instead.

Set attributeName

We can also add a set element to a <rect> element. This allows us to set an attribute in response to an event, even if that event does not involve the element we are changing. Here I used the rect2.mouseover event to trigger an effect in rect1.

Note that this effect does not work in Firefox 6 and earlier (I think), which is perhaps the biggest drawback of this method.

  <rect id="rect1" x="160" y="10"
   width="60" height="60" fill="blue">
    <set attributeName="fill-opacity" to="0.5"
     begin="rect2.mouseover" end="rect2.mouseout"/>
  </rect>

The set element is pretty versatile and can be used for various animation effects. For example, with begin="4s", the effect will begin 4 seconds after loading.

ECMAScript

The final method I'll describe is the most involved, but by far the most flexible (for example, we can create a tooltip). It involves using ECMAScript (like JavaScript) to write our own functions. The functions are created within a <script> element. Below is one function to make an element semi-transparent and another to make an element opaque. 

  <script type="text/ecmascript">
    <![CDATA[
      function MakeTransparent(evt) {
        evt.target.setAttributeNS(null,"opacity","0.5");
      }

      function MakeOpaque(evt) {
        evt.target.setAttributeNS(null,"opacity","1");
      }
    ]]>
  </script>

Now we can set the onmouseover and onmouseout functions to equal these custom functions.

  <rect id="rect1" x="160" y="10"
  width="60" height="60" fill="blue"
  onmouseover="MakeTransparent(evt)"
  onmouseout="MakeOpaque(evt)"/>

  <rect id="rect2" x="230" y="10"
  width="60" height="60" fill="green"
  onmouseover="MakeTransparent(evt)"
  onmouseout="MakeOpaque(evt)"/>

Clearly, this method requires a lot more writing for the same effect, but since we can write our own functions, it has the potential to generate much more complex effects. I've written a more detailed description of what can be done with ECMAScript here.

JavaScript

Instead of using ECMAScript within the SVG itself, it is also possible to use JavaScript in the HTML that contains the SVG, which I explain here. There are several JavaScript libraries that make this even easier, for example AmpleSDK, or Polymaps, which is specifically for SVG maps.

Inkscape images

Since a couple of people have asked, you can use these effects with Inkscape SVGs. Just make sure you target the right attributes (such as 'transform' if you want to move or scale a shape). See the attached star.svg for an example.

Attachment: star.svg (2.36 KB)

SVG mouseover tricks

Adding mouseover effects to an SVG is a simple way to make it interactive. Previously, I described several techniques to create basic mouseover effects, generally turning one or two boxes partially opaque. I often find just using the CSS method works fine, but sometimes more complex effects are required. Below are some of the tricks I've discovered for achieving more advanced effects.

Grouping

I often find that I want to trigger an effect on one element when the mouse moves over another. For example, I might want to highlight a line on a graph when the mouse moves over its label. As I mentioned in the previous post, this could be achieved using <set> or ECMAScript, but a simpler and more reliable method is to simply group the elements and use CSS. The idea of grouping elements is perhaps too simple to be considered a trick, but I've included it here for completeness.

For example to get a box to change opacity when the mouse is moved over either it or a label we can group them like so:

<g class="hover_group">
  <text x="30" y="20">Blue box</text>
  <rect x="30" y="30" width="60" height="60" fill="blue"/>
</g>

And then use CSS:

<style> 
  .hover_group:hover
  {
    opacity: 0.5;
  }
</style>

CSS selectors

In the previous example, both the label and the box change opacity when the mouseover event is triggered, which is sometimes what you want. If you don't, you can prevent the text from altering, by changing the CSS to:

<style> 
  .hover_group:hover rect
  {
    opacity: 0.5;
  }
</style>

Now the text elements don't change opacity but still trigger changing the boxes' opacity.

Changing the cursor

In the previous example, when the mouse moves over the text elements it changes to a text cursor, which looks a bit strange on buttons or labels. You can change the cursor to the default arrow by adding the attribute cursor="default" to an element. Or you can use CSS again to change, for example, the behaviour of all <text> elements:

text
{
  cursor: default;
}

For a list of all the different cursors see my post about SVG buttons (for which these tricks are quite useful).

Preventing mouseover events

If you move the text in the previous example down so it's over the box, you have the basis for a button. However, rather than create a group, an extra CSS style, and change the cursor, it's simpler to make the text elements invisible to mouseover effects (uninteractable?). This is also very useful if you have overlapping elements and want to ignore the ones on top.

In the example below, there is an ugly effect as the text on top of the boxes blocks the mouseover effect.

To get around this, we can use the following CSS to make the cursor ignore all text elements:

text
{
    pointer-events: none;
}

Continuous events on mouse hold

This isn't really a mouseover effect, rather a mousedown effect, but it's related, so I've included it here. The problem is how to continually call a function when the mouse is held down (as in this example). The answer is to have two additional functions as well as the function you want to call (here named myFunction):

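var myTimeout;

function beginFunction()
{
  myFunction();
  // Pass the function itself rather than a string to be evaluated
  myTimeout = setInterval(myFunction, 50);
}

function endFunction()
{
  // clearInterval is the counterpart of setInterval
  if (typeof(myTimeout) != "undefined")
  {
    clearInterval(myTimeout);
  }
}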

Then add to the element you want to animate:

<tag onmousedown="beginFunction()"
     onmouseup="endFunction()"
     onmouseout="endFunction()" />

Now when the mouse is held down on the element, beginFunction() is called, which in turn calls myFunction(). It then uses setInterval() to repeatedly call myFunction every 50 ms. When the mouse either leaves the element or is released, endFunction() stops the repeating calls.

Attachments:
group_hover_css.svg (686 bytes)
group_hover_css2.svg (691 bytes)
group_hover_cursor.svg (738 bytes)
group_hover_hide_event.svg (571 bytes)
group_hover_hide_event2.svg (623 bytes)
growing_circle.svg (1.21 KB)

Using Javascript to control an SVG

Like HTML, SVGs are represented using the Document Object Model (DOM) and so can be manipulated with Javascript relatively easily.

First create your SVG. Give the element you want to control an id so it can be easily selected.

<svg version="1.1"
     xmlns="http://www.w3.org/2000/svg"
     width="400" height="300">
  <style>
    circle {
      fill-opacity: 0.5;
      stroke-width: 4;
      fill: #3080d0;
      stroke: #3080d0;
    }
  </style>
  <circle id="my-circle" cx="100" cy="100" r="50" />
</svg>

Then add the SVG into your HTML document with the object tag and give that an ID.

<object id="circle-svg" width="400" height="300" type="image/svg+xml" data="moving_circle.svg"></object>

Javascript

You can then select the SVG element by its ID:

var svg = document.getElementById("circle-svg"); 

Then select the SVG document:

var svgDoc = svg.contentDocument;

Then select elements within the document:

var circle = svgDoc.getElementById("my-circle");

You can manipulate an element's attributes with setAttributeNS():

circle.setAttributeNS(null, "cx", 200);
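One thing to be aware of is that contentDocument is only usable once the SVG inside the object tag has finished loading. The sketch below shows one way to put the steps together while waiting for the load event. The moveCircle function is a name I've invented for illustration, and the ids are the ones used in the snippets on this page. If your script runs after the object has already loaded, the load event won't fire again, so treat this as a starting point rather than a complete solution.

```javascript
// Hypothetical helper: move the circle inside an SVG document.
// svgDoc is whatever contentDocument returned; position is the new cx value.
function moveCircle(svgDoc, position) {
  var circle = svgDoc.getElementById("my-circle");
  if (circle === null) { return false; }  // element not found
  circle.setAttributeNS(null, "cx", position);
  return true;
}

// This part only does anything in a browser: wait for the <object> to
// finish loading before touching contentDocument, then move the circle.
if (typeof document !== "undefined") {
  var svgObject = document.getElementById("circle-svg");
  if (svgObject) {
    svgObject.addEventListener("load", function () {
      moveCircle(svgObject.contentDocument, 200);
    });
  }
}
```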

Example

You can see an example at http://www.petercollingridge.appspot.com/svg-and-js

It uses the HTML5 slider element (which only works properly in Chrome or Safari) to control the position of the circle. The files used can be downloaded below (the HTML file is a text file - you just need to change the extension). Note that the Javascript may not run if you open the HTML file directly on your computer, but it will work when run on a server.

Attachment                Size
moving_circle.svg         361 bytes
svg-interaction.js.txt    268 bytes
svg-and-js.txt            662 bytes

Introduction to SVG scripting: an interactive map

This is a brief introduction to how you can make an SVG interactive using ECMAScript. For a general introduction to SVGs please take a look at my SVG tutorial. As an example, I'll create an interactive map, which seems to be a popular use for SVGs. I've written an overview of alternative methods for adding interactivity to an SVG here. There are basically only four things you need to know:

  • getElementById("X") - gets the element with an id of X
  • element.getAttributeNS(null, "X") - gets the value of attribute X
  • element.setAttributeNS(null, "X", "y") - sets the value of attribute X to y
  • element.firstChild.data - refers to the text in a text element

Once you know how to use these (and how to trigger events), you can create all sorts of impressive effects with minimal effort.
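To show how the four fit together, here is a small made-up example that reads a circle's radius, doubles it, writes it back and reports the new value in a text element. The function name doubleRadius and the idea of passing in the ids are my own inventions for illustration; doc is assumed to be the SVG document.

```javascript
// Hypothetical helper: doubles the radius of a circle and shows the new
// value in a <text> element. Uses all four items from the list above.
function doubleRadius(doc, circleId, labelId) {
  var circle = doc.getElementById(circleId);
  // Attribute values come back as strings, so convert before doing maths
  var radius = parseFloat(circle.getAttributeNS(null, "r"));
  var newRadius = radius * 2;
  circle.setAttributeNS(null, "r", newRadius);
  // The <text> element must already contain some text (even a space)
  doc.getElementById(labelId).firstChild.data = "r = " + newRadius;
  return newRadius;
}
```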

Step 1. Get some country-based data

The first step is to get some data. Presumably, if you want to create a map, it's because you have some data you want to display. If you're looking for sources of digital data, you could try the World Bank, the Guardian Datastore, data.gov.uk (for UK-based data) or data.gov (for US-based data). For this tutorial, I'm going to use some data from the WHO, but it's actually quite tricky to extract data from their website.

Step 2. Get a map

Now you need an SVG map. You can either draw your own in a program like Inkscape or Adobe Illustrator (or you could theoretically write the SVG from scratch), or you can download an open source SVG map from Wikimedia. If you do get a map from Wikimedia or some other source, you may have to clean it up (moving elements out of groups) to get all the interactions to work correctly. A key feature your map needs is for the country elements (almost certainly paths) to have an id attribute that gives their name.

For this tutorial I'm using a map of Africa that I got from Wikimedia and spent way too long cleaning up. I've also added a CSS hover effect (described here). You can download the map at the bottom of this page under Attachments.

Step 3. Initialise the SVG

In my simple ECMAScript example, I only changed the attributes of elements passed by evt.target. However, to have more control over the SVG DOM, we need to be able to refer to our SVG. We can do that by creating the following function.

<script type="text/ecmascript">
<![CDATA[
  // Global reference to the SVG document, set when the SVG loads
  var svgDoc;

  function init(evt) {
    svgDoc = evt.target.ownerDocument;
  }
]]></script>

We can now use the svgDoc object to refer to the SVG document. Note that all of our code must be written inside the <script> element, within the <![CDATA[ ... ]]> section.

Now we need to call our function, which we do with the onload event of the <svg> element. The svg tag should look something like this:

<svg width="400" height="400" version="1.1"
xmlns="http://www.w3.org/2000/svg" onload="init(evt)" >

Step 4. Display country names

The first script effect I'll demonstrate is how to change the text in a <text> element. This can be used to achieve a variety of effects, such as displaying the name of an object when the mouse is held over it (see my tooltip tutorial). For this tutorial, we'll just display the country name at the bottom of the image. In my previous interactive SVG map I achieved this effect using the set command, but that's quite inefficient as it requires a separate text element for every country. Using ECMAScript is a more efficient method.

First, we add an empty text element to the bottom of the image:

 <text class="label" id="country_name" x="10" y="390"> </text>

I've given it a class, so it's easy to style, and an id, so I can target it within a function. I've also made the <text> element contain a single space; if the element is empty, it has no child text node, so its firstChild will be null, which will cause problems in the next step.

The next step is to create a function (within the <script> tags) that changes the text value:

function displayName(name) {
  svgDoc.getElementById('country_name').firstChild.data = name;
}

This function selects the element with the id 'country_name' and sets its firstChild.data, i.e. the text between the tags, to the name passed in. To call the function when a country is moused over, you need to add an onmouseover event to each path or group tag making up a country. The event should call the function and pass the relevant name:

onmouseover="displayName('Whatever country name')"

It's a bit of a pain to manually add this code to lots of countries so using Python to edit the XML is a good idea if you know how to. The map should now look something like this:
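Another option, if you'd rather not touch the XML at all, is to attach the events from the script itself when the SVG loads. The sketch below is only a rough idea of how that could work: addNameHandlers is a name I've made up, it assumes each country is a single path with an id, and it displays the id (e.g. 'burkina-faso') rather than a nicely formatted name.

```javascript
// Hypothetical helper: give every <path> that has an id a mouseover handler,
// instead of typing onmouseover="..." on each one by hand.
// showName is a callback such as a function that updates a text element.
function addNameHandlers(svgDoc, showName) {
  // Separate function so that each handler remembers its own id
  function attach(element, id) {
    element.addEventListener("mouseover", function () {
      showName(id);
    });
  }

  var paths = svgDoc.getElementsByTagName("path");
  var count = 0;
  for (var i = 0; i < paths.length; i++) {
    var id = paths[i].getAttributeNS(null, "id");
    if (id) {  // skip paths without an id
      attach(paths[i], id);
      count++;
    }
  }
  return count;  // number of paths that were given a handler
}
```

You could call this from the function that runs on load, passing in the SVG document and the function that displays the name.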

Step 5. Colour a country

Assuming each path or group of paths representing a country has a sensible name (and this is where it pays to have a good starting map), it's easy to colour a country. For example, we can colour Libya green:

var country_id = 'libya';
var colour = '#004400';
var country = svgDoc.getElementById(country_id);
country.setAttributeNS(null, 'style', 'fill:' + colour);

However, if you're making a choropleth map, you are likely to group countries into a few different classes and associate each class with a colour. For this, it makes sense to use classes and CSS. This will also make it much easier to change colour scheme later if we so wish.

So first we define a few colours within the style element:

.colour0 {fill: #b9b9b9;}
.colour1 {fill: #ffa4a9;}
.colour2 {fill: #cc6674;}
.colour3 {fill: #993341;}
.colour4 {fill: #66000e;}

Then we can define a function that finds the class of a given country and adds an additional class of "colourX", where X is a given number.

function colourCountry(name, colour) {
   var country = svgDoc.getElementById(name);
   // If the country has no class yet, start from an empty string
   var oldClass = country.getAttributeNS(null, 'class') || '';
   var newClass = oldClass + ' colour' + colour;
   country.setAttributeNS(null, 'class', newClass);
}

For example, we could colour Algeria with our third colour (colour2) like so:

colourCountry('algeria', 2);
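One thing to watch with colourCountry() is that it only ever adds classes, so calling it twice on the same country leaves both colour classes on the element, and whichever CSS rule is defined later wins regardless of which call came last. If you expect to recolour countries, you could build the new class string with something like the sketch below, which strips any existing colourN class first. The name withColourClass is made up for illustration.

```javascript
// Hypothetical helper: return a class string with any existing "colourN"
// class removed and "colour" + colour added at the end.
function withColourClass(oldClass, colour) {
  var kept = [];
  var names = (oldClass || "").split(/\s+/);
  for (var i = 0; i < names.length; i++) {
    // Keep every non-empty class name that is not colour0, colour1, ...
    if (names[i] !== "" && !/^colour\d+$/.test(names[i])) {
      kept.push(names[i]);
    }
  }
  kept.push("colour" + colour);
  return kept.join(" ");
}
```

Inside colourCountry() you would then pass the result of withColourClass(oldClass, colour) to setAttributeNS() instead of simply appending to the old class.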

Step 6. Colour multiple countries

We can use the function we just defined to colour many countries at once with a loop. The following function colours an array of countries with a given class number: 

function colourCountries(data, colour){
    for (var country=0; country<data.length; country++){
        colourCountry(data[country], colour);
    }
}

We use this function like so:

var data1 = ['ghana', 'togo'];
var data2 = ['burkina-faso', 'cameroon', 'chad'];
colourCountries(data1, 1);
colourCountries(data2, 2);

It should be pretty clear that we can generalise this approach with another loop, although it does require creating an array of arrays. Although this is perhaps not the most intuitive data structure, it is easy to program. We can arrange our data in an array, such that each value in the array is an array of countries that share a colour. For example:

var data1 = [['ghana', 'togo'],
             ['burkina-faso', 'cameroon', 'chad'],
             ['congo', 'cote-ivoire']];

In this example, Ghana and Togo are given colour 1, Burkina Faso, Cameroon and Chad are given colour 2 and Congo and Cote d'Ivoire are given colour 3. We can traverse this array of arrays and add the colours with an updated colourCountries() function:

function colourCountries(data) {
  for (var colour=0; colour<data.length; colour++){    
    for (var country=0; country<data[colour].length; country++){
      colourCountry(data[colour][country], colour+1);
    }
  }
}
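If your data starts out as a number for each country rather than as ready-made groups, you need some way to build the array of arrays. The following is just one possible sketch: groupByLimits is a made-up name, and it assumes you supply a list of upper limits in ascending order, one per colour class, with anything above the last limit going into an extra final group.

```javascript
// Hypothetical helper: turn {country: value} into an array of arrays.
// A country goes in the first group whose limit its value does not exceed;
// values above every limit go in a final extra group.
function groupByLimits(values, limits) {
  var groups = [];
  for (var i = 0; i <= limits.length; i++) {
    groups.push([]);
  }
  for (var country in values) {
    if (!Object.prototype.hasOwnProperty.call(values, country)) { continue; }
    var group = 0;
    while (group < limits.length && values[country] > limits[group]) {
      group++;
    }
    groups[group].push(country);
  }
  return groups;
}

// Made-up numbers, purely to show the shape of the result
var exampleValues = { ghana: 5, togo: 8, chad: 25, cameroon: 12 };
var exampleGroups = groupByLimits(exampleValues, [10, 20]);
// exampleGroups is [['ghana', 'togo'], ['cameroon'], ['chad']]
```

The result can be passed straight to a function that expects one array of country ids per colour.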

Attachment                     Size
Blank Africa Map.svg           166.46 KB
Blank map display names.svg    168.39 KB
colour_with_class.svg          167.05 KB

Open data from the World Bank

Making data freely available seems to be very fashionable at the moment, and that can only be a good thing. Just at the start of this month, Ordnance Survey released some of their data, allowing people to access and make use of huge amounts of geographical data. The postcode data was made available (to a degree and after a struggle) this year. Last year, Tim Berners-Lee got the UK government to make its data (collected at the taxpayers' expense) freely available at data.gov.uk. And today, the World Bank has opened up its development data at data.worldbank.org.

I'm looking forward to seeing what people can do with all this information, and I have been re-inspired to attempt something useful with country data myself (I was previously going to use data from the Guardian Data Store, but this should be easier). I have got myself an API key and have actually been learning how to use it. Once I know the general way in which APIs work (which is something I've been meaning to learn for a long time), I should have a huge wealth of information at my fingertips.

A bit of code

I have finally learnt how to use Python to interrogate and search websites, which I can see is an incredibly powerful tool. The key point is to use the urllib module, which is part of the standard Python installation. It's then very simple to open a URL, which can be treated like a file. All you need to do then is read the file/URL and parse it (in this case, the data is in XML format, so you use an XML parser, much like editing SVGs).

The following code gets all the countries for which the World Bank has data:

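try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen          # Python 2

my_api_key = "YOUR_API_KEY"  # placeholder: replace with your own key
root = "http://open.worldbank.org/"
queryURL = root + "countries?api_key=" + my_api_key

# The opened URL behaves like a file: read it, then close it
sock = urlopen(queryURL)
XML = sock.read()
sock.close()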