Saturday, 28th January 2012
I've started work on a program I've been meaning to make for a while: an SVG optimiser. I've often found myself spending a lot of time tidying, simplifying and compressing SVGs created by Inkscape or Illustrator. Sometimes, changes are merely aesthetic, e.g. reducing numbers from an unnecessary six decimal places to one, or removing unused attributes. These changes make it easier to read the file, and can reduce its size noticeably. Other changes are more practical, such as removing transforms which otherwise make it difficult to see where paths and shapes are actually placed. This is particularly important if you want to add animations or interactive elements.
[A very rough test app is available at http://petercollingridge.appspot.com/svg-optimiser. I offer no guarantees as to whether it will work with any particular file.]
The latest version of the program can be found here (requires Python 2.7). At the time of writing, the functionality is limited and the program is quite buggy. Below is a brief description of what it can and can't (yet) do.
One of the annoying aspects of Inkscape and Illustrator is that they often specify coordinates to six decimal places, which is completely unnecessary unless you're planning to view your SVG at a million times magnification. My program can now fix numbers to n decimal places with
svg.setDemicalPlaces(n). The part I'm most pleased about is that it also removes trailing decimal zeros. So if you want two decimal places, it won't convert "12" to "12.00" and it will convert "12.002" to "12". Unfortunately, at present, it doesn't work for the <path> 'd' attribute, which is probably the most frequently used, but also the hardest to parse. [Update 29/01/12 - it does now, although it strips out all commas too, which works fine, but isn't ideal.]
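To illustrate the rounding behaviour described above, here is a minimal sketch. It is not the program's actual code: format_number and round_numbers are hypothetical names, and it is written for Python 3 rather than 2.7. Unlike the program, it leaves commas in place. It assumes numbers in the attribute are separated by whitespace, commas, signs or command letters.

```python
import re

def format_number(value, places):
    """Round value to `places` decimal places and drop trailing zeros."""
    text = "{:.{p}f}".format(float(value), p=places)
    if "." in text:
        # "12.00" -> "12", "20.50" -> "20.5"
        text = text.rstrip("0").rstrip(".")
    # A small negative number can round to "-0"; write it as "0".
    return "0" if text == "-0" else text

# Matches integers, decimals and numbers with an exponent.
NUMBER = re.compile(r"-?\d*\.?\d+(?:[eE][-+]?\d+)?")

def round_numbers(attribute_value, places):
    """Round every number in an attribute string, such as a path's d."""
    return NUMBER.sub(lambda m: format_number(m.group(0), places),
                      attribute_value)

print(round_numbers("M10.123456,20.5 L-3.000001 4", 2))
```

Because the substitution only touches the numbers themselves, the command letters and separators in a path's d attribute pass through unchanged.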
Inkscape also gives all shapes an id, which makes sense, but I prefer to remove it as it makes the file slightly harder to read and because I like minimal files. Now I can call
svg.removeAttribute('id') and away they go. Inkscape also generates lots of attributes beginning with sodipodi, which can be removed if you don't want to open the file in Inkscape again (even if you do, I don't think it makes much difference). To remove these efficiently, I intend to create a function that removes an entire namespace. [Update 29/01/12 - I now have this function, which can strip out all the sodipodi tags that Inkscape adds.]
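As a rough sketch of how both kinds of removal could work, the code below uses the standard library's ElementTree (Python 3). The functions remove_attribute and remove_namespace are hypothetical stand-ins, not the program's own methods. ElementTree stores namespaced names as "{uri}name", so removing a namespace comes down to a prefix check on tags and attribute keys.

```python
import xml.etree.ElementTree as ET

# Namespace URI that Inkscape declares for the sodipodi prefix.
SODIPODI = "http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"

def remove_attribute(root, name):
    """Delete the attribute `name` from every element in the tree."""
    for element in root.iter():
        element.attrib.pop(name, None)

def remove_namespace(root, uri):
    """Delete every element and attribute that belongs to namespace `uri`."""
    prefix = "{" + uri + "}"
    # Take a snapshot of the tree first, since children are removed below.
    for parent in list(root.iter()):
        for child in list(parent):
            if isinstance(child.tag, str) and child.tag.startswith(prefix):
                parent.remove(child)
        for key in list(parent.attrib):
            if key.startswith(prefix):
                del parent.attrib[key]

svg = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="%s">'
    '<sodipodi:namedview id="base"/>'
    '<rect id="rect1" sodipodi:type="example" width="5"/>'
    '</svg>' % SODIPODI)
remove_attribute(svg, "id")
remove_namespace(svg, SODIPODI)
print(ET.tostring(svg))
```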
Whenever you move a shape in Inkscape, rather than change its actual coordinates, it adds a transform or changes an existing transform. This makes a lot of sense as it's a lot simpler, but it can make working with an SVG manually a pain. Eventually I hope to be able to remove all the transform attributes, but for now I can only remove translations from all shapes other than paths.
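For simple shapes, applying a translation means adding its offsets to the shape's position attributes and then deleting the transform. The sketch below shows the idea for rect, circle and ellipse only. The function apply_translation and the POSITION_ATTRIBUTES table are hypothetical, and the code assumes unitless coordinates and a transform that contains nothing but a single translate().

```python
import re
import xml.etree.ElementTree as ET

NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
# Matches translate(tx) or translate(tx, ty) and nothing else.
TRANSLATE = re.compile(
    r"^\s*translate\(\s*(%s)(?:[\s,]+(%s))?\s*\)\s*$" % (NUM, NUM))

# Attributes that hold the x and y position of each supported shape.
POSITION_ATTRIBUTES = {
    "rect": ("x", "y"),
    "circle": ("cx", "cy"),
    "ellipse": ("cx", "cy"),
}

def apply_translation(element):
    """Fold a translate() transform into the element's coordinates.

    Returns True if the transform was applied and removed.
    """
    match = TRANSLATE.match(element.get("transform", ""))
    tag = element.tag.split("}")[-1]  # strip any "{namespace}" part
    if match is None or tag not in POSITION_ATTRIBUTES:
        return False
    dx = float(match.group(1))
    dy = float(match.group(2) or 0)  # translate(tx) implies ty = 0
    for name, offset in zip(POSITION_ATTRIBUTES[tag], (dx, dy)):
        # A missing position attribute defaults to 0 in SVG.
        element.set(name, "%g" % (float(element.get(name, "0")) + offset))
    del element.attrib["transform"]
    return True

rect = ET.fromstring(
    '<rect x="10" y="5.5" width="3" transform="translate(2.5, -1.5)"/>')
apply_translation(rect)
print(ET.tostring(rect))
```

Paths are harder because every absolute coordinate in the d attribute has to be shifted, while relative commands (other than an initial moveto) stay as they are.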
Move styles to CSS
Another inefficiency of Inkscape- and Illustrator-derived SVGs is that they tend to assign a lengthy style attribute to each element. Often the same style is applied to all or many elements. I'm hoping to be able to remove these and replace them with a class attribute which can be styled using CSS. So far, I've not attempted this, but it should be relatively easy. [Update 29/01/12 - I've made a start on this and it works relatively well, but will cause problems on some SVGs.]
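One possible approach, sketched below with the standard library's ElementTree (Python 3.7+), is to give each distinct style string a generated class name and collect the rules in a single style element. The function move_styles_to_css and the style0, style1, ... naming scheme are assumptions made for illustration, not the program's actual behaviour.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def move_styles_to_css(root):
    """Replace style attributes with classes and add a <style> element.

    Returns a dict mapping each distinct style string to its class name.
    """
    class_names = {}
    for element in root.iter():
        style = element.attrib.pop("style", None)
        if style is None:
            continue
        # Normalise whitespace and trailing semicolons so that equivalent
        # styles share one class. Declaration order is kept as written.
        key = ";".join(d.strip() for d in style.split(";") if d.strip())
        if not key:
            continue
        name = class_names.setdefault(key, "style%d" % len(class_names))
        existing = element.get("class")
        element.set("class", existing + " " + name if existing else name)
    if class_names:
        style_element = ET.Element("{%s}style" % SVG_NS)
        style_element.text = "\n".join(
            ".%s{%s}" % (name, key) for key, name in class_names.items())
        root.insert(0, style_element)
    return class_names

svg = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<rect style="fill:red; stroke:none"/>'
    '<circle style="fill:red;stroke:none;"/>'
    '<path style="fill:blue"/>'
    '</svg>')
print(move_styles_to_css(svg))
```

One caveat: a style attribute takes precedence over stylesheet rules, so if the SVG already contains CSS, moving a style into a class can change which rule wins.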
Below is an example of an SVG before and after optimisation. Hopefully you can't see any difference in the images. In the first, nearly all the shapes have a translation attribute, and some of the values have many decimal places. In the second, the translations have been applied to the coordinates, the numbers have been rounded to one decimal place and all the id attributes have been removed. As a result, the second file is 68% of the size of the first - not a massive difference, but the original file was hand-scripted so is actually relatively concise already.
[Update 29/01/12 - both these issues are fixed since I switched to using lxml.] The biggest issue at the moment is the weird way that the XML parser it uses (which is part of the built-in ElementTree module) treats namespaces. As a result, it strips out the xmlns:xlink, which doesn't matter in the above example, but causes problems with Inkscape files. I'm going to have to spend some time working through the parser to see if I can work out where the problem is. It also strips out comments, which is potentially annoying.