Code which I've written

Memory optimised decompressor for Pebble Classic

TLDR: DEFLATE decompressor in 3K of RAM

For a Pebble app I've been writing, I need to send images from the phone to the watch and cache them in persistent storage on the watch. Since the persistent storage is very limited (and the Bluetooth connection is relatively slow) I need these to be as small as possible, and so my original plan was to use the PNG format and gbitmap_create_from_png_data(). However, I discovered that this function is not supported on the earlier firmware used by the Pebble Classic. Since PNGs are essentially DEFLATE compressed bitmaps, my next approach was to manually compress the bitmap data. This meant that I needed a decompressor implementation ("inflater") on the watch.

The constraint

The major constraint for Pebble watchapps is memory. On Pebble Classic apps have 24K of RAM available for the compiled code (.text), global and static variables (.data and .bss) and heap (malloc()). There is an additional 2K for the stack (local variables). The decompressor implementation needed to have both small code size and variable usage. I discovered tinf which seemed to fit the bill, and tried to get it working.

Initially, trying to decompress something simply crashed the app. It took some debug prints to determine that code in tinf_uncompress() wasn't even being executed, and I realised that it was exceeding the 2K stack limit. I changed the TINF_DATA struct to be allocated on the heap to get past this. At this stage it was using 1.2K of .text, 1.4K of .bss, 1K of stack, and 1.2K of heap (total 4.8K). I set about optimising the implementation for memory usage.

Huffman trees

Huffman coding is a method to represent frequently used symbols with fewer bits. It uses a tree (otherwise referred to as a dictionary) to convert symbols to bits and vice versa. DEFLATE can use Huffman coding in two modes: dynamic and fixed. In dynamic mode, the compressor constructs an optimal tree based on the data being compressed. This results in the smallest representation of the actual input data; however, it has to include the computed tree in the output in order for a decompressor to know how to decode the data. In some cases the space used to serialise the tree negates the improvement in the input representation. In this case the compressor can used fixed mode, where it uses a static tree defined by the DEFLATE spec. Since the decompressor knows what this static tree is, it doesn't need to be serialised in the output.

The original tinf implementation builds this fixed tree in tinf_init() and caches it in global variables. Whenever it encounters a block using the fixed tree it has the tree immediately available. This makes sense when you have memory to spare, but in this case we can make another tradeoff. Instead we can store the fixed tree in the same space used for the dynamic tree, and rebuild it every time it is needed. This saves 1.2K of .bss at the expense of some additional CPU usage.

The dynamic trees are themselves serialised using Huffman encoding (yo dawg). tinf_decode_trees() needs to first build the code tree used to deserialise the dynamic tree, which the original implementation loads into a local variable on the stack. There is an intermediate step between the code tree and dynamic tree however (the bit length array), and so we can borrow the space for the dynamic instead of using a new local variable. This saves 0.6K of stack.

The result

With the stack saving I was able to move the heap allocation back to the stack. (Since the stack memory can't be used for anything else it's kind of free because it allows the non-stack memory to be used for something else.) The end result is 1.2K of .text, 0.2K of .bss and 1.6K of stack (total 3.0K), with only 1.4K counting against the 24K limit. That stack usage is pretty tight though (trying to use app_log() inside tinf causes a crash) and is going to depend on the caller using limited stack. My modified implementation will allocate 1.2K on the heap by default, unless you define TINF_NO_MALLOC. Using zlib or gzip adds 0.4K of .text. You can find the code on bitbucket.

Building Raspbian images for Raspberry Pi

I recently bought a Raspberry Pi, which is a credit card sized computer with an ARM processor. I'm using it as my TV frontend, running Raspbian and XBMC. I'm building my own packages for XBMC since it requires the latest development version.

I initially installed my Pi with the foundation image, but found that it included a lot of packages which I didn't need. Since I have a slight obsession about doing things as efficiently as possible, I decided to build my own image with XBMC from scratch.

I implemented a script in Bash, which does this. It uses debootstrap to install a minimal Raspbian system in a chroot. It then installs XBMC and a couple extra packages, and does some necessary configuration. Finally it creates an image file with the necessary partitions, creates the filesystems, and copies the installation into the image file. The resultant image fits onto a 1GiB SD card. You can download a pre-built image from this page.

The script can be modified to build images with different packages, or even a very minimal image which fits onto a 512MiB SD card.

Working with GPX files in Python

Extracting specific track segments from a track

I have an i-Blue 747A+ GPS logger which I use to track my runs (amongst other things). Afterwards I use BT747 to retrieve the data from the device and create a GPX file of the run, which I then upload to Endomondo which gives me nice graphs and statistics.

I need to modify the GPX file slightly before I can do so however: I use the button on the device to mark the beginning and end of the run, which appear as waypoints in the GPX file. BT747 creates separate track segments (within a single track) between each waypoint, but Endomondo ignores these. I therefore need to extract the single track segment covering the actual run and create a new GPX file with just that segment. I therefore wrote a script in Python,, to do this. It uses the gpxdata library to parse the input file, locates any track segment which match a set of time, distance, displacement and speed criteria1, and outputs a new GPX file with just those.

% mgorven-20121015_0109.gpx > run-20121014.gpx
Reading mgorven-20121015_0109.gpx
<TrackSegment (23 points)> covers 21m over 0:00:22 at 1.0m/s average speed with 4m displacement
<TrackSegment (904 points)> covers 3018m over 0:15:03 at 3.3m/s average speed with 8m displacement
Adding <TrackSegment (904 points)>
<TrackSegment (4 points)> covers 3m over 0:00:03 at 1.3m/s average speed with 3m displacement

Integrating heart rate data

I then recently bought an Oregon Scientific WM100 heart rate logger. It listens to the broadcasts from a heart rate strap2 and records the measurements every 2 seconds. I retrieve the data using the wm100 driver for Linux which writes a CSV file like this:


In order to get this data into Endomondo, I needed to combine the GPS trace with the HRM data into a single file format which Endomondo accepts. I initially started implementing a library for the TCX format3, but then discovered that there is a GPX extension for including heart rata data which Endomondo accepts. So I wrote a script in Python,, which reads the input GPX and CSV files, merges the heart rate measurements into the GPX records, and outputs a new GPX file.

% 2012-10-14T18:08:27.csv < mgorven-20121015_0109.gpx > run-20121014.gpx

The entries look like this:

<trkpt lat="37.392051" lon="-122.090240">

  1. I actually initially wrote this to find tracklogs of runs amongst all my tracklogs. 

  2. I use a strap from an entry level Nike triax C3 heart rate monitor watch. 

  3. Which is quite exhaustive... 

Migrating SMSes from Nokia to Android

My wife recently got a Samsung Exhibit II 4G Android phone to replace her aging Nokia E63. Migrating her contacts was accomplished fairly easily by exporting them to CSV with Nokia PC Suite and then importing them into Google Contacts. Migrating SMSes was not so trivial however.

Other approaches

There are a couple methods floating around the web, but none were suitable. This one uses Gammu to retrieve the SMSes from the Nokia, and then a script to convert them to an XML format readable by SMS Backup & Restore. It turns out that Gammu doesn't work on Symbian S60v3 devices however. This script can convert SMSes exported by the MsgExport app to XML for SMS Backup & Restore, but I didn't feel like paying for it or dealing with the Ovi Store. VeryAndroid is a Windows application which can convert SMSes from Nokia PC Suite CSV format and sync them directly to an Android device, and Nokia2AndroidSMS can convert SMSes from the OVI Suite database to XML for SMS Backup & Restore. I didn't want to deal with more Windows software though, so I just decided to write my own.


I already had the Nokia PC Suite installed and was using it to migrate contacts, so I decided to work with the CSV output it generates for messages. A received SMS looks like this:

sms,deliver,"+16501234567","","","2012.06.13 19:13","","Leaving now"

and a sent SMS looks like this:

sms,submit,"","+16501234567","","2012.06.13 19:11","","Where are you?"

The fields are:

  • 0: "sms" for SMSes (MMS is presumably different)
  • 1: "deliver" for received messages, "submit" for sent messages
  • 2: Sender's phone number (blank for sent messages)
  • 3: Recipient's phone number (blank for received messages)
  • 5: Date and time in "YYYY.MM.DD HH:MM" format and local timezone
  • 7: Message body

Fields 4 and 6 are always empty for SMSes (they are probably used for MMSes, one being the message subject).

I also decided to generate XML for the SMS Backup & Restore app. The XML format looks like this:

<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<?xml-stylesheet type="text/xsl" href="sms.xsl"?>
<smses count="1">
    body="Leaving now"
    readable_date="Jun 29, 2012 8:03:04 PM"

but can be reduced down to this:

<?xml version="1.0" encoding="UTF-8"?>
<smses count="1">
        body="Leaving now"

The attributes of the <sms> element are:

  • protocol: Always "0" (possibly different for MMS)
  • address: Sender or recipient phone number
  • date: Date and time in milliseconds since 1 January 1970
  • type: "1" for received message, "2" for sent messages
  • body: Message body
  • read: "1" if the message has been read
  • status: Always "-1"

The script

I implemented a script called [] in [Python] to convert one or more CSV files to this XML format.

./ received.csv sent.csv > android.xml

The XML file can then be transferred to the Android device (using USB or Bluetooth) and stored in /sdcard/SMSBackupRestore. It will then be presented as an option after selecting the Restore button in the app.

Ibid finally released!

After over a year of development, we have finally released Ibid. Ibid is a general purpose chat bot written in Python. We've suffered from a bit of feature creep, so despite being a 0.1 release it can talk 7 messaging protocols and has over 100 features provided by plugins. I think we also have an excellent architecture and very developer friendly plugin API. The 0.1.0 release can be downloaded from Launchpad or installed from our PPA.

Content negotiation with Lighttpd and Lua

Following on from yesterday's post, I decided to try implement proper content negotiation. After a fair amount of time spent getting to grips with [Lua][], I got a [script][] which works very nicely. It implements [server driven][] content negotiation for [media types][mime].

The basic idea of content negotiation is that a resource (e.g., this [graph][]) exists in multiple formats (in this case, [SVG][graph-svg], [PNG][graph-png] and [GIF][graph-gif]). When a user agent requests the resource, it indicates which formats it understands by listing them in the Accept header. The server compares these to the available formats and sends the best one. So a browser which can display [SVG][] will receive the diagram in SVG format, while a browser which can't will receive it in [PNG][] (or [GIF][]) format.

(The following description assumes knowledge of the Accept header format.)

The script works by searching the directory for files with the requested name but with an additional extension (each of which is a variant). The [media type][mime] is inferred from the extension using /etc/mime.types, and the quality of the type is set by a hardcoded table in the script. Each variant is checked against the acceptable types sent by the user agent, and the overall quality calculated by multiplying the quality with the q parameter in the Accept header. The variant with the highest overall quality is then chosen.

Some browsers include wildcard entries such as image/* and */* in the Accept header without specifying a q parameter. This parameter defaults to 1 (the highest value), which means that no preference is actually indicated. The script implements the same [hack][] that [Apache][] does in order to compensate for this. It also handles directory index files by defaulting to files named "index".

To install the [script][], download and save it somewhere (such as /etc/lighttpd/). Then add the following to the site definition.

magnet.attract-physical-path-to = ("/etc/lighttpd/negotiate.lua")

Serving static files without file extensions using Lighttpd and Lua

URLs shouldn't really contain file extensions (like .html, .png) since they are supposed to identify a resource and not a particular representation/format thereof. The format is indicated by the Content-Type header sent in the response. Modern CMSs do this already (for example, the URL of this page doesn't include .html).

Doing the same for static files (i.e. files served directly by the webserver) isn't straightforward because most webservers use the file extension to determine the MIME type to send in the Content-Type header. This means that simply removing the file extension from the filename (or even creating a symlink without a file extension) will cause the webserver to send the wrong Content-Type header.

I decided to try find a solution to this for my webserver of choice, Lighttpd. Lighttpd has a module which embeds a [Lua][] interpreter and allows you to write scripts which modify (or even handle) requests. So I wrote a [script][] which searches the directory for files with the same name as requested but with an extension. This means that any file can be accessed with the file extension removed from the URL while still having the correct Content-Type.

The script currently chooses the first matching file, which means that having multiple files with the same name but different extensions doesn't do anything useful. The proper method however is to actually do [content negotiation][], which chooses the format based on the preferences indicated by the HTTP client in the Accept header.

To use this script, download it and save it somewhere (I use /etc/lighttpd/). Enable mod_magnet, and add the following line to the site definition.

magnet.attract-physical-path-to = ("/etc/lighttpd/extension.lua")

Sharing links from Konqueror, including to IRC

I follow the main feeds of a couple social news sites (namely Digg, Reddit and Muti). When I find an article which I like, I go back and vote it up on the site. However, when I come across good articles via other sources, I don't submit them to these news sites (or try to find out if they've already been submitted) simply because it's too much effort.

When I started aggregating my activity on these sites on my blog and on FriendFeed, I needed a way to share pages that I didn't get to via one of these social news sites. I ended up setting up Delicious because I found a plugin for Konqueror which made it easy to bookmark pages.

I still wanted to solve the original problem though, and so started looking for an easy way to submit links to these sites from Konqueror. Konqueror has a feature called service menus which allows you to add entries to the context menu of files. I then needed to work out how to submit links to these services, which turned out to simply involve loading a URL with a query parameter specifying the link you want to share.

I created entries for Reddit, Digg, Muti, Delicious, Facebook and Google Bookmarks. These take you to the submission page of the service where you can fill in the title1. Digg and Reddit will show existing submissions if the link has already been submitted.

I often share links on IRC, and wondered if I could integrate that with my menu. It turns out that WeeChat has a control socket, and I could send messages by piping them to the socket. I therefore wrote a script which prompted me for a headline or excerpt using kdialog, and then sent the link to the specified channel. My menu now looks like this:


If you want to set this up yourself, download share.desktop and put it in ~/.kde/share/apps/konqueror/servicemenus. If you want the icons, download shareicons.tar.gz, extract them somewhere, and fix the paths in social.desktop2. To setup the IRC feature (assuming you're using WeeChat), download and save it in ~/bin/. You will need to change the commands in social.desktop depending on the servers and channels you wish to use.

  1. One shortcoming is that the title of the page is not automatically filled in. 

  2. I couldn't work out how to use relative paths, or ~. 

CLUG Political Compass graph

A couple people on #clug were updating their Political Compass scores, which prompted me to jump on the bandwagon and do the test. I came out with the following scores.

Economic Left/Right: -4.38
Social Libertarian/Authoritarian: -2.21

I then thought that it would be interesting to compare everyone's scores on a graph, so I wrote a [Python][] [script][py] to get the scores from Spinach and a [Gnuplot][] [script][p] to plot them.

CLUG Political Compass

To add yourself to the graph, tell Spinach your score in the following format. The graph is regenerated every hour.

cocooncrash.political_compass is -4.38 / -2.21 (2008/09/14)

Downloading Google Talk logs

I used Google Apps to host mail for this domain for a while, and wanted to close down the account since I don't use it anymore. Before I did that I wanted to move all the data onto my server. Transferring the emails was fairly straightforward using [POP3][], but I couldn't find a way to download the [Google Talk][] logs. [Gmail][] handles the logs as emails, but they aren't accessible using either POP3 or [IMAP][].

I therefore wrote a [Python][] script which downloads the logs via the web interface. On [Jeremy's][] [suggestion][] I used [BeautifulSoup][] to parse the [HTML][] this time, which worked very well. The script works with both Google Apps and normal Gmail, although my account got locked twice while trying to download the 3500 logs in my account.

Syndicate content