technical

Posts containing technical information.

Serving static files without file extensions using Lighttpd and Lua

URLs shouldn't really contain file extensions (like .html, .png) since they are supposed to identify a resource and not a particular representation/format thereof. The format is indicated by the Content-Type header sent in the response. Modern CMSs do this already (for example, the URL of this page doesn't include .html).

Doing the same for static files (i.e. files served directly by the webserver) isn't straightforward because most webservers use the file extension to determine the MIME type to send in the Content-Type header. This means that simply removing the file extension from the filename (or even creating a symlink without a file extension) will cause the webserver to send the wrong Content-Type header.

I decided to try to find a solution to this for my webserver of choice, Lighttpd. Lighttpd has a module which embeds a [Lua][] interpreter and allows you to write scripts which modify (or even handle) requests. So I wrote a [script][] which searches the directory for a file with the requested name plus an extension. This means that any file can be accessed with the file extension removed from the URL while still being served with the correct Content-Type.

The script currently chooses the first matching file, so having multiple files with the same name but different extensions isn't useful. The proper approach, however, is [content negotiation][], which chooses the format based on the preferences the HTTP client indicates in the Accept header.
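The core of the matching idea can be sketched outside lighttpd entirely. This plain-shell demo is illustrative only (the function name is made up); it is not the mod_magnet Lua script itself:

```shell
#!/bin/sh
# Given a requested path with no extension, find the first file in the
# same directory whose name matches plus an extension.
find_with_extension() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    for f in "$dir/$base".*; do
        # If nothing matches, the glob stays literal and -e fails.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}
```

In the real script, lighttpd then serves the matched file, so mod_mime picks the Content-Type from the actual extension on disk.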

To use this script, download it and save it somewhere (I use /etc/lighttpd/). Enable mod_magnet, and add the following line to the site definition.

magnet.attract-physical-path-to = ("/etc/lighttpd/extension.lua")

SSH agent forwarding and screen

I recently came across an article which tries to address the problem of SSH agent forwarding with screen. Briefly, the problem is that reattaching a screen instance breaks agent forwarding because the required environment variables aren't present in the screen instance. The solution given didn't quite work for me though because I use an SSH wrapper script which automatically runs screen.

My solution is to write a screen wrapper script which stores the environment variables in ~/.sshvars (as export statements) and then starts screen. Running source ~/.sshvars in the shell then makes the variables available. (I created an alias called fixssh to do this.)
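The wrapper amounts to only a few lines. This is a hypothetical sketch of such a ~/bin/screen script, not the author's exact version (the final exec line is commented out so the sketch runs standalone):

```shell
#!/bin/sh
# Record the agent-related environment variables as export statements
# in ~/.sshvars, then hand over to the real screen binary.
{
    printf 'export SSH_AUTH_SOCK="%s"\n' "${SSH_AUTH_SOCK:-}"
    printf 'export SSH_CONNECTION="%s"\n' "${SSH_CONNECTION:-}"
} > "$HOME/.sshvars"
# exec /usr/bin/screen "$@"
```

Inside a reattached screen session, `source ~/.sshvars` (or an alias such as `alias fixssh='source ~/.sshvars'`) then restores the variables.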

I like to put wrapper scripts in ~/bin, named the same as the wrapped command. This didn't work out of the box, however, since ~/bin is only added to PATH in ~/.profile, and that file is only sourced by login shells. The fix is therefore to add the following to ~/.bashrc, near the top before the [ -z "$PS1" ] && return line.

if [ -d "$HOME/bin" ]; then
    PATH="$HOME/bin:$PATH"
fi

Reduce spam by enforcing valid standards

One of the most effective anti-spam measures one can implement is to enforce valid use of SMTP and other standards. Spam clients are interested in sending messages as quickly as possible, and so usually don't bother with actually implementing standards correctly. In this post I shall describe the various checks which can be used, show how to implement these checks in Postfix, and describe how to ensure that your mail server passes these checks when sending mail.

Reverse DNS Entries

RFC 1912 states that "every Internet-reachable host should have a name" and "make sure your PTR and A records match". This can be checked by performing a Forward Confirmed reverse DNS lookup[1]. This check can be done before even accepting the TCP connection, which means the mail server's existence isn't even revealed to rejected clients.
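The forward-confirmation half of the check is easy to demonstrate in isolation: does a forward lookup of the name (obtained from the PTR record) yield the original address? This sketch uses getent so it also resolves names from /etc/hosts:

```shell
#!/bin/sh
# Return success if a forward lookup of $2 yields the address $1.
fcrdns_ok() {
    ip=$1
    name=$2
    getent ahosts "$name" | awk '{print $1}' | grep -qx "$ip"
}
```

A real FCrDNS check first does the reverse (PTR) lookup on the client's address and then feeds the resulting name into a function like this one.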

Postfix: Add reject_unknown_client_hostname to smtpd_client_restrictions.
Passing: Ensure that your mail server has matching PTR and A records.

HELO/EHLO Hostname

RFC 2821 states that "a client SHOULD start an SMTP session by issuing the EHLO command". Almost all SMTP client implementations already do this, so we can require the use of HELO/EHLO.

Postfix: Set smtpd_helo_required = yes.

RFC 2821 states that "the argument field [of the HELO/EHLO command] contains the fully-qualified domain name of the SMTP client if one is available". Since external mail servers must be reachable from the Internet, this is effectively a requirement, and it can be checked by looking up the name in DNS[2].

Postfix: Add reject_invalid_helo_hostname, reject_non_fqdn_helo_hostname and reject_unknown_helo_hostname to smtpd_helo_restrictions.
Passing: Ensure that your mail server is configured to send a fully qualified hostname which exists in DNS.

If there is only one mail server (and arguably even if there are several), remote SMTP clients have no business using the server's own hostname as their HELO hostname. Clients which do so can therefore be rejected.

Postfix: Add check_helo_access hash:/etc/postfix/helo_access to smtpd_helo_restrictions. Use helo_access as a template for /etc/postfix/helo_access, and run postmap /etc/postfix/helo_access afterwards.
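The access map boils down to rejecting any client that claims your own identity. A sketch of what such a file contains (the hostname and address are placeholders for your server's real values):

```
# /etc/postfix/helo_access
mail.example.com    REJECT only I may use my hostname
192.0.2.10          REJECT only I may use my address
```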

Originator Address

The originator (MAIL FROM) address is where error reports will be sent, so it should be a valid address. The only properties which can be checked, though, are that the address is fully qualified and that the domain exists.

Postfix: Add reject_non_fqdn_sender and reject_unknown_sender_domain to smtpd_sender_restrictions.
Passing: Ensure that your mail server only emits fully qualified addresses. This should happen by default, except possibly for mail submitted with sendmail.

Recipient Addresses

Unless the mail server is a relay or backup MX, it should already only be accepting addresses for which it is the destination. If it is a relay or backup MX the same checks as above can be done.

Postfix: Add reject_non_fqdn_recipient and reject_unknown_recipient_domain to smtpd_recipient_restrictions.

Another check concerns multiple recipients for bounced mail. Error reports for bounced mail use a null originator address and should only have one recipient.

Postfix: Add reject_multi_recipient_bounce to smtpd_data_restrictions.

Pipelining

Unless the client explicitly requests pipelining (as described in RFC 1854), the SMTP conversation must occur in lock step (i.e. the client must wait for a response from the server before sending the next command). Since spam clients are trying to send messages as quickly as possible it is likely that they do not adhere to this requirement.

Postfix: Add reject_unauth_pipelining to smtpd_data_restrictions.

RFC 2821 specifies that the server must send the first message after the connection is established. A neat trick is to delay this initial message to catch out clients which don't wait for it.

Postfix: Add sleep 1, reject_unauth_pipelining to smtpd_client_restrictions. This also requires smtpd_delay_reject = no (explained below).
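Taken together, the checks above amount to a main.cf fragment along these lines. This is a sketch to merge into your existing restriction lists, not a drop-in configuration; in particular, a real configuration also needs relay control such as reject_unauth_destination in smtpd_recipient_restrictions:

```
smtpd_helo_required = yes
smtpd_delay_reject = no

smtpd_client_restrictions =
    permit_mynetworks,
    reject_unknown_client_hostname,
    sleep 1, reject_unauth_pipelining

smtpd_helo_restrictions =
    permit_mynetworks,
    reject_invalid_helo_hostname,
    reject_non_fqdn_helo_hostname,
    reject_unknown_helo_hostname,
    check_helo_access hash:/etc/postfix/helo_access

smtpd_sender_restrictions =
    reject_non_fqdn_sender,
    reject_unknown_sender_domain

smtpd_recipient_restrictions =
    reject_non_fqdn_recipient,
    reject_unknown_recipient_domain

smtpd_data_restrictions =
    reject_multi_recipient_bounce,
    reject_unauth_pipelining
```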

Monitoring

Since these measures will reject valid mail from misconfigured mail servers, I like to keep an eye on rejections via logcheck. However, some of these measures by their very nature reject the client before it's even sent the originator and recipient addresses, which makes identification of valid mail difficult. Postfix therefore has a feature which delays rejection until after the recipient addresses have been sent. This is enabled by default, but can be disabled by setting smtpd_delay_reject = no.


  1. A reverse DNS lookup is done on the client's IP address, and a forward lookup then done on the resulting hostname. This forward lookup should yield the client's IP address. 

  2. Note that there is no required link between the HELO hostname and the client's PTR record. 

New Knab modules

I've recently been doing a lot of hacking on Knab, which is the software behind the #clug IRC bot, Spinach. I've contributed a number of new modules, most of which are running on Spinach and are available from the main Bazaar repository.

Events

This module is basically a calendar feature which can store and retrieve events such as birthdays. It also handles recurring events, specified either with recurrence rules[1] or with multiple dates.

<cocooncrash> Something happens on 21 December 2008 at 15:00
<Knab> yessir
<cocooncrash> When is something?
<Knab> something is on Sunday the 21st of December 2008 at 15:00:00
<cocooncrash> Forget event something
<Knab> I've forgotten something

<cocooncrash> My birthday happens on 22 March every year
<Knab> sure
<cocooncrash> How long until cocooncrash's birthday?
<Knab> your birthday is 13 weeks and 1 day away
<cocooncrash> How many days until my birthday?
<Knab> your birthday is 92 days away

<cocooncrash> When is Easter in 2010?
<Knab> Sun 4 Apr
<cocooncrash> What's happening on Tuesday
<Knab> National Chocolate Covered Anything Day and Day of Reconciliation
<cocooncrash> Search events for Outsider 
<Knab> Found: Be Friendly To Outsider Day, Be Hard On Outsider Day, Outsider's birthday, Outsider's dad's birthday,  (showing 4 of 4 results)

Twitter

This module retrieves messages posted on Twitter and identi.ca.

<cocooncrash> Last twitter update by iolproofreader
<Knab> "The ANC in the Eastern Cape is making it clear: it's either you're with us or not."
<cocooncrash> twitter 1067135354
<Knab> iolproofreader: ""Friend chases after golfer shot" http://tinyurl.com/4dn2lx"
<cocooncrash> http://twitter.com/iolproofreader/status/1067135354
<Knab> iolproofreader: "Police are planning on more patrols and more scrutiny for travellers crossing the border of Swaziland and Mozambique."

URLLengthen

This module discovers the URL which a shortened URL redirects to.

<cocooncrash> http://is.gd/1D
<Knab> http://google.com/

URLShorten

This module creates a short URL using is.gd.

<cocooncrash> shorten http://google.com/
<Knab> http://is.gd/1D

HTTP

This module retrieves an HTTP URL and returns the result of the request.

<cocooncrash> get http://google.com
<Knab> 200 OK "Google"
<cocooncrash> Is google.com up?
<Knab> Yes, it's up
<cocooncrash> Is Twitter down?
<Knab> No, it's just you

GoogleDefine

This module gets definitions using Google.

<cocooncrash> gdefine spinach
<Knab> "dark green leaves; eaten cooked or raw in salads  ", "The Spinach (Eulithis mellinata) is a moth of the family Geometridae.

Subversion

This module retrieves commit messages from a Subversion repository.

<cocooncrash> last commit
<Knab> r721 by trb143 7 hours ago: "Update UI a bit to display columns"
<cocooncrash> commit 700 full
<Knab> r700 by mgorven on 2008/12/15 at 21:14:26 UTC: "Set non-existing config entries if requested with a default value, and create ~/.openlp when saving config file. " M /openlp-2/trunk/openlp/core/utils/linregistry.py, M /openlp-2/trunk/openlp/core/utils/confighelper.py

SummonJabber

This module summons people by sending them a message via Jabber.

<cocooncrash> cocooncrash.summon.jabber is cocooncrash@example.com
<Knab> gotcha
<cocooncrash> summon cocooncrash
<Knab> I've summoned cocooncrash via Jabber

Feeds

This module retrieves RSS and Atom feeds.

<cocooncrash> Add feed http://www.news24.com/news24RSSFeed/0,,2,00.xml as News24
<Knab> okay
<cocooncrash> latest articles from news24
<Knab> 0: "SA's first San party launches", 1: "Pakistan mall collapse: 3 dead", 2: "Iran warship to fight pirates", 3: "Westwood storms into lead", 4: "US Embassy given the boot", 5: "Jennifer Aniston's chilli Xmas"
<cocooncrash> article 0 from News24
<Knab> "SA's first San party launches" http://www.news24.com/News24/South_Africa/Politics/0,,2-7-12_2444493,00.... : The first San political party has been launched at Upington in the Northern Cape. 
<cocooncrash> article /Palin/ from news24
<Knab> "Drama in Palin household" http://www.news24.com/News24/World/News/0,,2-10-1462_2444295,00.html : The mother of  an 18-year-old man who plans to marry Alaska Governor Sarah Palin's  pregnant daughter, Bristol, has been arrested on drug charges.
<cocooncrash> list feeds
<Knab> m&g, clug park, haiybo, TechCrunch, onion, News24
<cocooncrash> remove news24 feed
<Knab> okay

  1. This feature is provided by the Date::Manip library. 

Playing with Python and IRC

I wrote three IRC bots in [Python][] this last week (although one was a rewrite). They probably aren't very useful to most people, but I'm going to share them anyway in case someone finds them interesting.

The first one was prompted by [Adrian][], who is maintaining a [countdown][] until his wedding as a factoid in [Spinach][]. Since [Knab][] doesn't actually support countdowns, it has to be updated manually. This clearly isn't the Right Way to do this, and so I hacked together a [script][irccountdown] which connects to IRC and teaches Spinach the updated factoid. I run this as a daily [cronjob][] to keep the countdown up to date.

As is usually the case with Python, there was already a library for accessing IRC, namely [irclib][]. It isn't documented very well, but has a couple example scripts which are fairly easy to follow. It follows an event based model, so you write functions which will be called when certain events occur (such as receiving a message).

The [Currie Cup][] final was held on Saturday (my team, the [Sharks][], won), and I followed the match online using [SuperSport's][] live score site[1]. I then thought that it would be cool to have the score announced on IRC when it changed, and since I was bored I wrote a simple [bot][rugby] to do this. It worked well, but was very simple in that it only supported one hardcoded channel and one hardcoded game.

Since I was also bored on Sunday, I [rewrote][rugbybot] this bot properly. I added a subscription mechanism so that channels and users can subscribe and unsubscribe to games by sending the bot a command. It's mostly working, except for listing the available games (there aren't any rugby games coming up, so I can't test it ;-) ). Games are specified by the ID used by SuperSport's site, and finding the right ID is currently a manual process.


  1. I'm not really a sports fan — I just enjoy bragging when we do win ;-) 

Sharing links from Konqueror, including to IRC

I follow the main feeds of a couple social news sites (namely Digg, Reddit and Muti). When I find an article which I like, I go back and vote it up on the site. However, when I come across good articles via other sources, I don't submit them to these news sites (or try to find out if they've already been submitted) simply because it's too much effort.

When I started aggregating my activity on these sites on my blog and on FriendFeed, I needed a way to share pages that I didn't get to via one of these social news sites. I ended up setting up Delicious because I found a plugin for Konqueror which made it easy to bookmark pages.

I still wanted to solve the original problem though, and so started looking for an easy way to submit links to these sites from Konqueror. Konqueror has a feature called service menus which allows you to add entries to the context menu of files. I then needed to work out how to submit links to these services, which turned out to simply involve loading a URL with a query parameter specifying the link you want to share.

I created entries for Reddit, Digg, Muti, Delicious, Facebook and Google Bookmarks. These take you to the submission page of the service where you can fill in the title[1]. Digg and Reddit will show existing submissions if the link has already been submitted.

I often share links on IRC, and wondered if I could integrate that with my menu. It turns out that WeeChat has a control socket, and I could send messages by piping them to the socket. I therefore wrote a script which prompted me for a headline or excerpt using kdialog, and then sent the link to the specified channel. My menu now looks like this:

sharemenu.png

If you want to set this up yourself, download share.desktop and put it in ~/.kde/share/apps/konqueror/servicemenus. If you want the icons, download shareicons.tar.gz, extract them somewhere, and fix the paths in social.desktop[2]. To set up the IRC feature (assuming you're using WeeChat), download postirc.sh and save it in ~/bin/. You will need to change the commands in social.desktop depending on the servers and channels you wish to use.
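The approach can be sketched as follows. This is not the postirc.sh script itself; the FIFO path and command syntax are assumptions (check the WeeChat FIFO plugin for the exact format), and the message-building helper is made up:

```shell
#!/bin/sh
# Build the IRC command to send a link with an excerpt to a channel.
build_message() {
    channel=$1
    excerpt=$2
    url=$3
    printf '/msg %s %s %s' "$channel" "$excerpt" "$url"
}
# Prompt with kdialog, then pipe the command into WeeChat's control pipe:
# excerpt=$(kdialog --inputbox "Excerpt for $1:") || exit 1
# build_message "#clug" "$excerpt" "$1" > "$HOME/.weechat/weechat_fifo_pipe"
```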


  1. One shortcoming is that the title of the page is not automatically filled in. 

  2. I couldn't work out how to use relative paths, or ~. 

Windows users finally get circular scrolling

When I first started using Linux four years ago, one of the most useful features I discovered was circular scrolling on touchpads. (For those that don't know, this allows you to scroll up and down by moving your finger in a circle.) Traditional scrolling now feels very clumsy, and I find it awkward when using a laptop which doesn't have this feature (such as those running Windows). According to the changelog for the XOrg/XFree86 Synaptics driver, this feature was added in February 2004.

I happened to come across the news today that Synaptics have added a feature called ChiralTouch Technology to the latest version of their Windows drivers. This so-called "technology" provides "the ability to scroll continuously with a circular motion." This basically means that they have finally gotten round to implementing a very useful feature which Linux users have had for over four years.

This example shows that in some respects proprietary software is way behind FOSS in features and usability, and that proprietary software borrows ideas which were first implemented in FOSS.

Vim syntax highlighting for irssi IRC logs

When I occasionally read [IRC][] logs saved by [irssi][], I find the lack of colouring rather annoying and find that I can't read them very quickly. I finally got round to writing a syntax highlighting plugin for [Vim][] in order to correct this. The colours could probably do with some improvement, but it's much better than before.

In case anyone else finds this useful, I have attached the plugin to this post. To use it, save [irssilog.vim][irssilog.vim] in ~/.vim/syntax/ and enter the following command to use it with the current file.

:set syntax=irssilog

If you want Vim to automatically detect the file type, add the following to ~/.vim/ftdetect/irssilog.vim[1].

au BufRead,BufNewFile */irclogs*.log    set filetype=irssilog

  1. This relies on the logs being stored in the default location of ~/irclogs/. 

My personal backup solution

I've been using an external hard drive to store backups of my laptop for a while now. At first I manually created a set of compressed tar archives about once a month. That was a bad system, though, because it used a lot of space and retrieving files from backups was a mission. I then started using pdumpfs, which does incremental backups by hard linking files which haven't changed. The problem I found with it, however, was that if a file's ownership or timestamps changed it wouldn't be hard linked, even if the content hadn't changed.

I therefore set out to find a better backup solution. My requirements were as follows.

  1. Incremental backups
  2. Easy to access specific files from backups
  3. Able to delete certain backups, preferably arbitrarily[1]
  4. Compression
  5. Encryption

I finally settled on storeBackup, which supports everything except number 5 (encryption). It works similarly to pdumpfs, except that it stores ownership and timestamp data separately and can therefore still hard link identical files even if those change. It compresses on a per-file basis, which makes it easy to access specific files (as opposed to having to dig them out of an archive). Old backups can be deleted arbitrarily since they are only related by hard links. I then added encryption by backing up to an encfs-encrypted directory.
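A weekly run then looks something like this. The mount points and option names here are assumptions (check the storeBackup.pl documentation for your version); with DRY_RUN=1 the sketch prints the commands instead of executing them:

```shell
#!/bin/sh
# Print commands instead of running them when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount the encfs directory on the external drive, back up into it,
# then unmount again.
weekly_backup() {
    run encfs /mnt/external/.raw /mnt/external/backup
    run storeBackup.pl --sourceDir "$HOME" --backupDir /mnt/external/backup
    run fusermount -u /mnt/external/backup
}
```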


  1. I want to be able to backup every week, but then delete old backups so that I have one backup per month for the last year. 

Downloading Google Talk logs

I used Google Apps to host mail for this domain for a while, and wanted to close down the account since I don't use it anymore. Before I did that I wanted to move all the data onto my server. Transferring the emails was fairly straightforward using [POP3][], but I couldn't find a way to download the [Google Talk][] logs. [Gmail][] handles the logs as emails, but they aren't accessible using either POP3 or [IMAP][].

I therefore wrote a [Python][] script which downloads the logs via the web interface. On [Jeremy's][] [suggestion][] I used [BeautifulSoup][] to parse the [HTML][] this time, which worked very well. The script works with both Google Apps and normal Gmail, although my account got locked twice while trying to download the 3500 logs in my account.
