<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Michael Gorven - cocooncrash</title>
  <link rel="alternate" type="text/html" href="http://michael.gorven.za.net/blog/2008/08/17/downloading-google-talk-logs"/>
  <link rel="self" type="application/atom+xml" href="http://michael.gorven.za.net/node/45/atom/feed"/>
  <id>http://michael.gorven.za.net/node/45/atom/feed</id>
  <updated>2008-08-17T16:53:03+02:00</updated>
  <entry>
    <title>Downloading Google Talk logs</title>
    <link rel="alternate" type="text/html" href="http://michael.gorven.za.net/blog/2008/08/17/downloading-google-talk-logs" />
    <id>http://michael.gorven.za.net/blog/2008/08/17/downloading-google-talk-logs</id>
    <published>2008-08-17T12:01:14+02:00</published>
    <updated>2008-08-17T16:53:03+02:00</updated>
    <author>
      <name>mgorven</name>
    </author>
    <category term="beautifulsoup" />
    <category term="code" />
    <category term="google" />
    <category term="gtalk" />
    <category term="python" />
    <category term="technical" />
    <summary type="html"><![CDATA[<p>I used <a href="http://en.wikipedia.org/wiki/Google_Apps">Google Apps</a> to host mail for this domain for a while, and since I no longer use it, I planned to close the account. Before doing so, I wanted to move all the data onto my server. Transferring the emails was fairly straightforward using <a href="http://en.wikipedia.org/wiki/Post_Office_Protocol"><abbr title="Post Office Protocol">POP3</abbr></a>, but I couldn't find a way to download the <a href="http://en.wikipedia.org/wiki/Google_Talk">Google Talk</a> logs. <a href="http://en.wikipedia.org/wiki/Gmail">Gmail</a> treats the logs as emails, but they aren't accessible via either <abbr title="Post Office Protocol">POP3</abbr> or <a href="http://en.wikipedia.org/wiki/Internet_Message_Access_Protocol"><abbr title="Internet Message Access Protocol">IMAP</abbr></a>.</p>

<p>I therefore wrote a <a href="http://en.wikipedia.org/wiki/Python_(programming_language)">Python</a> script that downloads the logs through the web interface. On <a href="http://jerith.za.net/">Jeremy's</a> <a href="/blog/2008/08/06/cisco-un-clean-access#comment-13">suggestion</a>, I used <a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a> to parse the <a href="http://en.wikipedia.org/wiki/HTML">HTML</a> this time, and it worked very well. The script works with both Google Apps and regular Gmail, although my account was locked twice while I was downloading its 3500 logs.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>I used <a href="http://en.wikipedia.org/wiki/Google_Apps">Google Apps</a> to host mail for this domain for a while, and since I no longer use it, I planned to close the account. Before doing so, I wanted to move all the data onto my server. Transferring the emails was fairly straightforward using <a href="http://en.wikipedia.org/wiki/Post_Office_Protocol"><abbr title="Post Office Protocol">POP3</abbr></a>, but I couldn't find a way to download the <a href="http://en.wikipedia.org/wiki/Google_Talk">Google Talk</a> logs. <a href="http://en.wikipedia.org/wiki/Gmail">Gmail</a> treats the logs as emails, but they aren't accessible via either <abbr title="Post Office Protocol">POP3</abbr> or <a href="http://en.wikipedia.org/wiki/Internet_Message_Access_Protocol"><abbr title="Internet Message Access Protocol">IMAP</abbr></a>.</p>

<p>I therefore wrote a <a href="http://en.wikipedia.org/wiki/Python_(programming_language)">Python</a> script that downloads the logs through the web interface. On <a href="http://jerith.za.net/">Jeremy's</a> <a href="/blog/2008/08/06/cisco-un-clean-access#comment-13">suggestion</a>, I used <a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a> to parse the <a href="http://en.wikipedia.org/wiki/HTML">HTML</a> this time, and it worked very well. The script works with both Google Apps and regular Gmail, although my account was locked twice while I was downloading its 3500 logs.</p>
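
<p>The script itself isn't included here, so what follows is only a minimal sketch of the two steps described above: pulling chat-log links out of a listing page, then fetching them one at a time with a pause in between. It uses Python's standard-library <code>html.parser</code> in place of BeautifulSoup so that it runs without third-party packages. The <code>view=cv</code> link marker, the function names and the two-second delay are hypothetical placeholders, not Gmail's actual markup or the original script's behaviour.</p>

```python
from html.parser import HTMLParser
import time


class LogLinkParser(HTMLParser):
    """Collect href values of anchors whose href contains a marker string.

    The marker is a hypothetical placeholder; the real Gmail markup
    is not described in the post.
    """

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors are of interest; everything else is ignored.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            self.links.append(href)


def extract_log_links(html, marker="view=cv"):
    """Return matching link targets from one listing page, in page order."""
    parser = LogLinkParser(marker)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_all(urls, fetch, delay=2.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing `delay` seconds between requests.

    Spacing out requests is an assumption about how to reduce the chance
    of the account being locked; the post does not say what triggered it.
    `fetch` is any callable that takes a URL and returns the page body.
    """
    results = {}
    for i, url in enumerate(urls):
        if i:
            sleep(delay)
        results[url] = fetch(url)
    return results
```

<p>Passing <code>fetch</code> and <code>sleep</code> in as arguments keeps the sketch independent of how the logged-in session is established (cookies, the login form and so on), which is the part most tied to Gmail's web interface and is not covered here.</p>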
    ]]></content>
  </entry>
</feed>
