I spent much of the weekend in sunny Sebastopol at the Pylons/TG2/WSGI sprint at O'Reilly's headquarters there.  One doesn't expect the lack of fanfare that marks the O'Reilly offices - besides a modest sign on entry to the parking lot, only a Tarsier statue made of recycled metal identifies the pretty normal-looking offices.

There's probably not much I can say that would interest others, other than that I had a lot of fun, and that the sprinters were all very friendly, great to chat to, geeky, and generally just like the great group of geeks we have in Cape Town.

My awesome carpool partner, Kelvin, not only managed not to go mad stuck with me in a car for an hour and a bit each way for two days - he was also quite keen to show me around San Francisco on Sunday evening.  We did some common tourist things - went to the Transamerica Pyramid and Union Square, past the Symphony Hall, through the Golden Gate Park and Chinatown, up Coit Tower on Telegraph Hill, down the "Crookedest Road", and generally meandered all over the place.

There were a few questions about the choice of Python as the language, whether other languages would follow and which ones, comparisons to other existing containers, and so forth.  Guido van Rossum said it was partly because Python is one of the three big languages at Google, and because it was (relatively) easy to harden the VM.  Kevin Gibbs said they had to start somewhere, and that they were committed to others.  Paul McDonald said that the two most voted-for issues on the issue tracker are language-related, and that there were teams (i.e., more than one) currently actively working on languages (i.e., more than one).

A couple of questions around "maturity" - the team says they'll make it clear when it is no longer a preview, and that this will probably happen when they have the billing set up and offline processing.  They expect billing to be available "toward the end of the year".

Question about HTTPS/SSL and access to encryption within GAE code.  Answer is that it's something they want to do, but don't know when they'll get to it.  Data is "strictly" partitioned between apps in the store (BigTable).

A common thread in the answers was that the Google App Engine team were very interested in people being able to get their data and code out of GAE, and they're working on making it easy to export the data in bulk.  They hoped that a standard would emerge for BigTable-like storage (CouchDB, SimpleDB) so that people could write code and host it on GAE or elsewhere.  People are already working on compatible APIs to make it possible to run on other storage systems (though these may not be too efficient).

This session deserves a much longer post, but I just wanted to put down the most interesting stuff quickly.  Basically, it was a back-end developer's guide to how Google is put together - from how a request that someone makes in a browser gets a response, to how those responses are put together from multiple sources, and how those sources are built up.

Everyone knows Google's love of lots of commodity hardware for their servers, but it was interesting to hear some other things - they use reasonably low-end networking gear too.  Also, they're back where they started in terms of machines without cases shoved into in-house-designed racks.  The scale has changed dramatically, of course.

"If you have 10k servers, expect to lose 10 a day..."

GFS's masters run on the same server hardware as slaves, and take part in master election like any other machine.  Google puts "millions" of pages together in a GFS "file", since it uses 64MB chunks.  200+ clusters, many of them 1000s of machines, pools of 1000s of clients.  4+PB filesystems, 40GB/s read/write load (even while HW is failing constantly).
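
A bit of back-of-the-envelope arithmetic helps put the failure quote and the GFS figures above into perspective.  This is only a sketch in Python 3: it assumes binary units (1PB = 2^50 bytes), a filesystem of exactly 4PB, and failures spread evenly across machines, none of which was stated in the talk, and the function names are mine.

```python
MB = 2 ** 20
PB = 2 ** 50
CHUNK_SIZE = 64 * MB  # GFS chunk size mentioned in the talk


def daily_failure_rate(servers, failures_per_day):
    """Fraction of machines expected to fail on a given day."""
    return failures_per_day / servers


def chunks_needed(size_bytes, chunk_size=CHUNK_SIZE):
    """Number of chunks needed to hold size_bytes (ceiling division)."""
    return -(-size_bytes // chunk_size)


rate = daily_failure_rate(10000, 10)   # "10k servers ... lose 10 a day"
failures_per_year = 10 * 365           # if that pace held all year
chunks = chunks_needed(4 * PB)         # assumed: exactly 4PB, binary units

print(rate, failures_per_year, chunks)
```

On those assumptions, that is about one machine in a thousand failing each day, and tens of millions of chunks to keep track of in a single filesystem.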

MapReduce usage within Google is growing fast - 700 new applications in a recent month at peak, currently around 10k applications.  From 171k MapReduce jobs in March 2006 to 2.2 million jobs in September 2007.  MapReduce is heavily optimised to keep jobs near the data they need, to conserve precious network bandwidth within the datacentre.
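
For anyone who hasn't met the programming model, here's a toy, single-process sketch of the map/shuffle/reduce idea as a word count.  It is not Google's implementation or API: the function names are mine, and the real system spreads these phases across many machines and schedules them close to the data.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a dict of doc_id -> text."""
    pairs = []
    for doc_id, text in documents.items():
        pairs.extend(map_phase(doc_id, text))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count({"a": "the quick fox", "b": "the lazy dog the end"})
print(counts)
```

The point of the split is that each map call only needs its own document, so it can run wherever that document is stored, and only the much smaller (word, count) pairs have to cross the network.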

Google still has one large shared source base(!), from low-level libraries used by anything to domain-specific libraries to applications.  The benefits are that it's easy to find examples of usage of something so you can use it correctly, and to reuse it (i.e., as a library).  The drawback is that such reuse causes some fairly tangled dependencies.

Language usage at Google: C++ for all high-performance, commonly-accessed web stuff.  Java is used for less-performance-oriented and/or lower-volume applications.  Python is used behind the scenes for things like configuration, administration, &c.

Some interesting news was delivered during the Google I/O keynote.

In terms of Google App Engine, the announcement that got the biggest applause was that it was now open to all signups - no waiting list and a few tens of thousands of developers.

Beyond that, two new APIs were announced - the memcache API and the Image API.

Some expected pricing was given for usage beyond the free quota:

  • CPU: 5 million "average" page views free, 10-12c per core-hour thereafter
  • Storage: 500MB free, 15-18c per GB-month thereafter.
  • Incoming traffic: 5 million "average" page views, 11-13c/GB thereafter
  • Outgoing traffic: 5 million "average" page views, 9-11c/GB thereafter
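
To see what those ranges might add up to, here's a rough estimator.  It's a sketch only: the prices are copied from the list above and were announced as expectations, not final, and the function and key names are made up.  It expects usage that is already beyond the free quota, since the free CPU and traffic allowances were quoted in "average" page views, which I can't convert to core-hours or GB.

```python
# Expected price ranges in US dollars, as (low, high), from the list above.
PRICES = {
    "cpu_core_hours": (0.10, 0.12),
    "storage_gb_months": (0.15, 0.18),
    "incoming_gb": (0.11, 0.13),
    "outgoing_gb": (0.09, 0.11),
}


def estimate_monthly_cost(billable_usage):
    """Return a (low, high) cost range for usage beyond the free quota."""
    low = sum(PRICES[key][0] * amount for key, amount in billable_usage.items())
    high = sum(PRICES[key][1] * amount for key, amount in billable_usage.items())
    return low, high


# Hypothetical month: all figures are billable amounts, not totals.
usage = {
    "cpu_core_hours": 100,
    "storage_gb_months": 10,
    "incoming_gb": 20,
    "outgoing_gb": 50,
}
low, high = estimate_monthly_cost(usage)
print("Between $%.2f and $%.2f" % (low, high))
```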

The Google Web Toolkit 1.5 release candidate was released today, which brings Java 5 language features.

In terms of OpenSocial, the 0.8 version of the specification was released yesterday, and AOL has joined the OpenSocial initiative.

On Saturday, I'm heading off to San Francisco to attend Google I/O and also spend some time with my colleagues at SynthaSite in our US office.  Of most interest at the conference (at least in my personal capacity) is Google App Engine, but pretty much everything sounds interesting (with GWT being the big exception), and I can just imagine that making the decisions on what sessions to attend will be hard to do.  (And, you know, I guess I'm supposed to keep an eye out for things that might be useful to the company, or something...)

Over the weekend, I'll hopefully be heading to Sebastopol (in California Wine Country) for the Pylons/WSGI Sprint being held at O'Reilly Media's headquarters there.  There are two days of sprints, and I'm hoping to be there for most of both days - but it depends on travel arrangements.  If I get the time, I hope I can pop out and see a bit of the surrounding country and maybe one or two of those "places of interest".

In between the gatherings and travel, and before I head back, I'll spend time at the SynthaSite offices, doing what I'd generally be doing in Cape Town, but with better connectivity and less rainy cold winter.

If you want to catch me while I'm in San Francisco (or in London for the half-day I'll be there on the trip back) send me an email or leave a comment.

On Saturday (May 10th) the Cape Town Python User Group held a Python Sprint meeting as part of the Global Python Sprint weekend.  8 or so of us got together on and off from 10:30am until about 9:30pm at the SynthaSite offices around a table and worked through 10 or so issues in the Python issue database.

Thanks to The Other Neil and Simon for most of the organisation effort, and to them and Adrianna, Russell, Jonathan, Jeremy, Brad, and David for coming through and taking part.

And thanks to SynthaSite for coffee, coke, crisps, chocolates, and other goodies.

According to The Other Neil, we worked on:

Next week, on Tuesday 8th, the Western Cape Linux User Group meets to hear about dbus "and other freedesktop stuff".  Usual venue - Chemical Engineering lecture theatre at University of Cape Town.  18:30.  As is usual for CLUG meetings, everyone who attends is welcome to come have supper afterwards (there's a list of previous CLUG dinner venues on the wiki).

The next day, Wednesday 9th, the Cape Town Ruby Brigade has a meeting at the Bandwidth Barn from 19:00.  Currently known topics include the Yahoo! UI and using it with Rails.  Don't forget to sign up.

Saturday, 26th April, finds the Cape Town Python User Group Tenth Meeting (probably) at the Bandwidth Barn, probably from 14:00 as usual.  No set topics yet, but I imagine we might have a round of collaborative programming after the unplanned session last meeting which seemed to go down well.  (Unfortunately, I was working, so I missed out...)

Tuesday, 29th April, is the second of the twice-a-month meetings of the Western Cape Linux User Group.  No idea on the topic yet, though.

The GeekDinner Cape Town first birthday dinner is upon us - Garrulous Grape is our seventh GeekDinner (one year and three days after the first one), happening on Monday, 31st March from 7pm at Greens in Plattekloof.  Yes, we've finally headed north!

Before that, the Cape Town Python User Group meeting (aka CTPUG 9) at the Bandwidth Barn on Saturday, 29th March, from 2pm.


Despite my best plans, PyCon 2008 in Chicago was not my first PyCon.  I've been following PyCon blog coverage over the years, and this year's is many times better than before.  And it's not just a sudden increase in the number of blogs or new people - it's also that the same people are writing a lot more.

So, thanks to everyone who wrote about PyCon 2008, and hopefully I'll see you next year in Chicago.

Just randomly from the pages open from my aggregator, here are some posts:

One of my favourite South African open source enterprises is translate.org.za - which, amongst other great things, is behind two good pieces of (Python) software - Translate Toolkit (a library of converters between different translation formats) and Pootle (a web app for people to do translations through).

Those two pieces of software are potential targets for those entering Google Summer of Code 2008 - they're one of 175 organisations/projects chosen out of 500 applications.  And looking at the high-quality project ideas page they put together, you can see why their application was successful.

The translate.org.za people are also looking to hire a Python developer in Pretoria - I doubt there are all that many opportunities to work full-time on an open source project in South Africa (let alone in Python), so hopefully they'll find a good match.

This means South Africa has been represented by both a student and a mentoring organisation in Google Summer of Code (and, I'm guessing, there'll be a mentor from translate.org.za this year too), as well as by a finalist in the Google Highly Open Participation Contest, all in the past year and a bit...