The Cape Town Linux User Group was lucky to get both a behind-the-scenes and a front-end explanation of ZFS from Johan Hartzenburg. ZFS is Sun Microsystems' new, advanced, all-singing, all-dancing filesystem, which is also a volume manager and, I'm sure, will eventually be able to send email before becoming Emacs.
Johan explained to us how ZFS manages to always be consistent - by never editing existing metadata entries in place. Instead it copies the entry to a new entry, edits the new entry, and then repoints the link in the original entry's parent at the new entry. But, of course, because it never edits an entry directly, the parent goes through the same process, and so on up the tree until it reaches the uberblock. The uberblock never has a new copy created of itself, but there are multiple copies of it, and updating the uberblock is an atomic operation. Even if things go awry while this is half-complete, any of the uberblocks is consistent (and I think it has a timestamp to fall back on).
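To make the copy-and-repoint idea concrete, here is a minimal sketch in Python. It is not ZFS code: the names Block, cow_update and Pool are made up for illustration, a dictionary of children stands in for on-disk block pointers, and a single attribute assignment stands in for the atomic uberblock update.

```python
class Block:
    """A tree node that is never modified once it has been created."""

    def __init__(self, children=None, data=None):
        self.children = dict(children or {})
        self.data = data


def cow_update(node, path, data):
    """Return a new block with `path` set to `data`, leaving `node` untouched."""
    if not path:
        # Reached the entry being changed: write a fresh copy of it.
        return Block(data=data)
    head, rest = path[0], path[1:]
    old_child = node.children.get(head) if node else None
    # Copy the parent's links, then point one of them at the new child.
    new_children = dict(node.children) if node else {}
    new_children[head] = cow_update(old_child, rest, data)
    return Block(children=new_children, data=node.data if node else None)


class Pool:
    """Hypothetical stand-in for the pool; `root` plays the uberblock's pointer."""

    def __init__(self):
        self.root = Block()

    def write(self, path, data):
        new_root = cow_update(self.root, path, data)
        # The only in-place change: one reference swap at the very top.
        self.root = new_root

    def read(self, path):
        node = self.root
        for name in path:
            node = node.children[name]
        return node.data
```

Note that only the blocks on the path from the changed entry up to the root get copied; untouched siblings are shared between the old and new trees, and the old tree stays fully readable until something frees it.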
This all sounds really inefficient, but ends up not being so. The new data and metadata blocks are generally all written near each other, so a whole bunch of random writes often ends up being more efficient than it would be if each block were updated in place.
Unused metadata and data blocks are then removed.
Using this design makes snapshots pretty trivial - since all you need to do is not delete the original metadata and data blocks used in the snapshot. Everything speeds on ahead, and the scrubber just doesn't free those blocks.
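As a rough illustration of why snapshots fall out almost for free, here is a small Python sketch. Everything in it is an assumption for the sake of the example: the Store class, its flat name-to-block table (standing in for the metadata tree), and the reclaim method (standing in for whatever frees unused blocks). Copying the table in snapshot() is a simplification; with a real copy-on-write tree you would only need to hold on to the old root.

```python
class Store:
    """Toy copy-on-write block store with snapshots (illustrative only)."""

    def __init__(self):
        self.blocks = {}     # block id -> data
        self.next_id = 0
        self.live = {}       # file name -> block id (current metadata)
        self.snapshots = {}  # snapshot label -> frozen metadata

    def write(self, name, data):
        # Never overwrite: every change lands in a freshly allocated block.
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.live[name] = block_id

    def snapshot(self, label):
        # Taking a snapshot just remembers which blocks were current.
        self.snapshots[label] = dict(self.live)

    def reclaim(self):
        # Free only blocks that neither the live view nor any snapshot uses.
        in_use = set(self.live.values())
        for table in self.snapshots.values():
            in_use.update(table.values())
        for block_id in list(self.blocks):
            if block_id not in in_use:
                del self.blocks[block_id]

    def read(self, name, label=None):
        table = self.live if label is None else self.snapshots[label]
        return self.blocks[table[name]]
```

Writes carry on exactly as before after a snapshot is taken; the only difference is that reclaim() skips the blocks the snapshot still refers to, and dropping the snapshot makes them eligible for freeing again.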
Also, using this design makes changing on-disk options pretty simple. This includes, for example, how ZFS can efficiently handle different byte orders. On read, ZFS can handle either order, but on write it will always use the most efficient byte order for the machine doing the writing. Similarly, compression can be applied at the data block level - every time a change happens to a file, the new data block that gets created can be compressed.
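The byte order trick can be sketched in a few lines of Python. This is not ZFS's actual on-disk format: the one-byte flag at the front of each block, and the function names write_block and read_block, are assumptions made up for this example. The point is only that each block records how it was written, so a reader can cope with either order while a writer always uses its own.

```python
import struct
import sys

# The byte order this machine handles natively.
NATIVE = "<" if sys.byteorder == "little" else ">"

# Hypothetical one-byte flag stored at the front of every block.
FLAG = {"<": 0, ">": 1}
ORDER = {0: "<", 1: ">"}


def write_block(values, order=NATIVE):
    """Pack 64-bit unsigned integers, tagging the block with its byte order."""
    header = bytes([FLAG[order]])
    return header + struct.pack(f"{order}{len(values)}Q", *values)


def read_block(raw):
    """Unpack a block written in either byte order, using its flag."""
    order = ORDER[raw[0]]
    count = (len(raw) - 1) // 8
    return list(struct.unpack(f"{order}{count}Q", raw[1:]))
```

Because every rewrite produces a new block anyway, a pool moved between machines of different byte orders would gradually pick up the new machine's order as blocks get rewritten, without any conversion pass.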
This also includes how to bring more members into the pool and harness the increased I/O bandwidth. The "allocator" just needs to place the new blocks, created whenever something is edited, on the new member. As time goes by, all members of the pool naturally tend towards holding equal amounts of data, which maximises the bandwidth available to concurrent read or write requests.
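Here is a small simulation of that rebalancing effect, again in Python and again only a sketch. The policy used (always allocate on the member holding the least data) is my assumption for illustration; the real allocator is certainly more involved. The function names and the numbers are made up.

```python
def pick_member(used):
    """Return the index of the member currently holding the least data."""
    return min(range(len(used)), key=lambda i: used[i])


def rewrite(block, location, used):
    """Copy-on-write rewrite: allocate the new copy first, then free the old."""
    new_member = pick_member(used)
    used[new_member] += 1
    used[location[block]] -= 1
    location[block] = new_member


def simulate(old_members=2, blocks_per_member=600, rewrites=1200):
    """Add one empty member to a pool of full-ish ones, then rewrite blocks."""
    used = [blocks_per_member] * old_members + [0]
    # location[b] is the index of the member that block b currently lives on.
    location = [m for m in range(old_members) for _ in range(blocks_per_member)]
    for step in range(rewrites):
        rewrite(step % len(location), location, used)
    return used
```

Calling simulate() starts with two members holding 600 blocks each plus one empty newcomer, and rewrites every block once. Nothing is ever migrated explicitly, yet the three members end up holding very nearly the same number of blocks, purely as a side effect of ordinary writes.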
The command line tools are incredibly simple and powerful, and with ZFS you don't have to worry about device renaming, as it records on the disks all the information necessary to find out where in the ZFS hierarchy that disk lives. Easy to use, and hard to screw up? How can it possibly succeed?
Solaris servers running as a Xen Dom0 (which seems to be progressing well), with a massive ZFS storage pool and multiple virtual machines, sounds like a winning plan. Or FreeBSD, once ZFS-on-FreeBSD (going well, I see) and Xen-Dom0-on-FreeBSD (not quite as encouraging) are available in stable forms.