Why ZFS Matters to Mac Users

This morning I read a summary at Think Secret of the leaked news that ZFS will be a supported filesystem in OS X 10.5:

New to build 9A321 is support for Sun’s ZFS file system, a 128-bit open source file system introduced with Solaris 10 that offers support for vastly larger drives and arrays than 64-bit file systems. ZFS also delivers additional options for administrators.

This description totally misses the point of ZFS, focusing on a number that means even less for filesystems than megahertz does for CPUs. I imagine that most Mac users who haven’t been following ZFS development assume that it doesn’t matter to them unless they’re running some sort of Xserve server farm. Nothing could be further from the truth.

Why ZFS Does Not Matter

Let me begin with why you should not care about ZFS. ZFS is often described as a “128-bit filesystem.” This is mostly true, but in day-to-day use, completely irrelevant. The upper bound for most personal and small business filesystems is on the terabyte scale, which current filesystems can contain with their puny 64-bit block pointers. In 5-10 years, we might care about petabytes or exabytes, so the ZFS developers were smart to future-proof the filesystem format and avoid the growing pains that systems like ext2/3 have had to endure. The ZFS team lead, Jeff Bonwick, famously noted that populating a storage pool with 2^128 blocks of information would require as much energy as would be needed to boil the world’s oceans.

With that bit of vivid imagery, Bonwick is basically saying that the capacity problem is solved for the long term. Overcoming capacity limits is really not all that interesting though, since the solution is obvious: use bigger numbers. The real genius of ZFS is in all of its other design decisions.

Why ZFS Matters to Laptop/Desktop Users

People with iBooks, MacBooks, PowerBooks, Mac minis, and iMacs all have generally the same storage setup: a single hard disk with a capacity somewhere between 40 and 500 GB. A lot of the magic of ZFS does not become manifest until you have several disks, but even with one, you can benefit in several ways:

Filesystems can be compressed. Unlike a compressed disk image, a compressed ZFS filesystem is read/write. Moreover, the compression flag can be turned on and off on the fly. New data will be compressed (or not) as per the flag, and old data will be left as is. Compressed filesystems are great for data that you don’t access very often, or data that compresses very well.
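
To make this concrete, here is roughly what it looks like with the zfs command as it exists on Solaris today; the filesystem name is invented, and Apple may well wrap all of this in a GUI:

    # 'tank/Documents' is a made-up example filesystem name
    # turn compression on; only data written from now on gets compressed
    zfs set compression=on tank/Documents

    # turn it off again later; existing data is left as is
    zfs set compression=off tank/Documents

    # see how much space compression is actually saving
    zfs get compression,compressratio tank/Documents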

Filesystems nest, and creating one is as easy as making a directory. By itself that is not very interesting for laptop/desktop users, but combined with compression it means you can effectively turn on compression for just one subfolder of your drive.
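
Again, a rough sketch using today’s Solaris commands:

    # 'tank/Documents/Archives' is an invented example name
    # create a filesystem nested inside another, much like mkdir
    zfs create tank/Documents/Archives

    # compress just that one "subfolder", leaving everything else alone
    zfs set compression=on tank/Documents/Archives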

Every block of data on the disk is checksummed, so errors can be detected during read operations. Many common hard drive failures are catastrophic, and painfully obvious when they happen. But it is possible for your data to be corrupted on disk in ways that neither you, nor the hard disk, will ever notice. While checksumming will not by itself recover your data, it will let you know when you should go retrieve a file from your backup. (You are backing up, right? Go buy an external FireWire disk and SuperDuper!, and start doing it right now. It is easy, fast, and you’ll thank me later.)
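
ZFS can also be asked to walk the whole pool and verify every checksum on demand, which pairs nicely with a backup routine. In current Solaris releases that looks like this:

    # 'tank' is a placeholder pool name
    # read and verify every block in the pool, in the background
    zpool scrub tank

    # report any checksum errors that were found (see the CKSUM column)
    zpool status -v tank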

Space-efficient and fast snapshots. A snapshot allows you to see your filesystem as it was some time in the past. ZFS is designed to snapshot a filesystem in constant time, no matter how much data you have, or how frequently you snapshot it. Moreover, the snapshot is very space efficient. Identical blocks are shared between snapshots and the live filesystem until they are written to. The space required for snapshots is therefore mostly a function of how quickly your files change, and not so much how often you make a snapshot. It’s like version control for your entire computer!
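
On Solaris today the whole snapshot workflow is a few short commands; the filesystem and snapshot names below are made up:

    # take a snapshot; this is nearly instant no matter how big the filesystem is
    zfs snapshot tank/Documents@before-spring-cleaning

    # list snapshots and the space that is uniquely theirs
    zfs list -t snapshot

    # roll the live filesystem back to the snapshot if something goes wrong
    zfs rollback tank/Documents@before-spring-cleaning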

Apple’s much-discussed Time Machine feature in OS X 10.5 is a great example of the interface possibilities when you have snapshots available. However, Time Machine does not appear to require ZFS, which means that Apple had to bolt snapshots onto HFS+, a complex and awkward task. Snapshots in ZFS are cheap and easy.

Why ZFS Matters to Workstation Users

With that list of features, ZFS already beats most other filesystems out there. But with a workstation like the Mac Pro, you can have up to 4 internal drives (8 if you get creative) and start to explore the multi-drive capabilities of ZFS. Traditionally, there has been a hard separation between the volume manager and the filesystem layer. The volume manager takes your many disks, and makes them look like one disk (with mirroring or striping or whatever) to the filesystem layer. The separation of duties ensures that the volume manager knows nothing about files, and the filesystem knows nothing about disks. ZFS, on the other hand, breaks down the barriers between filesystems and volume managers with some amazing results:

Automatically growing filesystems. Once you add your disks to the storage pool, all of their space is available to all of the filesystems you have. You can reserve space for a filesystem, to guarantee a minimum amount is available when you need it, and you can also set quotas. But these are just flags which are easy to change on the fly. The default for every filesystem is automatically expanding capacity up to the limit of your storage pool. There are no manual volume or filesystem resizing operations, ever.
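
For the curious, reservations and quotas are ordinary properties in the current Solaris tools; the filesystem names and sizes here are invented:

    # names and sizes below are illustrative only
    # cap how much of the pool this filesystem may consume
    zfs set quota=50G tank/Media

    # guarantee this filesystem at least 10 GB, no matter what its neighbors do
    zfs set reservation=10G tank/Projects

    # both are just properties, so they can be changed or cleared at any time
    zfs set quota=none tank/Media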

Dynamic striping of file blocks over all drives in the storage pool. If you throw 2 drives in your storage pool, then files are automatically distributed over both disks, making large reads and writes faster. The disks do not have to be the same size (unlike usual striping configurations) and you can expand the pool whenever you want by installing a new disk. New files will stripe over old and new disks, and the old files will stay where they are. But, when you modify old files, the changed blocks are spread over all the available disks again. After adding a new disk, ZFS will get faster as you use the filesystem!
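
Growing the pool really is a one-liner in the Solaris tools; the disk names below are placeholders for whatever your disks are actually called:

    # 'tank' and 'disk2' are placeholder names
    # add another disk to the pool; new writes immediately stripe across it
    zpool add tank disk2

    # check capacity and layout
    zpool list
    zpool status tank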

Software mirroring with automatic error detection and self-healing. ZFS also incorporates features traditionally left to software RAID drivers. You can arrange your disks into mirrored pairs (or triples, etc.), which speeds up reads and also protects against single-disk failure. Moreover, since ZFS checksums every data block, if one disk returns bad data, ZFS knows immediately, without having to compare against the other disk on every read. Having identified the problem, it fetches the block from the other disk(s) in the mirror and returns the correct data to you. ZFS then writes the correct data back to the disk that failed the checksum. If the error was a fluke due to some correctable problem, perhaps a bad sector (which modern drives can reassign to a new physical location) or just a bad write, this fixes it. If the disk is really dead, ZFS will take it offline and wait for you to replace it.
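
With the current Solaris commands, setting up a mirror looks roughly like this:

    # 'tank', 'disk1', and 'disk2' are placeholder names
    # build a pool from two mirrored disks
    zpool create tank mirror disk1 disk2

    # or attach a second disk to an existing single-disk pool to form a mirror
    zpool attach tank disk1 disk2

    # checksum errors that ZFS detected and repaired show up in the CKSUM column
    zpool status -v tank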

Fast resync of mirrors. In the unfortunate circumstance where a drive does die and you replace it, the resync process is faster with ZFS. This is because, unlike many other RAID systems, ZFS knows which blocks on the disk were used, and which were not. During resynchronization, ZFS copies only blocks with actual filesystem data on them to the new disk. So, if your disk pair was only half-full, then you are back in business twice as fast.
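
Replacing a dead drive is a single command today, and the resync (ZFS calls it resilvering) copies only the blocks that are actually in use:

    # 'disk1' (failed) and 'disk3' (replacement) are placeholder names
    # swap the failed disk for a new one; only allocated blocks are copied
    zpool replace tank disk1 disk3

    # watch the resilver progress
    zpool status tank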

Software parity RAID that actually works. The most popular parity RAID system is by far RAID-5, where for every N-1 data blocks, there is one parity block. The parity block allows you to recover all your data if any one disk fails, much like mirroring, but without as much space penalty. There is a seldom discussed problem with RAID-5, known as the “RAID-5 write hole.” When modifying a single block, you have to rewrite all N blocks (including the parity block). If a power or hardware failure happens in the middle of rewriting these N blocks, then you effectively lose all N blocks of data, with no way to recover them. (Update: As pointed out in the comments, I have incorrectly stated how writes happen in RAID-5. Only the changed block and the parity block need to be updated, rather than all N blocks. Nevertheless, there is still a write hole if a hardware failure happens between the two writes.) You can fix this in hardware with battery backup systems, or RAID controllers with non-volatile write caches. The structure of ZFS is such that you can also solve the problem in software, using a variant of the RAID-5 algorithm called RAID-Z. RAID-Z behaves much like RAID-5, but has no write hole. Recent ZFS releases have also added a double-parity version of RAID-Z, which allows you to withstand 2 disk failures at once.
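
In the Solaris tools, RAID-Z pools are created exactly like mirrors, in single- or double-parity flavors:

    # pool and disk names are placeholders
    # single-parity RAID-Z: survives any one disk failure
    zpool create tank raidz disk1 disk2 disk3 disk4

    # double-parity RAID-Z: survives any two simultaneous disk failures
    zpool create tank raidz2 disk1 disk2 disk3 disk4 disk5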

Why ZFS Matters to Server Admins

By now, I’ve hit on nearly all of the neat features of ZFS, but there are a few left that might be of interest to people with Xserve/Xsan clusters:

Easy command line interface. I have no idea how Apple will choose to present ZFS to users, but regardless, they have to include the fantastic zpool and zfs commands. These two commands make it very easy to manage lots of disks and filesystems.
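
Assuming Apple ships these commands more or less as they exist on Solaris, a complete from-scratch setup fits in a handful of lines; every name below is hypothetical:

    # every pool, disk, and filesystem name here is invented
    # one pool built from four disks arranged as two mirrored pairs
    zpool create tank mirror disk1 disk2 mirror disk3 disk4

    # carve out per-user filesystems, each with its own properties
    zfs create tank/home
    zfs create tank/home/alice
    zfs set quota=100G tank/home/alice

    # review everything at a glance
    zpool status tank
    zfs list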

A stream format which allows you to copy snapshots to other systems. This feature is a little hard to explain, but it basically allows you to dump a ZFS filesystem, preserving the snapshot history, and reload it on another system. This could be used for maintaining a backup server, or loading a filesystem into another storage pool.
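
On Solaris this is the zfs send/receive pair; the host and dataset names here are invented for illustration:

    # 'backuphost' and the dataset names are placeholders
    # dump a snapshot as a stream and load it into a pool on another machine
    zfs send tank/home@monday | ssh backuphost zfs receive backup/home

    # later, send only the changes made between two snapshots
    zfs send -i tank/home@monday tank/home@tuesday | ssh backuphost zfs receive backup/home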

Highly SMP-friendly design. ZFS is designed to efficiently support many, many processes all accessing a filesystem at the same time.

Nearly unlimited capacity and scalability. We come full circle back to the capacity issue. For servers which need to manage a large number of disks, ZFS scales up well from the single-disk scenario we started with. Sun certainly pushes ZFS on their 48-disk monster, the Sun Fire X4500.

Waiting for Leopard

Hopefully, I’ve got you excited about ZFS coming to Mac OS X. So far, all we’ve seen is a leaked screenshot showing ZFS in the disk image creator. It’s not yet clear how much Apple wants to promote ZFS, whether through GUI tools, integration with Time Machine, or just marketing. We’ll certainly learn more at Macworld 2007. Until then, take a look at this presentation on ZFS to learn more about it.

13 Responses to “Why ZFS Matters to Mac Users”

  1. David Magda Says:

    The presentation that you link to has an accompanying video at the “ZFS Learning Center”:

    http://www.sun.com/software/solaris/zfs_learning_center.jsp

  2. Daniel Says:

    Another advantage of the compression: since CPUs have increased in speed so much more than even the fastest storage systems, you can get a huge speedup by using a fast compression scheme on the CPU so that much less data goes to the I/O system. This results in a large performance increase when accessing easily compressible files.

    So if it takes 500ms to write a file, and you can compress it to 1/5 its size in 100ms, you can then write the file in 200ms total (100ms to compress, 100ms to write). Free speed up!

    As you know, there are huge numbers of easily compressible files in OS X. Switching to ZFS may give a huge increase in day to day performance for free.

  3. CS Genius Says:

    Excellent article summarizing the benefits of a modern file system such as ZFS. Much like the stuff Network Appliance has been selling in their boxes for years.

    fwiw, most simple RAID-5 implementations do not rewrite all the blocks in a stripe when updating a single block. Rather, they read the old parity and the old block, and combine those with the new block to compute the new parity. The new parity and new data are then written to disk. Thus, instead of writing the entire stripe, just 2 reads and 2 writes are issued. The write-hole problem, as described, still exists, though.

  4. C. Shamis Says:

    >People with iBooks, MacBooks, Powerbooks, Mac Minis, and iMacs all have generally the same storage setup: a single hard disk with capacity ranging from 40-500 GB. A lot of the magic of ZFS does not become manifest until you have several disks, …

    This is true, and it’s good you mention it up front, because I don’t think the “single hard disk” model is likely to change in the immediate future. Two or three years from now it may be the norm to have multiple HDs in your laptop… but two to three years from now, ZFS will be in place… So what’s the hurry?

    >Apple’s much-discussed Time Machine feature in OS X 10.5 is a great example of the interface possibilities when you have snapshots available. However, Time Machine does not appear to require ZFS, which means that Apple had to bolt snapshots onto HFS+, a complex and awkward task.

    Not really for somebody who understands the technology of file systems, and modern data-storage techniques. It’s really not any harder than “bolting-on” file-level encryption or compression. –Furthermore, it’s already been done, so why do I care how difficult it was? I don’t have to know HOW something was done in order to use it… (Isn’t that the whole point of owning a Mac in the first place?) :)

    >Automatically growing filesystems. Once you add your disks to the storage pool, all of their space is available

    Okay, this is REALLY COOL! No doubt about it. However, adding additional drives isn’t exactly an everyday occurrence. Sure, it *WILL* be nice one day, when… adding a hard-drive is like adding memory, you simply stick it in, and turn it on and the computer uses it… That will be *WAY* cool… but… It just isn’t something that happens that often… Besides, the complexity of adding storage to an existing file system can usually be mitigated by “nesting” the file system anyway. –Just move the files around and change the mount point. The OS doesn’t care if the “Whatever” directory is on hard drive X or hard drive Y.

    Oh sure… we humans may “care” which drive certain data is on, (Databases on one drive, Multimedia on another, etc…) but with ZFS you’ll have to cede that control anyway. This new “grow the file system” feature of ZFS isn’t going to let you decide what drive you put stuff… When you add a drive, ZFS will decide how to break out the data. (Just like the RAM in your computer, you don’t get to decide what programs run on which memory stick; well it will be the same thing with your hard-drive…) –I’m just saying… be prepared for that…

  5. Denver Computer Consulting Says:

    Thanks for posting such an interesting article about ZFS. I don’t think a lot of Mac users realize the power they have right under their noses.

  6. stan Says:

    The disadvantage to bolting on snapshots to HFS+ is that it likely is not as fast or as space-efficient as ZFS, which limits how much you can use it.

    Judgement on the matter will have to wait until we learn more about how Time Machine is implemented, though.

  7. huevo87 Says:

    Actually ZFS is not necessarily FASTER at restoring a disk in your pool if one fails and you swap in a new one. On the new disk it will have to reconstruct all of the missing information from the parity on the remaining disks. The problem arises when the disks were very full. The way ZFS stores data, using a shadow page tree, means that it will have to traverse the shadow page tree in its entirety, so it will usually take longer than a RAID-4, for example. However, if the disks are only half full, then ZFS will be faster, because it will not recompute unnecessary parity blocks.

  8. Brian Says:

    I think some of the folks here are missing the point of the “grow your filesystem” feature. ZFS really throws some of the assumptions that we make about how filesystems work out the window.

    For example, home directories or even application folders can be treated as separate volumes. So if you want to compress & encrypt all user data or write protect specific applications, you can do that without clumsy bolt-ons like FileVault.

    Also consider the possibilities for backup… since ZFS natively supports replication & snapshots, the OS could automatically sync your volumes with things like firewire disks, other computers or even handheld devices (sync your home folder with your laptop or PDA) or even an online solution like .Mac.

  9. saha Says:

    I wish Apple would update the filesystem for Mac OS X. I know that journaling and metadata searching were added to HFS+ later to keep up with the necessary features of a modern filesystem, but I feel HFS+ is showing signs of age. I was hoping that when Apple first developed Mac OS X it would have used UFS and then made a separate HFS+ partition for people who wanted to run Mac OS 9 on the PowerPC-based Macs, but that didn’t happen. Perhaps for the best at the time. Wilfredo Sánchez Vega wrote a whitepaper on the reasoning for HFS+ at the time.

    http://www.wsanchez.net/papers/USENIX_2000/

    So now, with the Intel Macs and no need for Mac OS 9 support, Apple could have told all their developers that all Universal apps must be able to run on UFS or case-sensitive HFS+. That way, should Apple decide to adopt ZFS, it should be a painless transition. Holding on to HFS+ with the Intel Macs for this long will hamper any transition to a future filesystem (incompatibilities dragging along legacy habits of HFS+). This would prepare Adobe and Microsoft to write their new Universal versions to work with any type of filesystem and not rely on the resource fork of HFS.

    That’s my 2 cents.

  10. Hakime Says:

    ZFS is already largely implemented in Leopard. And to answer your question, yes, the zpool and zfs commands already work. Practically everything works at the command line level. Look here for a demo:

    http://themachackers.com/2006/12/19/zfs-on-mac-os-x-105-a-closer-look/

  11. Matt Povey Says:

    Just a quick point regarding compression. NTFS has had similar (though not as flexible) support for FS level compression since its inception. You generally find however that very few people actually use it.

    There are many good reasons for this but here are the biggies:

    1) The majority of space-consuming formats are already compressed (MPEG etc.). These are Macs we’re talking about, remember.
    2) Many of those formats in turn require relatively high performance (50Mbps for SD video editing - 25Mbps for DV - far more for HD). Compression can compromise that performance
    3) Hard drive capacities have grown so fast as to render compression pointless
    4) It’s a serious nightmare for admins who have to try to track both the compressed and putative uncompressed quantities of data that they store in order to plan storage requirements going forward
    5) Adding to admins woes is the loss of compression on their backup tapes (working in a media environment, I turn tape compression off anyway as it can actually increase the size of stored data!)
    6) ZFS compression would be great for USB sticks etc. Except that no one puts NTFS on USB sticks, and neither will they put ZFS on them unless they’re only going to share with Mac\Solaris users.

    File system compression sounds nice but in practice it’s window dressing. ZFS is a massive step forward for MacOS but compression is just frippery.

  12. Ford Prefect Says:

    You should have left out the paragraph about how long it takes to rebuild a failed disk in RAID 5/Z mode. First, it is very speculative. Next, there couldn’t be anything less concerning. From my own experience with failing disks in a Linux software RAID-5, it takes about 45 minutes to rebuild a 240 GB array (i.e., disk capacity 120 GB). While it is at work, you can do everything with the system, like watching a movie, or just keep it running in a productive state. Who cares if it runs 30 or 60 minutes?

    Anyway, it’s a nice summary about ZFS’ features from the end user’s point of view!

  13. stan Says:

    I will admit that my interest in ZFS compression is higher than perhaps warranted due to recent experiences. We have a large number of data files which easily compress (3x) that are individually rarely used, but need to be available to our analysis programs. Squashfs over loopback was the only option available to me, and it was pretty awkward, and essentially read-only.

    I also tend to carry around lots of reference files on my laptop where compression would be handy, rather than having to zip and unzip things constantly.
