This is the first of several posts in which I will explore some of the features of ZFS.

Sun has traditionally relied on software RAID instead of hardware RAID. Whether that's really the best option is an open question - in most x86 servers, hardware RAID seems to be highly desirable - but I've often wondered just why that is.

There are some theoretical performance gains to be had by moving to hardware RAID (at least over traditional software RAID, such as Linux's raidtools or Sun's Solstice DiskSuite), but I've never actually observed them myself. Furthermore, with hardware RAID, if your whole box or RAID controller fails, recovering data from your array can be a huge hassle - hardware RAID can easily lock you into a chipset or vendor, whereas software RAID is completely portable.

My own opinion is that hardware RAID is popular simply because it's "easy" - you can order a box from your vendor, select a RAID level, and you never even have to think about it. Your OS will see the RAID arrays as standard block devices, and everything will just work - whereas software RAID requires the sysadmin to configure things properly, and traditionally has been somewhat annoying to configure.

Enter ZFS and raidz.

First of all, ZFS is insanely easy to manage. For example, to create my RAID5-style raidz setup, I only had to issue a single command (the status check below is just to show the result):

jnthornh@coltrane ~ $ sudo zpool create vault raidz c0d0 c1d0 c2d0 c4d0 c5d0
jnthornh@coltrane ~ $ zpool status vault
  pool: vault
 state: ONLINE
 scrub: resilver completed with 0 errors on Sat Apr 21 13:16:18 2007
config:

        NAME        STATE     READ WRITE CKSUM
        vault       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0
            c0d0    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0

errors: No known data errors

Easy enough, right?  Furthermore, my pool - in this case "vault" -  becomes immediately available and mounted as /vault.  I can instantly use the array without any further configuration.
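
For what it's worth, the mountpoint is just another ZFS property, so if you'd rather have the pool somewhere other than /vault, a single property change should handle it - a sketch, not a command I actually needed to run, and /export/vault is just an example path:

    zfs get mountpoint vault
    sudo zfs set mountpoint=/export/vault vault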

This is, from my perspective, completely awesome.  No fighting with PERC interfaces to set things up, no dealing with Linux RAID tools or Solstice DiskSuite - it's all done.  It just works.
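
For contrast, here's roughly what a five-disk RAID5 looks like under Linux with mdadm (a sketch from memory with made-up device names, not something I ran as part of this test - and the old raidtools were clunkier still). You still have to make a filesystem and mount it yourself afterwards:

    mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[abcde]
    mkfs.ext3 /dev/md0
    mount /dev/md0 /vault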

Of course, raidz provides other advantages - like on-the-fly checksumming and guaranteed data integrity even in the case of a power failure.  Traditional RAID 5 setups, should they lose power during a write, can end up in an inconsistent state because some drives may have been written and others not.  Not an issue with raidz.
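
You can also ask ZFS to walk the whole pool and verify every block against its checksum whenever you like. Something along these lines should do it (a sketch, not my actual session):

    zpool scrub vault
    zpool status vault      # the CKSUM column reports any checksum errors found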

This is all quite neat, but the big question is this - what are your recovery options?

In order to test this out, I unplugged one of my drives and rebooted the box.  The volume came back up properly, read/write - but zpool told me it was in a degraded state.  I copied some files around, plugged the drive back in, rebooted, and then 'resilvered' the array (which took only minutes and a single command) - and everything was back to normal.  This is what you expect from RAID5, so it isn't really a surprise, but it's good to verify that things behave the way they should.
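
The recovery boils down to something like this (a sketch with an illustrative device name - depending on how the disk comes back, 'zpool replace' may be needed instead of 'zpool online'):

    zpool status vault        # pool shows up DEGRADED, the missing disk UNAVAIL
    zpool online vault c4d0   # after plugging the disk back in; resilvering starts
    zpool status vault        # watch the resilver run and finish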

Now, the real challenge - what happens if my OS is hosed?  I decided on a 6 disk setup: 1 crappy IDE drive for the OS and 5 drives for the raidz.  I would have liked to use raidz for everything, but a raidz root or /usr is still not supported by Solaris, so those live on their own drive.  I decided to simply re-jumpstart onto another crappy IDE drive should my existing one fail, a process that should only take half an hour or so.

So, I tested it.  I rebooted and reloaded the OS (upgrading to Solaris Express while I was at it - I wanted iSCSI target mode, which didn't make it into 11/06) from my jumpstart server.  I logged in, and my raidz didn't show up - hmm, bummer.  After a quick 'man zpool' I discovered what I had to do - 'zpool import -f vault'.  Like magic, my zfs pool appeared, and was even mounted at the same mount points and shared via NFS with the same options.  That means if I lose the IDE drive, I should be able to fully recover in under an hour.
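
For completeness, a plain 'zpool import' with no arguments scans the attached disks and lists any pools it finds, which is handy if you don't remember the pool name (sketch):

    zpool import              # lists pools found on the attached disks
    zpool import -f vault     # -f since the pool was last in use by the old install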

Add to all this the fact that raidz actually *outperforms* most hardware solutions (according to Sun, anyway - there are read/write optimizations the OS can make that hardware controllers can't) and you've got a winner.  Quick performance benchmarks showed my raidz spanking a single SATA drive for reads and roughly equaling it for writes - which is really all I could ask for.
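
If you want to reproduce that kind of ballpark number, a crude sequential test with dd is enough (a sketch only - sizes are arbitrary, and the test file needs to be bigger than RAM or caching will flatter the read side):

    dd if=/dev/zero of=/vault/bigfile bs=1024k count=8192    # ~8GB sequential write
    dd if=/vault/bigfile of=/dev/null bs=1024k               # sequential read back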

Gentlemen, behold!

[Screenshot: MythTV status screen]

jnthornh@coltrane ~ $ zfs list vault
NAME    USED   AVAIL  REFER  MOUNTPOINT
vault   619G   1.18T  48.7K  /vault
jnthornh@coltrane ~ $ df -h /vault
Filesystem             size   used  avail capacity  Mounted on
vault                  1.8T    48K   1.2T     1%    /vault

I had a few delays, and some things didn't go as smoothly as I might have hoped, but my home file server now lives and is backing my MythTV box.

I've started sketching out some documentation on my wiki. I won't go into all of the details in this post - I plan to add more information later - but my biggest delays weren't even related to the storage box itself. In fact, I stalled for weeks just because I was too lazy to crawl around under the house and run some cabling.

I did take a good chunk of time to hammer out some failure recovery stuff once everything was working. The totally awesome news - if I reboot with a drive missing, everything works. If I re-jumpstart the box and do a 'zpool import', the raidz pool magically comes back (and is even mounted for me).

ZFS really does seem like magic if you're used to dealing with normal filesystems - more on that later.

Behold my awesome storage capacity.

Lookie at all that recording potential...