<?xml version="1.0" encoding="UTF-8"?>
<post>
  <body>I swear, my luck with hard drives is really rotten.  I just lost the OS drive in my MythTV box, and that marks the second time in as many years (and the third time overall).

It shouldn't be surprising.  I've got 8 drives in always-on systems, and I was bound to lose another eventually.  It's just too bad it wasn't one from the RAIDz array, which could have absorbed the loss of a single drive.

Anyway, the last time I lost the primary (and at the time only) drive in my MythTV system, I responded by rebuilding the thing with RAID 1.  It chugged along happily for a while with no issue.

At some point, I picked up a small form factor barebones kit to replace the massive Dell tower that I had been using.  In moving to the smaller kit, I was forced to sacrifice the second drive, and the mirror along with it.

Of course, now, I pay the price.

Luckily, the price isn't that high.  When I set up my RAIDz array a while back, I offloaded all of the actual media files onto it and exported them via NFS.  A drive failure in the MythTV system itself doesn't cause me to lose any of those.

At the same time, I also configured Bacula to back up everything else &quot;important&quot; to the RAIDz pool, and I rsync those backups to an external drive.  This works remarkably well, though until now I've had no cause to restore from it.

I noticed the drive failure last night, when I tried to upload a newly ripped CD.  I didn't have time to do much then; I just hit the Gentoo website and started downloading the latest live CD (since god knows where I put my old one) and told Bacula to restore everything to the local filesystem on the Solaris box.

This morning, I got up a bit early and swapped out the failed drive with the one that used to be its mirror.  I briefly considered trying to recover a bootable system from the outdated mirror, but quickly thought better of it; the data was really stale and would have to be replaced anyway.  Might as well just nuke it from orbit and do a bare metal restore.

Once I had the live CD booted, it was pretty straightforward to recover from there.  The Bacula restore job had finished the night before, so all I had to do was partition the replacement drive and rsync the backup over from the Solaris box.
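
For anyone repeating this, the copy step can be sketched roughly as follows.  This is an illustration rather than a transcript of what I typed: the host name and both paths are placeholders, and the flag set is just one reasonable choice (archive mode, preserved hard links, and numeric uid/gid so the live CD's own user database doesn't remap file ownership).

```python
import shlex

def build_restore_command(source, target):
    """Return the rsync argv for copying a restored tree onto a fresh root."""
    return [
        "rsync",
        "-aH",            # archive mode, and preserve hard links
        "--numeric-ids",  # keep uid/gid numbers; the live CD has its own passwd
        source.rstrip("/") + "/",  # trailing slash: copy the contents, not the directory
        target.rstrip("/") + "/",
    ]

# Hypothetical host and paths, for illustration only.
cmd = build_restore_command("solaris-box:/tank/restore/mythtv", "/mnt/gentoo")
print(shlex.join(cmd))
```

Building the argument list first and printing it makes it easy to eyeball the source and destination before anything gets written to the new drive.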

Unfortunately, I had failed to back up the boot partition.  Not a big problem, but I had to go back in and recreate it, building a new initrd and writing a new grub.conf.  I also failed to create /dev/console and /dev/null on the actual / partition, which caused boot to fail until I went back and did so.  Lessons learned there.
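
The missing device nodes are easy to check for before rebooting.  Here is a minimal sketch, assuming the new root is mounted at /mnt/gentoo (a placeholder path) and using the standard Linux device numbers for the two nodes (console is character device 5,1 and null is 1,3).  The permission modes are the usual defaults, not something I checked against my old system, and creating the nodes requires root.

```python
import os
import stat

# Standard Linux character devices: name -> (major, minor, permission bits).
REQUIRED_NODES = {"console": (5, 1, 0o600), "null": (1, 3, 0o666)}

def missing_device_nodes(root):
    """List required nodes that are absent (or not char devices) under root/dev."""
    missing = []
    for name in sorted(REQUIRED_NODES):
        path = os.path.join(root, "dev", name)
        try:
            is_char_device = stat.S_ISCHR(os.lstat(path).st_mode)
        except OSError:
            is_char_device = False
        if not is_char_device:
            missing.append(name)
    return missing

def create_device_nodes(root, names):
    """Create the named nodes under root/dev; must run as root."""
    os.makedirs(os.path.join(root, "dev"), exist_ok=True)
    for name in names:
        major, minor, perms = REQUIRED_NODES[name]
        path = os.path.join(root, "dev", name)
        os.mknod(path, perms | stat.S_IFCHR, os.makedev(major, minor))
        os.chmod(path, perms)  # mknod is subject to umask, so set the mode explicitly

# "/mnt/gentoo" is a placeholder for wherever the new root is mounted.
print(missing_device_nodes("/mnt/gentoo"))
```

Passing the result of missing_device_nodes to create_device_nodes, as root from the live CD, fills in whatever is absent.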

I also lost my large &quot;scratch&quot; partition.  I tend to keep a collection of useless junk around, and in this case I had already decided that these things were acceptable losses in a recovery scenario.  In a way, it's actually nice to have this cleaned out.

The total time from cracking the case to having the system fully running with the prior night's backup was approximately 3 hours.  I know I'm probably not going to see three nines of uptime on my DVR, but that's not a bad turnaround time from my perspective.</body>
  <category-id type="integer">3</category-id>
  <created-at type="datetime">2008-12-23T14:11:09-05:00</created-at>
  <id type="integer">165</id>
  <name>drive-failures</name>
  <published type="boolean">true</published>
  <title>Adventures in drive failures</title>
  <updated-at type="datetime">2008-12-23T14:17:56-05:00</updated-at>
  <user-id type="integer">2</user-id>
</post>
