January 2014


I swear, my luck with hard drives is really rotten. I just lost the OS drive in my MythTV box, and that marks the second time in as many years (and the third time total).

It shouldn't be surprising. I've got 8 drives in always-on systems, and I was bound to lose another eventually. It's just too bad it wasn't one from the RAIDz array, where the redundancy would have absorbed the loss.

Anyway, the last time I lost the primary (and at the time only) drive in my MythTV system, I responded by rebuilding the thing with RAID 1. It chugged along happily for a while with no issue.

At some point, I picked up a small form factor bare bones kit to replace the massive Dell tower that I had been using. In moving to the smaller kit, I was forced to sacrifice the second drive.

Of course, now, I pay the price.

Luckily, the price isn't that high. When I set up my RAIDz array a while back, I offloaded all of the actual media files onto it and exported them via NFS. A drive failure in the MythTV system itself doesn't cause me to lose any of those.

At the same time, I configured Bacula to back up everything else "important" to the RAIDz pool, and I rsync those backups to an external drive. This works remarkably well, and until now I've had no cause to use it.

I noticed the drive failure last night, when I tried to upload a newly ripped CD. I didn't have time to do much then. I just hit the Gentoo website and started downloading the latest live CD (since god knows where I put my old one), and told Bacula to restore everything to the local filesystem on the backup server.

This morning, I got up a bit early and swapped out the failed drive with the one that used to be its mirror. I briefly considered trying to recover a bootable system from the outdated mirror, but quickly thought better of it; the data was really stale and would have to be replaced anyway. Might as well just nuke it from orbit and do a bare metal restore.

Once I had the live CD booted, it was pretty straightforward to recover from there. The bacula restore job had finished the night before, so all I had to do was partition the replacement drive and rsync the backup over from the Solaris box.
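For anyone repeating that step, here is a rough sketch of how the rsync invocation could be assembled. The host name, paths, and exclude pattern are hypothetical stand-ins, not the ones I actually used. The flags are standard rsync options: -a for archive mode, -H to preserve hard links, and --numeric-ids so that file ownership is copied by number rather than being remapped through the live CD's own passwd file.

```python
import shlex


def build_restore_command(source, target, excludes=()):
    """Build an rsync argument list for copying a restored tree onto a new disk.

    source and target are hypothetical examples supplied by the caller,
    e.g. "backuphost:/tank/restore/mythtv" and "/mnt/gentoo".
    """
    cmd = ["rsync", "-aH", "--numeric-ids"]
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")
    # A trailing slash on the source copies its contents rather than
    # creating an extra directory level under the target.
    cmd.append(source.rstrip("/") + "/")
    cmd.append(target.rstrip("/") + "/")
    return cmd


if __name__ == "__main__":
    command = build_restore_command(
        "backuphost:/tank/restore/mythtv",  # hypothetical restore location
        "/mnt/gentoo",                      # hypothetical mount point of the new root
        excludes=["/scratch"],              # hypothetical: skip throwaway data
    )
    # Print the command for review instead of running it.
    print(shlex.join(command))
```

Printing the command first and running it by hand is deliberate; with a freshly partitioned disk mounted, a typo in the target is not something you want to discover after the fact.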

Unfortunately, I had failed to back up the boot partition. Not a big problem, but I had to go back in and recreate it, building a new initrd and writing a new grub.conf. I also failed to create /dev/console and /dev/null on the actual / partition, which caused boot to fail until I went back and did so. Lessons learned there.
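The device node part is easy to script. Below is a minimal sketch, assuming the new root is mounted somewhere like /mnt/gentoo (a hypothetical mount point). On Linux, /dev/console is character device 5,1 and /dev/null is character device 1,3. The permission bits here are a reasonable default, not something taken from my original system.

```python
import os
import stat

# (name, major, minor, permission bits) for the two nodes early boot needs.
DEVICE_NODES = [
    ("console", 5, 1, 0o600),
    ("null", 1, 3, 0o666),
]


def plan_nodes(root):
    """Return (path, mode, device) tuples for the nodes under <root>/dev."""
    return [
        (
            os.path.join(root, "dev", name),
            stat.S_IFCHR | perms,        # character device plus permissions
            os.makedev(major, minor),
        )
        for name, major, minor, perms in DEVICE_NODES
    ]


def create_nodes(root):
    """Create any missing nodes. Must be run as root."""
    os.makedirs(os.path.join(root, "dev"), exist_ok=True)
    for path, mode, device in plan_nodes(root):
        if not os.path.exists(path):
            os.mknod(path, mode, device)
            # mknod honours the umask, so set the permissions explicitly.
            os.chmod(path, stat.S_IMODE(mode))
```

Calling create_nodes("/mnt/gentoo") from the live CD, as root and before rebooting, would create both nodes on the real / partition rather than on whatever gets mounted over /dev later.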

I also lost my large "scratch" partition. I tend to keep a collection of useless junk around, and in this case I had already decided that these things were acceptable losses in a recovery scenario. In a way, it's actually nice to have this cleaned out.

The total time from cracking the case to having the system fully running with the prior night's backup was about three hours. I know I'm probably not going to see three nines on my DVR, but that's not a bad turnaround time from my perspective.
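For the curious, the arithmetic behind that remark is simple. Three nines (99.9%) leaves a budget of roughly 8.76 hours of downtime per year. The three hours of hands-on work would fit inside that on their own, but the clock really started when the drive died, not when I opened the case. A small sketch, assuming a 365-day year:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Fraction of the period during which the system was up."""
    return 1 - downtime_hours / period_hours


def allowed_downtime_hours(nines, period_hours=HOURS_PER_YEAR):
    """Downtime budget for a given number of nines (3 means 99.9%)."""
    return period_hours * 10 ** -nines


if __name__ == "__main__":
    print(f"three-nines budget: {allowed_downtime_hours(3):.2f} hours/year")
    print(f"availability with 3 hours down: {availability(3):.4%}")
```

Plugging in the real outage length, from the moment of failure to the first successful boot, gives the honest number.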
