Meglet Rambles On

it’s my website, I’ll ramble if I want to
02 Feb

When simple upgrades go very, very wrong

Today’s upgrade to the Windows Home Server was supposed to be very simple. It wasn’t really even an upgrade: all I needed to do was install the multi-drive racks in the case and move one existing hard drive into one rack, to prepare for future upgrades. Since I was moving one hard drive, and I couldn’t tell which 500GB hard drive was the one I would be moving, I decided to play it safe and remove them both from the storage array, just in case WHS saw the one in the rack differently when I was done. So yesterday I removed the drives, which took about 4 hours per drive and left me with only 92GB free out of about 3.8TB.

Today, I shut the server down and prepared to install the racks. These are some really nice hot-swappable SATA drive racks: one puts 3 hard drives in two 5.25″ drive slots, the other puts 5 hard drives in three 5.25″ drive slots. The first thing I discovered was that the computer case has those handy mini-rails in the 5.25″ slots to hold the drives. Those rails don’t work with the larger of the two SATA racks, so a pair of pliers and some elbow grease later, they were gone. The racks still wouldn’t slide into the case, but after some trying and re-arranging I figured out that they would work fine if I moved the DVD drive to the bottom, with the racks stacked on top of each other. Once the racks were in, I double-checked all the drive connections, installed the drive that was moving to the new rack, and booted the server back up.

Or at least I tried to. It POSTed, but wouldn’t give me a video display, and wouldn’t boot. A few resets and lots of waiting later, I tried a different monitor, and discovered that one drive attached to the RAID card was missing, so the boot process was stuck at the screen telling me the RAID was disabled. OK, fine. Traced out the cables again, found the Molex-to-SATA power converter I forgot to plug back in. Boot the server again, it blows right past the RAID like it should. Then hangs at the Windows 2003 Server splash screen. And hangs, and hangs, and hangs. Then flashes a blue screen at me and reboots. Rinse, repeat. Finally I get to a Windows Boot menu (this thing boots FAST until it hangs) and pick the option to disable reboot on error. 10 minutes later, it is done hanging and blue-screens with an “UNMOUNTABLE_BOOT_VOLUME” error. Not Good.

Find out there is no Recovery Console on the WHS install disk, Safe Mode is blue-screening too, and Last Known Good Configuration is not working either. So I start digging around the internet. Hmm, looks like I can use a Server 2003 disk to launch Recovery Console. Fire up Recovery Console, poke around in the options. Chkdsk is telling me the drive has unrecoverable errors. FixMBR is telling me I could lose all partitions by creating a new MBR. That would be bad, as there’s about 750GB of data there.

So I decide to give the WHS reinstall option a try. This is supposed to reinstall the OS portion of WHS without wiping my data drives the way a normal install would. Boot to the CD, and it auto-selects the reinstall option. Nice. Load up the RAID drivers so that it will see all the drives when it boots and not freak out about missing storage drives. Wait, and wait, and wait, while setup copies files, and then reboots. Only to tell me that it can’t find the RAID drivers. Bother. Try again without the RAID drivers. Only, because setup didn’t complete, setup can’t tell that there is an existing installation. Which means I don’t get a repair option. Which means it will erase all data on all drives during install. This is Not Good.

So now I have a temporary WinXP installation on the box, copying all the data off the 2 750GB hard drives onto the 2 500GB drives that were originally removed from the array, and one spare 250GB drive. Estimated time remaining: 4 hrs 30 minutes. Then I have to disconnect the 500GB drives to hide them from setup, and install WHS again. Which will take another 1-2 hours. Then I have to install the RAID drivers, and copy all the data off the 1.82TB RAID disks onto the empty 750GB drives, because simply adding the RAID volume back into the storage array would format it. Estimated time to copy data: another 3-4 hours. Then hook the 500GB drives back up, and copy their data back onto the server before adding them back into the storage array, another 3-4 hours estimated. And once all that is done, grab the external drive off my MacBook that has my iTunes library and copy that back over, since I don’t have quite enough space to save the Music folder along with everything else.

Total estimated time to complete what should have been a 15-minute installation? 9-12 hours. Not counting all the time it will take to re-create all the custom shares, user names, re-install the connector software on the workstations, get the Add-ins installed again, and re-install and re-configure iTunes and connect the AppleTV back to it.
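
For anyone who wants to check the math, here is a quick Python sketch that adds up the per-step estimates quoted above. The step labels and the total_range helper are made up for illustration, and the 4.5 hours is only the copy time remaining, so treat this as a rough tally rather than a real schedule. Summed strictly, the steps land a little above my ballpark.

```python
# Per-step estimates in hours as (low, high), taken from the post.
# The step labels are illustrative names, not anything WHS reports.
steps = {
    "copy 750GB drives to the spare drives": (4.5, 4.5),
    "reinstall WHS": (1, 2),
    "copy RAID data to the 750GB drives": (3, 4),
    "copy the 500GB drives back": (3, 4),
}


def total_range(estimates):
    """Sum the low and high ends of every (low, high) estimate."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high


low, high = total_range(steps)
print(f"Estimated total: {low}-{high} hours")  # prints "Estimated total: 11.5-14.5 hours"
```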

Next weekend, I will be borrowing the backup software from work to create a backup image of my server installation. All I need is the 20GB OS partition and the boot record for the primary drive. With those stored on a removable, bootable hard drive, I can have the system back up and running in 20 minutes.

Ladies and Gents, think very, very carefully about any drives you need to remove from your server, remove them one at a time, make sure you have enough space to back up everything else, and reboot between removals. And for crying out loud, back up your server installs.
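
If you want to sanity-check the "enough space" part before pulling a drive, something like this Python sketch would do. It is only a sketch under assumptions: the function names, the 10% safety margin, and the idea of checking ordinary paths are mine, and WHS's storage pool may not expose each drive as its own path. The only library call used is shutil.disk_usage from the standard library.

```python
import shutil


def fits_with_margin(bytes_to_move, free_bytes, margin=0.10):
    """Return True if the data fits while leaving `margin` of the free space unused.

    The 10% default margin is an arbitrary assumption, not a WHS rule.
    """
    return bytes_to_move <= free_bytes * (1 - margin)


def free_bytes_on(paths):
    """Add up the free space reported for each path.

    Assumes every path sits on a different volume; two paths on the
    same volume would be counted twice.
    """
    return sum(shutil.disk_usage(p).free for p in paths)
```

Usage would be along the lines of `fits_with_margin(data_on_drive, free_bytes_on(other_drive_paths))`, where both arguments are placeholders for your own numbers. If it comes back False, free up space or back up elsewhere before removing anything.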

© 2010 Meglet Rambles On