I've been a regular reader but a quiet participant for quite some time. I thought I ought to share my current story and pass on my thanks for the amazing contributions on the board. I've found it a tremendous resource, and I chuckle at the tenacity and kindred spirit of so many who are seeking similar goals.
I'm a software / tech guy with a fair amount of system build experience. I've dabbled in DAW over the years and hate noise in my computing environment. This has led me down many rabbit holes seeking high performance, large and fast storage, and quiet components: a typical "three options, pick two" scenario. Add a budget, and it gets even more interesting.
I've had many RAID arrays in my past. Most have been RAID 5, but I've also done some mirroring and striping for performance volumes. I've only had one fail, but it was catastrophic. Two out of 5 drives failed simultaneously, taking the whole array down (the heads spluttered across the surface of the first platter in both drives). I was backing up the array. It turns out the backups were completing, but I hadn't tested a restore in quite a while, and I could not restore successfully. Even forensic cleanroom recovery couldn't save much. It was an expensive and time-consuming lesson. I'm by no means RAID averse, just more careful about backups. Which leads me to the ex495...
I was considering building a new multi-purpose server with a large RAID array when the ex495 came out. My primary redundant central storage was a 1TB RAID 5 array that was full. I was mulling a combo of a high performance server for dev dabbling and a larger storage pool. I didn't really want to spend the coin, and I was reluctant to have the noise in my office. The ex495 went on sale and I couldn't resist. I picked up four 2TB Samsung drives at the same time and began my WHS v1 journey. I put 3 of the 2TB drives in the storage pool and intended to swap out the system drive with the fourth, but I never got around to it. FYI - the server name is ROCK. I believe in positive thinking.
Aside from a few glitches with backing up Macs (never got it working myself), I've had a reliable life with the ex495 for almost two years. I've used it primarily for client PC backups and central storage, although I've dabbled with a variety of media related tasks. I have it set up for remote access, but I really don't use that much. I'm backing up 6 PCs nightly. Some of them are pretty large. My physical 7.5 TB is nearly full.
Last December, I dove deeper into DVD and Blu-ray ripping and the myriad ways to organize and manage a digital video library. I installed MyMovies on the WHS. I also installed the iTunes helper, iHomeServer, to let me run iTunes as a service on the WHS. ROCK was still stable, with only the typical odd items showing up in the logs.
Fast forward to a few weeks ago. My son's computer had a 1TB data drive fail. I didn't have a 1TB replacement and he was only using 1/3 of the space, so I installed a 500GB drive. I was surprised to find that I couldn't directly restore from the WHS PC backup to the new drive because the new drive was smaller than the old one. The restore was further complicated because we had also moved his My Documents and such to the d: drive (c: is small, fast and tight on space). Aside from the time spent wrestling with getting it to work, it didn't take too long to get him back up and running. I remember some grief with getting the backup reconfigured for the differently sized d: drive, but the details escape me now.
Shortly after this recovery effort, I started finding my ex495 inaccessible on occasion. I could ping it, but I couldn't reach it via RDP or the WHS console (which was grey). Lights on the WHS didn't indicate a problem - they were all blue. The WHS wouldn't respond to the soft shutdown. I had to hard reset. Nothing in the event logs indicated a problem. Chkdsk in read-only mode was clean. Events seemed to just halt around dawn with no indication of why. Hmm. I did the requisite Google diving and forum scouring. I installed the SMART add-in, which did indicate that one sector had been remapped on the system drive. That's not great, but really not out of the ordinary. I ran Seagate's diagnostics on the system drive and it passed.
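For anyone chasing a similar "answers ping but nothing else" hang, probing individual TCP services from a client PC can show which layers are still alive. The following is only a minimal sketch, assuming Python 3 is available on the client. The function names are made up for illustration, and the ports are the standard defaults (3389 for RDP, 445 for Windows file shares), which I'm assuming apply here.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False

def probe(host, ports, timeout=3.0):
    """Check each named port and return a {name: reachable} mapping."""
    return {name: port_is_open(host, port, timeout)
            for name, port in ports.items()}

# Assumed defaults: 3389 is the standard RDP port, 445 is SMB file sharing.
DEFAULT_PORTS = {"rdp": 3389, "smb": 445}
```

Calling `probe("ROCK", DEFAULT_PORTS)` on a schedule and logging the result with a timestamp would show whether the shares die at the same moment as RDP, or whether one service drops first.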
Despite the steps I was taking, the problem persisted and got more consistent. Not the right direction. I was still only mildly concerned, but I was annoyed and unhappy about having to hard reboot every morning.
About a week ago, I was playing with Adobe Lightroom on my main machine, sending large collections of raw files to Photoshop to create panoramas. I keep my photos on the WHS, so the process was taxing on the server, network and my PC. The combined image size approached 10GB. Of course, Photoshop was trying to manage twice that much (the individual images and the combined one) and Lightroom was working with the catalog and the selected batch. It was pretty hard on the swapfile. It got so slow that I walked away and let it run overnight. Of course, backups happen at night as well. I woke to find the server hung again and my panorama creation botched. Not good.
I dove more deeply into warnings and errors. I freed up a bunch of space. Nothing seemed to help. I started preparing for swapping out the system drive (the drive that had the sector remapped). I began researching the various options and processes, mulling over the best solution. I backed up critical items on the WHS to multiple places. I looked into ordering a Gen3 debug board / cable from Charles (finally pulled the trigger on that today!). In the past few days, I have had a couple of "The device, \Device\Ide\iaStor0, did not respond within the timeout period." errors in the event log. Not a good sign. I'm still not convinced that the system drive is really failing, but I'm not willing to bet on it.
As a last-ditch effort, I did a whole bunch of things last night. I uninstalled most of the WHS add-ins (leaving the Toolkit 1.1, WHS BDBB and Advanced Admin). I disabled automatic backups of several clients, leaving only 3 backups operational (those that are critical or have been changing frequently). I made more free space on the WHS by deleting unneeded backups and such. I disabled SQL Server (needed by MyMovies).
Guess what. I woke to find my server happily running and accessible. Excellent! Still, I'm uneasy, and I only have new clues, not answers. I am planning to proceed with the replacement of the system drive, but I'm also going to try to understand whether there's a correlation between the last-ditch efforts and the problem disappearing. My first test will be to re-enable the automatic backups of the other clients.
I have a couple of ideas about the problem that might be worth chasing / hashing out:
1) Perhaps I'm triggering a system condition that happens when a drive gets full, under load, during backup. My storage pool has been very low on free space. I've also found my WHS very low on system resources.
2) I have some suspicion that some piece of SW is putting up a user dialog and getting stuck waiting for user input. I can't test this until I have the debug cable and a monitor connected.
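One way to gather evidence for idea 1 would be to log free space on a schedule, so the last entries before a hang show how full things were. This is just a rough sketch, assuming Python 3.3 or later (needed for `shutil.disk_usage`) can run on the server. The helper names and the 10% threshold are placeholders, and a per-volume number may not reflect exactly how WHS accounts for the storage pool.

```python
import shutil
import time

def free_fraction(path):
    """Fraction (0.0 to 1.0) of the volume holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

def is_low(path, threshold=0.10):
    """True when the free fraction drops below the (placeholder) threshold."""
    return free_fraction(path) < threshold

def log_line(path, threshold=0.10):
    """Build one timestamped line suitable for appending to a text log."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    pct = free_fraction(path) * 100
    flag = "LOW" if is_low(path, threshold) else "ok"
    return "{} {} free={:.1f}% {}".format(stamp, path, pct, flag)
```

Appending `log_line("D:\\")` to a file every few minutes during the backup window (the drive letter is an assumption) would show whether free space bottoms out right before events stop around dawn.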
This whole exercise has me focused on figuring out a better way to back up the WHS and also provide more space to the storage pool. These are open-ended problems. For a near-term backup solution, I'm considering an external eSATA RAID enclosure, running drives in mirrored mode, swapping out the mirrors and storing the mirrored set offsite. I'm a bit hesitant to make the investment given the seemingly finicky nature of the ex495 eSATA port. Has anyone had good long-term results using an external eSATA enclosure on the ex495?
Again, thanks to all of you for sharing your experiences and knowledge. Look forward to comments and questions.
PS - I wanted to add that I didn't intend to malign any of the SW or add-ins that I've referred to. Just trying to provide a good picture of the situation in hopes that it might help narrow down the problem.