After having a bit of a scare with a server that wouldn’t come up one morning, the higher-ups have decided that the business needs a high-availability / failover setup.
We have 5 main servers (4x Linux, 1x OpenBSD) all of which need to be running for the company to operate. Three of the servers are fairly standard (Files/Web/Database), the fourth handles most network routing and web proxies, while the fifth supports our phone system and has non-standard hardware.
My boss has stated that turnaround time for a server failure should be under 30 minutes.
My experience in this field is non-existent (I’m just a programmer who was ‘promoted’), so I guess my question really boils down to:
- Is this something that should even be attempted by someone with average server-admin skills? If so, what should I read, and who should I talk to?
I think you should start by getting numbers together to describe the cost associated with fulfilling the stated “requirement” to see if it even falls within the budget. If you’re not comfortable with all of the “normal” methods that would be used to fulfill the requirement (failover clustering, hypervisors with “hot migration” capability, etc), then you’d probably do well to find a consultant who can help out.
There’s going to be some cost associated with the feasibility study, but it will cost a lot less to discover up front that a good solution won’t fit within the budget (meaning that management needs to set expectations more realistically, or pony up more money) than it will to do something half-assed that blows a ton of money and ends up not fulfilling the requirement at all.
It sounds like your boss just pulled that number out of the air. Perhaps he’s done some analysis and knows what the cost-per-hour associated with downtime of various systems is, but I doubt it. It sounds like some pie-in-the-sky number that isn’t tied to reality. I’d be surprised if all your systems need that kind of availability. It may be, in the course of studying the business, that you discover that only a subset of functionality needs to have such a degree of uptime and fault-tolerance (and, thus, such a solution would ultimately cost less). I’m sure that phones and the line-of-business application are up there, but you may have some tolerance for downtime on some of the other systems.
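To make the cost-per-hour idea concrete, here is a rough sketch of the arithmetic I have in mind. Every figure in it (hourly cost, failure rate, current recovery time) is a made-up placeholder, and the `System` class and function names are purely illustrative; the only number taken from your question is the 30-minute (0.5 hour) target. Plug in your own estimates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class System:
    name: str
    cost_per_hour: float      # assumed cost of one hour of downtime
    failures_per_year: float  # assumed expected number of failures per year
    hours_to_recover: float   # assumed current recovery time, in hours


def expected_annual_loss(system: System, hours: Optional[float] = None) -> float:
    """Expected yearly downtime cost at a given recovery time (default: current)."""
    recovery = system.hours_to_recover if hours is None else hours
    return system.cost_per_hour * system.failures_per_year * recovery


def annual_savings(system: System, target_hours: float = 0.5) -> float:
    """Yearly loss avoided by cutting recovery time to target_hours."""
    return expected_annual_loss(system) - expected_annual_loss(system, target_hours)


if __name__ == "__main__":
    # Placeholder estimates only; replace with numbers from your own business.
    systems = [
        System("phones", cost_per_hour=500.0, failures_per_year=0.5, hours_to_recover=8.0),
        System("files", cost_per_hour=50.0, failures_per_year=0.5, hours_to_recover=8.0),
    ]
    for s in systems:
        print(f"{s.name}: saves about {annual_savings(s):.2f} per year at a 30-minute target")
```

If the yearly savings for a system come out well below the yearly cost of making it highly available, that system is a candidate for a looser recovery target, which is exactly the conversation to have with management.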
My gut says that you’re probably going to find a win in using virtualization technologies to create a failover system based on migration of virtual machines between redundant hardware. Whether it’ll fit your budget or not will depend on your business, since you’ll definitely need some type of SAN to make that work effectively.
Don’t discount “traditional” failover clustering, though. There are definitely “wins” there, too, if your applications are well suited to such a configuration.
I wonder if your boss has thought about catastrophic failure scenarios (building burns, flood, tornado, theft, etc). If that’s not already planned-for, this would be a golden opportunity to work in some general business continuity planning and disaster recovery contingency.
Get some help from somebody who can come in and study your business and make recommendations. You won’t regret it.