I know how to scale my software, but how do I prevent downtime caused by network outages?
We are running rather large LAMP sites which scale well software-wise. We use redundant load balancers in front of a bunch of web servers, which talk to MySQL via a proxy in a master-slave-slave-slave setup.
We are using a very large US provider. They are not very cheap but not the most expensive either.
Last week there was a very large DDoS attack on their network and our cluster was affected; we lost network connectivity for a bit, resulting in downtime.
What is the standard procedure for using 2 providers (for instance, one in the EU and one in the US)? I already know how to handle the software side, such as replication.
I’m wondering how traffic gets sent to the EU network when the US one is down. Is DNS the only choice for that? And if so, how do I set that up? Switching DNS when the server is down seems too slow unless TTL = 0, and that would mean using DNS as a failover system. I understand (from Server Fault, for instance) that this is not the preferred way of working.
So what is the preferred method of solving this with near 100% uptime? (Our cluster already has that, but the network doesn’t.) Dropping around 1000 requests would be fine, but more is bad and should never happen.
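To put the 1000-request budget next to the TTL concern, here is a rough back-of-the-envelope sketch in Python. The request rate and detection time are assumptions (neither is given in the question), and the model assumes resolvers honor the TTL, which not all of them do.

```python
def dropped_requests(requests_per_second, detection_seconds, ttl_seconds):
    """Upper-bound estimate of requests lost during a DNS-based failover."""
    # Worst case: a resolver cached the old record just before it was changed,
    # so it keeps sending clients to the dead site for one full TTL after the
    # outage has been detected and the record updated.
    outage_window = detection_seconds + ttl_seconds
    return requests_per_second * outage_window


def max_ttl_for_budget(requests_per_second, detection_seconds, budget):
    """Largest TTL (in seconds) that keeps the worst case within the budget."""
    allowed_window = budget / requests_per_second
    return max(0.0, allowed_window - detection_seconds)


if __name__ == "__main__":
    # Hypothetical numbers: 100 requests/s, 5 s to detect the outage, 30 s TTL.
    print(dropped_requests(100, 5, 30))
    # TTL that would keep the worst case within 1000 dropped requests.
    print(max_ttl_for_budget(100, 5, 1000))
```

Under these assumed numbers the worst case is well over the budget with a 30-second TTL, which illustrates why DNS-only failover looks too slow unless the TTL is very low.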
Assuming I understand your question correctly, you want your customers to fail over to a secondary data center if the primary is down for whatever reason. One product that can handle this is the BIG-IP Global Traffic Manager from F5 Networks. Essentially, it updates your DNS as soon as an outage is detected, so that clients are redirected to the secondary network.
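The general idea of health-check-driven DNS failover can be sketched in a few lines of Python. This is not how the F5 product is implemented or configured; it is only a minimal illustration. The addresses, the port, and the three-failure threshold are hypothetical, and the step that actually updates the DNS record is left out because it depends on your DNS provider.

```python
import socket
from collections import deque

# Hypothetical addresses from documentation ranges, not from the original post.
PRIMARY = "192.0.2.10"       # e.g. the US data center
SECONDARY = "198.51.100.10"  # e.g. the EU data center


def tcp_check(host, port=80, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverDecider:
    """Pick the address DNS should point at, based on recent health checks."""

    def __init__(self, primary, secondary, failures_needed=3):
        self.primary = primary
        self.secondary = secondary
        # Keep only the most recent results; older ones fall off automatically.
        self.recent = deque(maxlen=failures_needed)

    def record(self, primary_healthy):
        """Store one check result and return the address to serve."""
        self.recent.append(primary_healthy)
        window_full = len(self.recent) == self.recent.maxlen
        # Fail over only after `failures_needed` consecutive failed checks,
        # so a single lost probe does not move all traffic.
        if window_full and not any(self.recent):
            return self.secondary
        # Any successful check in the window sends traffic back to the primary.
        return self.primary
```

A monitoring loop would call `decider.record(tcp_check(PRIMARY))` every few seconds and push the returned address to DNS whenever it changes. Note that this sketch fails back as soon as one check succeeds; in practice you may want a slower or manual fail-back.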
Another option may be to use something like anycast: the same IP addresses are announced from both data centers, and routing delivers clients to whichever one is reachable.
To add to this: we operate in multiple data centers and, in the end, decided that the best route was for an engineer to manually move DNS pointers to the alternate colocation site, depending on the reason for the outage. The worst-case scenario is that we may be down for 1 hour if one data center is completely offline. However, that is weighed against the impact on the customer when we do have to switch data centers (recent activity will not be available in the alternate location).
One final option is not to rely on your data center provider for IP connectivity and bandwidth. Instead, talk to a global IP provider like Global Crossing or Level 3 and let them handle routing your inbound traffic to either data center. The risk is that you are working with a single provider, but the benefit is that they can be much more flexible in their routing options (you can use MPLS on their network for your back-end replication, and also use the same connection for public IP connectivity).