We’ve been running a large RDS instance in production for over a year without a single moment of downtime. Recently, with a bigger budget and higher stakes, we decided to convert it to a Multi-AZ instance to improve the reliability and redundancy of our data.
Since then (about two weeks ago), we’ve had two serious failures of our instance, each of which required a reboot. Nothing shows up under ‘recent DB events’, but CloudWatch recorded two DB connection spikes at the same times as the failures.
What’s going on?
It appears that the problems were caused by AWS itself rather than by the instance. The question was posted during what turned out to be one of the most serious outages of AWS Europe so far.
RDS Multi-AZ has a number of problems that mean automatic failover can still fail to occur, leaving you without a server. Implementers should be aware of this and build enough redundancy into their applications to mitigate problems that span multiple availability zones.
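As a rough illustration of application-level redundancy, here is a minimal Python sketch that tries a list of database endpoints in order and retries each with exponential backoff. It is not an RDS API: `connect_with_fallback`, its parameters, and the idea of passing in one connector callable per endpoint are all assumptions made for the example, and the right retry counts and delays depend on your workload.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def connect_with_fallback(
    connectors: Sequence[Callable[[], T]],
    attempts_per_endpoint: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each connector in order, retrying with exponential backoff.

    Each connector is a hypothetical zero-argument callable that opens a
    connection to one endpoint (for example the primary, then a replica
    or a standby in another region) and raises an exception on failure.
    """
    last_error: Optional[Exception] = None
    for connect in connectors:
        for attempt in range(attempts_per_endpoint):
            try:
                return connect()
            except Exception as exc:  # a real app would catch driver errors only
                last_error = exc
                # Wait 0.5s, 1s, 2s, ... before the next attempt.
                sleep(base_delay * (2 ** attempt))
    raise ConnectionError("all database endpoints failed") from last_error
```

In practice each connector would wrap your database driver's connect call for one endpoint. Injecting `sleep` keeps the function easy to test without real delays.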