Don't just double check, *triple* check!

in #fail · 7 years ago

A client was recently having issues with MySQL and ultimately we determined they needed more memory. Over the weekend, we deployed a new server to production and got everything ready for Monday morning.

After confirming everything was running as expected, I began preparations to shut down and destroy the old database server. We wrote our own custom deployment software, so we have some handy tools to make this all simple.

Before doing anything, I first created a snapshot, just in case. I logged into the production application to ensure it kept running, and I shut down the server. The application continued without issue, so I was comfortable moving forward.

I carefully verified I had the right server, then clicked Destroy, and our system gave me this fair warning.

[Image: destroy.png, the Destroy confirmation warning]

Having checked and double-checked, I destroyed the server and verified that the production application was still working.

And then the truth set in. I had just destroyed the production MySQL server for their website, not the internal application server I thought I was working on. While I had validated everything, I was in the completely wrong application.

It's for this exact reason that one of the steps in my well-defined destroy workflow is:

  • Create a current snapshot or backup of the environment being destroyed
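As a rough illustration of that step, here is a minimal Python sketch of a destroy guard. This is not our actual deployment software: `Server`, `destroy_server`, `DestroyRefused`, and the `take_snapshot` and `destroy` callbacks are all hypothetical names made up for this example. The idea is simply that the destroy call refuses to run unless the operator retypes the application and server name, and it always takes a snapshot first.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Server:
    """Hypothetical record for a server managed by a deployment tool."""
    name: str
    application: str
    has_fresh_snapshot: bool = False


class DestroyRefused(Exception):
    """Raised when a safety check fails; nothing has been destroyed."""


def destroy_server(
    server: Server,
    typed_confirmation: str,
    take_snapshot: Callable[[Server], None],
    destroy: Callable[[Server], None],
) -> None:
    # Check 1: the operator must retype "application/server-name".
    # Including the application makes it harder to confirm a server
    # that belongs to a different application than the one intended.
    expected = f"{server.application}/{server.name}"
    if typed_confirmation != expected:
        raise DestroyRefused(
            f"Confirmation {typed_confirmation!r} does not match {expected!r}"
        )

    # Check 2: never destroy without a current snapshot to restore from.
    if not server.has_fresh_snapshot:
        take_snapshot(server)
        server.has_fresh_snapshot = True

    # Only now is the irreversible step allowed to run.
    destroy(server)
```

With a guard like this, a mismatched confirmation raises `DestroyRefused` before anything is touched, and even a "correct" mistake still leaves a snapshot behind to restore from.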

Over the years, I've missed that step and have too many stories of lost data. Fortunately, I've learned from these mistakes to prepare, because on a Monday morning after a long weekend humans make mistakes. Myself included, sadly.

Now, you might think it's no big deal for the website to go down, but you'd be wrong. They spend around $10k per month on AdWords at around $5-7 per click, and since it was the middle of the day in the US, this could have gotten expensive quickly. Fortunately, I quickly paused the campaign to stop the bleeding, and with my snapshot we were back online in about three minutes.

Talk about a rough Monday morning!

The moral of this story: check to make sure that the check you are checking is the correct check to check. Then do that two more times.


That's a sad story. I'm a fan of using a red button in sensitive cases like this, since it draws more attention, along with two confirmation steps or something like that.

Thanks, you make some great points to help prevent disaster. Hopefully this helps others to slow down and avoid the worst.
