The Plan
About a year ago we were informed that the data centre that had served us and our customers well was being decommissioned, and that a new one was being built in the neighbouring suburb. Knowing that a data centre move would be no easy feat, we focused on creating a detailed, integrated roadmap of the entire process to ensure a successful migration while minimising disruption to our clients’ critical business operations. Our goal was to keep downtime under 3 hours, a challenging task in itself, so we needed to play out every worst-case scenario we could think of and build a disaster recovery plan around it, just as we have done for the day-to-day operations of our business.
We had two main options to consider when planning the move.
The first option was to purchase all new hardware, place it in the new DC, and then migrate the data across a VPN tunnel from the current DC to the new one. Since we had invested heavily in new hardware only months earlier, it did not make financial sense to purchase more servers and perform a virtual migration, especially since migrations of this kind rarely go exactly as planned when a large number of servers is involved. It would have taken weeks to complete, and some downtime always occurs in such a process. Even if it is only a minute long, it is still downtime.
The second option was to physically move our existing servers to the new DC. This was a more desirable choice since the servers only had to travel a few kilometres, and we would not need to purchase new hardware.
The Challenge
How do we move a fleet of servers and minimise downtime?
The first thing we needed to do was inform our customers about the move three months in advance, to keep everyone in the loop. Communication was fundamental throughout the whole process.
Secondly, we needed to create an inventory of the equipment that would be moved to the new DC and a list of the equipment we would be retiring. Eventually, we decided to move only the servers, leaving all switches, routers and other network-related equipment at the old site. Why? Since our primary goal was to minimise downtime and stick to our 3-hour maximum, we did not want to be fiddling around with networking gear at either end. We just wanted to disconnect at one end and plug back in at the other, and to make that possible we invested thousands of dollars in new equipment for the new site, including over 6,300 metres of CAT6 cable and a collection of patch panels, switches and firewalls.
Third parties were involved in the move, so we needed to coordinate everyone’s involvement at the same time. Unprepared third-party vendors could have caused many errors and ultimately held up the rest of the move, as each vendor was dependent on the others.
We needed to adjust all cabinets and fit rails prior to the move. This was also critical, as we did not want to waste time during the move on such avoidable issues.
Choosing the best time to perform the move was tricky because we had customers who, in turn, had customers all around the world. There was no perfect time, as any time we chose would affect someone somewhere. We therefore turned to our traffic graphs to compare national and international traffic, and found that the majority of traffic was Australian. With this in mind, we decided to start the move at 9pm on a Friday.
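For readers who want to turn traffic graphs like these into a candidate window, here is a minimal sketch in Python. It is not the tooling we used. It assumes you can export one traffic figure per hour for a typical week, starting at Monday 00:00 local time, and the names quietest_window and describe_hour are hypothetical helpers invented for this illustration.

```python
from typing import Sequence

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def quietest_window(hourly_traffic: Sequence[float], window_hours: int) -> int:
    """Return the start index of the contiguous window with the lowest
    total traffic. The series is treated as circular, so a window may
    wrap from the end of the week back to the start."""
    n = len(hourly_traffic)
    if not 0 < window_hours <= n:
        raise ValueError("window_hours must be between 1 and len(hourly_traffic)")

    # Total for the window starting at index 0.
    current = sum(hourly_traffic[:window_hours])
    best_start, best_total = 0, current

    # Slide the window one hour at a time: add the hour that enters,
    # subtract the hour that leaves.
    for start in range(1, n):
        entering = hourly_traffic[(start + window_hours - 1) % n]
        leaving = hourly_traffic[start - 1]
        current += entering - leaving
        if current < best_total:
            best_start, best_total = start, current
    return best_start


def describe_hour(index: int) -> str:
    """Convert an hour-of-week index (0 = Monday 00:00) to a label."""
    day, hour = divmod(index % (24 * 7), 24)
    return f"{DAYS[day]} {hour:02d}:00"
```

Calling describe_hour(quietest_window(traffic, 3)) on a week of hourly figures gives the start of the quietest 3-hour stretch. A raw minimum like this is only a starting point: it says nothing about which customers are affected, so it still has to be weighed against where your traffic comes from, as we did.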
Once we had all of the above factors (and a few more not mentioned here) in place, we sent further emails to our customers thirty days and seven days prior to the move, letting them know the planned date and time. This gave our customers enough time to prepare for the downtime.
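The notice schedule itself is simple date arithmetic. The sketch below is illustrative only: it treats "three months" as 90 days, which is an assumption, and notice_schedule is a hypothetical helper, not something from our actual process.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def notice_schedule(move_start: datetime,
                    lead_days: Iterable[int] = (90, 30, 7)) -> List[datetime]:
    """Return the send time for each customer notice, earliest first.
    Each notice goes out a whole number of days before the move starts."""
    return [move_start - timedelta(days=d)
            for d in sorted(lead_days, reverse=True)]
```

Passing in the planned start of the move returns the three send dates in order, which can then be dropped into a calendar or mailing tool.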
A few customers were not happy with the move because they had just spent a million dollars on advertising and did not want their site to go offline. They wanted us to do it when it suited them, but we could not keep everyone happy. We also had a few “smart” customers trying to tell us how to move DCs without any downtime, kindly providing links to Google search results for “how to move data centre without any downtime.” However, we assured these people through detailed emails that we had spent many months exploring every possible avenue before settling on our goals.
The Outcome
Moving DCs was never going to be easy, but careful planning did pay off. I consider the move highly successful and could not have asked for a better outcome. Even though we ran 15 minutes past our 3-hour window, I did not lose too much sleep over it, given the end result of such a challenging task.
Everyone involved in the move worked together harmoniously, just as planned, and without the right people this task would have been impossible. The move was successful because of careful, quality planning, which is something I always try to focus on, not just when moving DCs.