A look back at the technical decisions behind our move to the cloud
Five years ago, we managed our operations from a UK-based data centre in Manchester. Our hosting provider, Melbourne Server Solutions (now part of iomart, the UK’s largest dedicated hosting supplier), provided us with dedicated servers, firewalls, load balancers, network switches and more. We thought you might find it interesting to hear what we learned from our transition from a dedicated model to hybrid, and finally to an entirely cloud-based one.
Long before that transition, we were actually early adopters of cloud-based infrastructure. As early as 2009 we were testing parts of our infrastructure on EC2, Amazon Web Services’ Elastic Compute Cloud. Anyone with a long memory will remember AWS’ incredible four-day outage marathon back in April 2011 that took out portions of Netflix and Twilio’s infrastructure. At the time, we were operating a hybrid model, splitting some traffic to our dedicated infrastructure in our Manchester data centre and some to AWS’ data centres in the US. That outage (and another in 2012) burned our fingers quite badly, and the experience led, back then, to the operational decision to close down our cloud infrastructure and focus on our dedicated hosting in the UK.
The primary reason for stepping away from cloud infrastructure at the time was the lack of support (and information) from AWS’ customer service teams. We had many network and infrastructure issues with our dedicated provider over the decade we contracted with them, but every time there was an issue, their support team were a phone call away. We made sure we had a well-funded support contract with four-hour SLAs on all our hardware, which meant that whenever something went wrong, we could speak to an engineer who could, if needed, physically visit our infrastructure to diagnose and repair the fault. We paid a premium for this support, but in our formative years our SLAs and our reliability as a supplier were critical. AWS simply wasn’t ready to cater for an organisation of our size, with our requirements.
Following on from 2012, we moved from hosting a number of purely dedicated machines to running a virtual cluster in our data centre, using VMware and, in later years, Citrix XenServer. It was a managed service with associated licensing costs for the hypervisor, but it gave us the best of both worlds: the data centre provider managed the underlying infrastructure and host environment, while our operations team were free to experiment with the guest layer, creating new virtual machines, virtual networks, storage arrays and so on. It was good preparation for when we eventually started considering cloud services again.
Managing our virtual cluster added to the complexity of the support and maintenance we needed from the dedicated provider. At the same time, Melbourne’s unique selling point, its customer service focus, was sadly beginning to decline following the wider takeover by iomart. We assessed our infrastructure requirements and knew that running our own virtual machine cluster on proprietary licensing was expensive and complex. More of the configuration responsibility was landing on our shoulders, coupled with declining levels of support and rising costs. We decided to begin looking at the cloud again.
Nowadays, there are three main contenders to choose from when considering Infrastructure as a Service: Amazon, Google and Microsoft. At the time (2013/2014), Microsoft was still run by Steve Ballmer and the company’s cloud offerings were basic and formative, to say the least. We were familiar with AWS from our earlier experiences. We were drawn to Google’s investment in its cloud infrastructure and some of its innovation wins over AWS. A few attractive features at the time, such as portable network storage, truly global load balancing and an operating cost of roughly two-thirds of a comparable AWS deployment, tipped the scales. We started experimenting with Google Cloud and began splitting traffic between our dedicated cluster and Google infrastructure.
One of the areas in which the cloud beat our dedicated offering hands down was flexible infrastructure. Our dedicated infrastructure was a fixed cost; switching to a cloud provider meant we could create a ‘burstable’ tier of computing power where we paid only for what we used, down to the second. This mattered greatly for our web tier (which typically sees higher traffic during UK working hours and much lower traffic in the evenings and at weekends). It also perfectly suited our planning technology: we could spin up large amounts of processing power to solve an optimisation problem for a few minutes and pay only for that usage. We could pass this pricing model directly on to customers, which gave us a competitive edge in our planning offering.
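The economics of that burstable tier can be sketched with some simple arithmetic. The prices and workload below are purely illustrative placeholders, not our real figures or any provider’s actual rates:

```python
# Illustrative cost model: pay-per-use 'burst' compute vs fixed dedicated capacity.
# All prices and workload sizes are hypothetical, for illustration only.

def burst_cost(vcpus: int, minutes: float, price_per_vcpu_hour: float) -> float:
    """Cloud billing: pay only for the compute actually consumed."""
    return vcpus * (minutes / 60.0) * price_per_vcpu_hour

def dedicated_cost(vcpus: int, hours: float, price_per_vcpu_hour: float) -> float:
    """Fixed infrastructure: pay for full capacity, idle or not."""
    return vcpus * hours * price_per_vcpu_hour

# A planning optimisation job: 100 vCPUs for 10 minutes, once a day for 30 days.
monthly_burst = 30 * burst_cost(100, 10, 0.03)
# Provisioning the same 100 vCPUs around the clock (~730 hours/month),
# even at a lower hourly rate, costs far more because it is mostly idle.
monthly_fixed = dedicated_cost(100, 730, 0.01)

print(f"burst: ${monthly_burst:.2f}")   # 15.00
print(f"fixed: ${monthly_fixed:.2f}")   # 730.00
```

The gap widens further for spiky workloads like ours, where the dedicated tier has to be sized for the peak rather than the average.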
A final lesson learnt as we moved our entire infrastructure to the cloud is the massive benefit of shared scale. We now operate the majority of our infrastructure using containerisation technology, which allows even finer-grained control over the resources we use and incredibly powerful flexing of compute resources. This type of technology, billed per virtual CPU per minute, is only possible from a provider of Google’s scale, with its enormous amount of infrastructure and innovative design of compute resources. As time moves on, we are looking at more and more innovative cloud offerings (such as BigQuery and Spanner).
When building new products and solutions, we know that leveraging cloud computing offerings can afford a vast competitive edge. When building something new, we unashamedly look to those cloud providers and ask ‘what can we leverage?’. We believe part of our continued success rests on a lesson we have painfully learnt over the last decade, and one never truer than in our journey to the cloud: don’t reinvent the wheel. Use the inventions of others as a platform for innovation, and do so shamelessly and publicly!
Part 3 of our Data Driven Work Management series