Why choosing "Lift & Shift" is a bad migration strategy

One of the first decisions organizations make before migrating applications to the public cloud is deciding on a migration strategy.

For many years, the most common and easiest way to migrate applications to the cloud was a rehosting strategy, also known as "lift and shift".

In this blog post, I will review some of the reasons why, strategically, this is a bad decision.

Introduction

When reviewing the landscape of possibilities for migrating legacy or traditional applications to the public cloud, rehosting appears to be the best option as a short-term solution.

Taking an existing monolithic application and migrating it as-is to the cloud is supposed to be an easy task:

  1. Map all the workload components (hardware requirements, operating system, software and licenses, backend database, etc.)
  2. Choose similar hardware (memory/CPU/disk space) and deploy one or more new VM instances (see the sketch after this list)
  3. Configure network settings (including firewall rules, load-balancer configuration, DNS, etc.)
  4. Install all the required software components (assuming no license dependencies exist)
  5. Restore the backend database from the latest full backup
  6. Test the newly deployed application in the cloud
  7. Expose the application to customers
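
To make step 2 concrete, here is a minimal, hypothetical sketch using the AWS SDK for Python (boto3); the AMI, subnet, and security-group IDs are placeholders, and the instance type is assumed to mirror the on-premises server:

```python
import boto3

# Placeholder identifiers - replace with values mapped from the existing workload
AMI_ID = "ami-0123456789abcdef0"             # image matching the on-premises OS
SUBNET_ID = "subnet-0123456789abcdef0"       # target network segment
SECURITY_GROUP_ID = "sg-0123456789abcdef0"   # firewall rules for the application

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Launch a VM sized similarly to the original server (step 2),
# attached to the pre-configured network settings (step 3)
response = ec2.run_instances(
    ImageId=AMI_ID,
    InstanceType="m5.xlarge",  # assumed to mirror the on-premises CPU/memory
    MinCount=1,
    MaxCount=1,
    SubnetId=SUBNET_ID,
    SecurityGroupIds=[SECURITY_GROUP_ID],
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "legacy-app-rehost"}],
    }],
)
print(response["Instances"][0]["InstanceId"])
```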

From a time and required-knowledge perspective, this is considered a quick-win solution, but how efficient is it?

Cost-benefit

Using physical or even virtual machines does not guarantee anything close to 100% hardware utilization.

In the past, organizations purchased hardware and had to commit to it for 3-5 years (for vendor support purposes).

Although organizations could use the hardware 24x7, in many cases the purchased hardware consumed electricity and floor space without running at full capacity (i.e., it was underutilized).

Virtualization did allow organizations to run multiple VMs on the same physical hardware, but even then, it did not guarantee 100% hardware utilization – think about Dev/Test environments or applications that were not getting traffic from customers during off-peak hours.

The cloud offers organizations new purchase/usage methods (such as on-demand or Spot), allowing customers to pay only for the time they actually use compute resources.
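
Reusing the same launch call from the sketch above, requesting Spot capacity (one of these newer purchase methods) only takes an extra market option; again, a hypothetical boto3 sketch with placeholder IDs:

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# The same VM, but purchased as Spot capacity - billed only while it runs,
# typically at a significant discount compared to on-demand pricing
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder image ID
    InstanceType="m5.xlarge",
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {"SpotInstanceType": "one-time"},
    },
)
print(response["Instances"][0]["InstanceId"])
```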

Keeping a traditional data-center mindset, using virtual machines, is not efficient enough.

Switching to modern ways of running applications, such as the use of containers, Function-as-a-Service (FaaS), or event-driven architectures, allows organizations to make better use of their resources, at much better prices.

Right-sizing

On day 1, it is hard to predict the right VM instance size for the application.

When migrating applications as-is, organizations tend to select hardware (mostly CPU/memory) similar to what they had in the traditional data center, regardless of the application's actual usage.

After a legacy application has been running in the cloud for several weeks, we can measure its actual resource usage and switch to a more suitable VM instance size, gaining better utilization at a better price.

Tools such as AWS Compute Optimizer, Azure Advisor, or Google Recommender will help you select the most suitable VM instance size, but even a right-sized VM still does not reach the utilization possible with containers or Function-as-a-Service.
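
As an illustrative sketch (assuming AWS Compute Optimizer has already been enabled for the account and has gathered enough metrics), its right-sizing recommendations can also be pulled programmatically with boto3:

```python
import boto3

# Assumes the account has opted in to AWS Compute Optimizer
optimizer = boto3.client("compute-optimizer", region_name="eu-west-1")

response = optimizer.get_ec2_instance_recommendations()

for rec in response.get("instanceRecommendations", []):
    current = rec.get("currentInstanceType")
    finding = rec.get("finding")  # e.g. OVER_PROVISIONED / UNDER_PROVISIONED / OPTIMIZED
    options = [opt.get("instanceType") for opt in rec.get("recommendationOptions", [])]
    print(f"{rec.get('instanceArn')}: {current} is {finding}; candidates: {options}")
```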

Scaling

Horizontal scaling is one of the main benefits of the public cloud.

Although it is possible to configure multiple VMs behind a load balancer with autoscaling, adding or removing VMs according to the load on the application, legacy applications do not always support horizontal scaling. Even when they do support scale-out (adding compute nodes), there is a very good chance they do not support scale-in (removing unneeded compute nodes).

VMs also do not support scaling to zero – i.e., removing all compute nodes entirely when there is no customer demand.

Cloud-native applications deployed on top of containers, using a scheduler such as Kubernetes (for example Amazon EKS, Azure AKS, or Google GKE), can scale horizontally according to need (scaling out as much as needed, or as far as the cloud provider's quotas allow).

Functions in a FaaS model (such as AWS Lambda, Azure Functions, or Google Cloud Functions) are invoked by triggers and torn down when the function's job completes – maximum compute utilization.
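
As a minimal illustration of the FaaS model, an AWS Lambda function in Python is just a handler that runs when a trigger fires and consumes no compute resources in between; the event shape below assumes an S3 upload trigger:

```python
import json

def handler(event, context):
    """Runs only when the trigger fires (here: an object uploaded to S3).
    Nothing is provisioned or billed between invocations."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"Processing s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps("done")}
```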

Load time

Spinning up a new VM as part of an auto-scaling activity (such as AWS EC2 Auto Scaling, Azure Virtual Machine Scale Sets, or Google Managed Instance Groups), an upgrade, or a reboot takes a long time – especially for large workloads such as Windows VMs, databases deployed on top of VMs, or application servers.

Provisioning a new (Linux-based) container, including all its application layers, takes a few seconds (depending on the number of image layers).

Invoking a function (FaaS) takes at most a few seconds, even when taking cold-start issues (downloading the function's code) into consideration.

Software maintenance

Every workload requires ongoing maintenance – from code upgrades to third-party software upgrades, and let us not forget security upgrades.

Every software upgrade requires a lot of overhead from the IT, development, and security teams.

Upgrading a monolith, where various components and services are tightly coupled together, increases the complexity and the chances that something will break.

Switching to a microservices architecture allows organizations to change specific components independently (for example, scaling out a single service, deploying a new version of its code, or introducing a new third-party software component), with little to no impact on the other components of the application.

Infrastructure maintenance

In the traditional data center, organizations used to deploy and maintain every component of the underlying infrastructure supporting the application.

Maintaining services such as databases or even storage arrays requires dedicated, trained staff and a lot of ongoing effort (patching, backups, resiliency, high availability, and more).

In cloud-native environments, organizations can take advantage of managed services, from managed databases, storage services, caching, monitoring, and AI/ML services, without having to maintain the underlying infrastructure.

Unless an application relies on a legacy database engine, chances are you will be able to replace a self-maintained database server with a managed database service.
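
For example, provisioning a managed PostgreSQL instance becomes a single API call rather than a full server build; a hedged boto3 sketch, where the identifier, size, and credentials are placeholders:

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# The cloud provider takes care of patching, backups, failover, and minor upgrades
rds.create_db_instance(
    DBInstanceIdentifier="legacy-app-db",   # placeholder name
    Engine="postgres",
    DBInstanceClass="db.t3.medium",         # placeholder size
    AllocatedStorage=50,                    # GiB
    MasterUsername="appadmin",              # placeholder credentials -
    MasterUserPassword="change-me-please",  # use a secrets manager in practice
    MultiAZ=True,                           # managed high availability
    BackupRetentionPeriod=7,                # automated daily backups, in days
)
```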

For storage, most cloud providers already offer the commodity storage services (from managed NFS, SMB/CIFS, and NetApp offerings up to parallel file systems for HPC workloads).

Most modern cloud-native services use object storage (such as Amazon S3, Azure Blob Storage, or Google Cloud Storage), providing scalable storage for large amounts of data (from backups and log files to data lakes).
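
For example, shipping a backup or log archive to object storage is a single call; a minimal boto3 sketch where the file path and bucket name are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Object storage scales without capacity planning - just upload the data
s3.upload_file(
    Filename="/var/backups/app-backup.tar.gz",  # hypothetical local file
    Bucket="example-backup-bucket",             # placeholder bucket name
    Key="backups/app-backup.tar.gz",
)
```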

Most cloud providers offer managed networking services for load-balancing, firewalls, web application firewalls, and DDoS protection mechanisms, supporting workloads with unpredictable traffic.

SaaS services

Up until now, we have discussed lift & shift from on-premises to VMs (mostly IaaS) and managed services (PaaS), but let us not forget there is another migration strategy – repurchasing: replacing an existing application with a managed platform such as Software-as-a-Service, allowing organizations to consume a fully managed service without having to take care of ongoing maintenance and resiliency.

Summary

Keeping a static data-center mindset and migrating to the public cloud using "lift & shift" is the least cost-effective strategy and, in most cases, will end up delivering medium to low performance for your applications.

It may have been the common strategy a couple of years ago, when organizations were just taking their first steps in the public cloud, but as more knowledge has been gained by both the public cloud providers and organizations of all sizes, it is time to think about more mature cloud migration strategies.

It is time for organizations to embrace a dynamic mindset of cloud-native services and cloud-native applications, which provide many benefits: (almost) infinite scale, automated provisioning (using Infrastructure-as-Code), a rich cloud ecosystem (with many managed services), and (if managed correctly) the ability to match workload costs to actual consumption.

I encourage all organizations to expand their knowledge about the public cloud, assess their existing applications and infrastructure, and begin modernizing their existing applications.

Re-architecting may demand a lot of resources (both cost and manpower) in the short term, but it will provide an organization with many benefits in the long run.

About the Author

Eyal Estrin is a cloud and information security architect, the owner of the blog Security & Cloud 24/7 and the author of the book Cloud Security Handbook, with more than 20 years in the IT industry.

Eyal has been an AWS Community Builder since 2020.

You can connect with him on Twitter

Opinions are his own and not the views of his employer.
