Eyal Estrin

Originally published at eyal-estrin.Medium

Poor architecture decisions when migrating to the cloud

When organizations design their first workload migrations to the cloud, they tend to mistakenly view the public cloud as the promised land that will solve all their IT challenges (scalability, availability, cost, and more).

On the way to achieving their goals, organizations tend to make poor architectural decisions.

In this blog post, we will review some of the common architectural mistakes made by organizations.

Lift and Shift approach

Migrating a legacy monolithic workload from on-premises to the public cloud as-is might work (unless you have licensing or specific hardware requirements), but it will result in poor outcomes.

Although VMs can run perfectly well in the cloud, chances are you will have to measure the VMs' performance over time and right-size the instances to match the actual running workload and customers' demand.
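
As an illustration, here is a minimal sketch (Python with boto3; it assumes boto3 is installed and AWS credentials are configured, and the instance ID is a placeholder) of pulling two weeks of CPU metrics from Amazon CloudWatch as input for a right-sizing decision:

```python
# Sketch: check average and peak CPU for an EC2 instance over the last
# 14 days, as input for a right-sizing decision.
# Assumes boto3 is installed and AWS credentials are configured;
# the instance ID below is a placeholder.
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=datetime.now(timezone.utc) - timedelta(days=14),
    EndTime=datetime.now(timezone.utc),
    Period=3600,  # one datapoint per hour
    Statistics=["Average", "Maximum"],
)

datapoints = stats["Datapoints"]
if datapoints:
    avg = sum(d["Average"] for d in datapoints) / len(datapoints)
    peak = max(d["Maximum"] for d in datapoints)
    print(f"14-day average CPU: {avg:.1f}%, peak: {peak:.1f}%")
    # A consistently low average (e.g. under ~20%) with a modest peak
    # suggests the instance is a candidate for a smaller size.
```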

The lift-and-shift approach is suitable as an interim solution until the organization has the time and resources to re-architect the workload, and perhaps choose a different architecture (for example, migrating from VMs to containers or even serverless).

In the long run, lift and shift will be a costly solution (compared to on-premises) and will not be able to take advantage of the cloud's full capabilities (such as horizontal scaling, scale to zero, the resiliency of managed services, and more).

Using Kubernetes for small/simple workloads

When designing modern applications, organizations tend to follow industry trends.

One of the hottest trends is to choose containers for deploying various application components, and in many cases, organizations choose Kubernetes as the container orchestrator engine.

Although Kubernetes does have many benefits, and all hyperscale cloud providers offer a managed Kubernetes control plane, it also creates many challenges.

The learning curve for fully understanding how to configure and maintain Kubernetes is long and steep.

For small or predictable applications built from a small number of containers, there are alternatives that are easier to deploy and maintain, such as Amazon ECS, Azure Container Apps, or Google Cloud Run – all of them fully capable of running production workloads, and much easier to learn and maintain than Kubernetes.
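
To illustrate the gap in operational overhead, here is a minimal sketch (Python with boto3; the family name and container image are illustrative) of registering a complete one-container service definition on Amazon ECS with Fargate – no control-plane upgrades, node pools, or YAML manifests to maintain:

```python
# Sketch: a complete one-container task definition for Amazon ECS on
# Fargate. Assumes boto3 is installed and AWS credentials are configured;
# the family name and image are illustrative.
import boto3

ecs = boto3.client("ecs")

task = ecs.register_task_definition(
    family="hello-web",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",   # required for Fargate tasks
    cpu="256",              # 0.25 vCPU
    memory="512",           # 512 MiB
    containerDefinitions=[{
        "name": "web",
        "image": "public.ecr.aws/docker/library/nginx:latest",
        "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
    }],
)
print(task["taskDefinition"]["taskDefinitionArn"])
```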

Using cloud storage for backup or DR scenarios

When organizations began searching for their first public cloud use cases, they immediately thought of using cloud storage as a backup location, or perhaps even for DR scenarios.

Although both use cases are valid options, they both tend to miss the bigger picture.

Even if we use object storage (or managed NFS/CIFS storage services) as the organization's backup target, we must always take the restore phase into consideration.

Large binary backup files that we need to pull from the cloud back to on-premises will take a long time to transfer, not to mention the egress data cost, the cost of the read-object API calls, and more.
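
As a back-of-the-envelope illustration, the sketch below (plain Python; all prices and figures are assumptions for the example, not current list prices) estimates what a large restore could cost and how long it could take:

```python
# Sketch: rough cost and time estimate for restoring a large backup from
# cloud object storage to on-premises. All prices and figures below are
# illustrative assumptions, not current list prices.
backup_size_gb = 50_000            # 50 TB of backup data
egress_cost_per_gb = 0.09          # assumed egress price, $/GB
get_requests = 6_000_000           # assumed number of GET calls
get_cost_per_1k_requests = 0.0004  # assumed GET price, $/1,000 requests
link_speed_gbps = 1.0              # on-prem internet link

egress_cost = backup_size_gb * egress_cost_per_gb
api_cost = (get_requests / 1_000) * get_cost_per_1k_requests
transfer_hours = (backup_size_gb * 8) / (link_speed_gbps * 3600)

print(f"Egress cost:   ${egress_cost:,.0f}")
print(f"GET API cost:  ${api_cost:,.2f}")
print(f"Transfer time: ~{transfer_hours:,.0f} hours at {link_speed_gbps} Gbps")
```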

The same goes for DR scenarios – if we back up our on-premises VMs or even databases to the cloud but don't maintain a similar infrastructure environment in the cloud, how will a cold DR site help us in case of a catastrophic disaster?

Separating the application and back-end data-store tiers

Most applications are built from a front-end/application tier and a back-end persistent storage tier.

In a legacy or tightly coupled architecture, there is a requirement for low latency between the application tier and the data-store tier, especially when reading from or writing to a back-end database.

A common mistake is creating a hybrid architecture where the front end is in the cloud, pulling data from an on-prem database, or (a rarer scenario) where a legacy on-prem application connects to a managed database service in the cloud.

Unless the target application is insensitive to network latency, it is always recommended to architect all components close to each other, decreasing the network latency between the various application components.
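
A quick back-of-the-envelope sketch (plain Python; the latency figures are illustrative assumptions) shows why this matters – per-query latency is multiplied by every sequential round trip:

```python
# Sketch: the cumulative effect of network latency when the application
# tier and the database tier are split between environments.
# The figures below are illustrative assumptions.
queries_per_page = 50      # sequential DB round trips to render one page
colocated_rtt_ms = 0.5     # same VPC / same data center
hybrid_rtt_ms = 30.0       # cloud front end to on-prem database

print(f"Co-located:   {queries_per_page * colocated_rtt_ms:>6.0f} ms of DB latency per page")
print(f"Hybrid split: {queries_per_page * hybrid_rtt_ms:>6.0f} ms of DB latency per page")
```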

Going multi-cloud in the hope of resolving vendor lock-in risk

A common risk many organizations look into is vendor lock-in (i.e., customers being locked into the ecosystem of a specific cloud provider).

When you dig into this risk, vendor lock-in is really about the cost of switching between cloud providers.

Multi-cloud will not resolve this risk; instead, it will create many more challenges: skills gaps (teams must become familiar with multiple cloud providers' ecosystems), central identity and access management, incident response across multiple cloud environments, egress traffic costs, and more.

Instead of designing complex architectures to mitigate a theoretical or potential risk, design solutions that meet the business needs and familiarize yourself with a single public cloud provider's ecosystem. Over time, once your teams have enough knowledge of more than one cloud provider, expand your architecture – don't run to multi-cloud from day one.

Choosing the cheapest region in the cloud

As a rule of thumb, unless you have a specific data residency requirement, choose a region close to your customers, to lower the network latency.

Cost is an important factor, but you should design an architecture where your application and data reside close to customers.

If your application serves customers all around the globe, or in multiple locations, consider adding a CDN layer to keep static content closer to your customers, combined with multi-region solutions (such as cross-region replication, global databases, global load balancers, etc.).
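
One simple way to sanity-check a region choice is to measure TCP connect times from a location representative of your customers to a few candidate regions; below is a minimal sketch (Python standard library only, using regional S3 endpoints as test targets):

```python
# Sketch: compare network round-trip times to a few candidate regions by
# timing TCP connections to their regional S3 endpoints. Run this from a
# location representative of your customers. Standard library only.
import socket
import time

REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-1"]

def connect_time_ms(host: str, port: int = 443, samples: int = 5) -> float:
    """Median TCP connect time to host:port, in milliseconds."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=5):
            pass
        times.append((time.perf_counter() - start) * 1000)
    return sorted(times)[len(times) // 2]

for region in REGIONS:
    host = f"s3.{region}.amazonaws.com"
    print(f"{region}: ~{connect_time_ms(host):.0f} ms")
```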

Failing to re-assess the existing architecture

In the traditional data center, we used to design an architecture for the application and keep it static for the entire lifecycle of the application.

When designing modern applications in the cloud, we should embrace a dynamic mindset: keep re-assessing the architecture, revisit past decisions, and check whether new technologies or new services can provide more suitable solutions for running the application.

The dynamic nature of the cloud, combined with evolving technologies, gives us the ability to keep changing how we run applications – making them faster, more resilient, and more cost-effective.

Biased architecture decisions

This is a pitfall many architects fall into – coming from a background in a specific cloud provider and designing architectures around that provider's ecosystem, embedding biased decisions and the provider's service limitations into the architecture design.

Instead, architects should fully understand the business needs, the entire spectrum of cloud solutions, and each service's costs and limitations, and only then begin choosing the most appropriate services for the application's architecture.

Failing to factor cost into architectural decisions

Cost is a huge factor when consuming cloud services; one of the cloud's main attractions is the ability to consume services on demand (and stop paying for unused services).

Each decision you make (selecting the right compute nodes, storage tier, database tier, and more) has its own cost impact.

Once we understand each service's pricing model and the specific workload's potential growth, we can estimate the potential cost.

As previously mentioned, the dynamic nature of the cloud means costs can differ from month to month; as a result, we need to evaluate service costs regularly, replace services from time to time, and adjust the architecture to suit the specific workload.
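
One way to make this review routine is to pull cost per service programmatically; below is a minimal sketch (Python with boto3; it assumes credentials are configured, Cost Explorer is enabled on the account, and the dates are placeholders) using AWS Cost Explorer:

```python
# Sketch: last month's unblended cost per AWS service, via Cost Explorer.
# Assumes boto3 is installed, credentials are configured, and Cost
# Explorer is enabled on the account; the dates are placeholders.
import boto3

ce = boto3.client("ce")

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

for group in response["ResultsByTime"][0]["Groups"]:
    service = group["Keys"][0]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    if amount > 0:
        print(f"{service}: ${amount:,.2f}")
```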

Summary

The public cloud presents many challenges in picking the right services and architectures to meet specific workload requirements and use cases.

Although there is no single right or wrong answer when designing an architecture, in this blog post we have reviewed many "poor" architectural decisions that can be avoided by looking at the bigger picture and designing for the long term, instead of settling for short-term solutions.

A recommendation for readers – keep expanding your knowledge of cloud and architecture-related technologies, and keep questioning your current architectures to see, over time, whether there are more suitable alternatives to your past decisions.

About the Author

Eyal Estrin is a cloud and information security architect, and the author of the books Cloud Security Handbook and Security for Cloud Native Applications, with more than 20 years in the IT industry. You can connect with him on social media (https://linktr.ee/eyalestrin).

Opinions are his own and not the views of his employer.
