The Ops Community ⚙️: Idan Asulin

Design Considerations for Cloud-Native Data Systems

Idan Asulin — Mon, 19 Dec 2022 17:50:43 +0000

When it comes to designing a cloud-native data system, there’s no particular hosting infrastructure, programming language, or design pattern that you should use. Cloud-native systems are available in various sizes and shapes. However, it is true that most of them follow the same cloud native design principles. Let’s take a look at the cloud native architecture, the design principles you should keep in mind, and the features that make up a good cloud-native platform.

Cloud native architecture

A cloud native architecture is essentially a design pattern for apps built for the cloud. While there’s no specific way of implementing this kind of architecture or a pre-defined cloud native design, the most common approach is to break up the application into a number of microservices and let each microservice handle a different kind of function. Each microservice is then maintained by a small team and is typically deployed as a container.

An overview of the cloud-native architecture (Source)

Let’s take a closer look at the architecture.

Embrace microservices
Cloud native design and development depends on a loosely coupled architecture, where different parts of the applications are developed, operated, and deployed independently. This is usually implemented using microservices.

It’s safe to say that microservices form the foundation of cloud-native systems, and you can really benefit from them by using containers, which let you package the runtime environment and its libraries, binaries, and dependencies into a logical and easily manageable unit. As a result, application services can be stored, duplicated, transported, and used as needed.

Unlike a monolithic application, microservices comprise small independent services (Source)

The use of microservices (or loosely coupled architecture) is important for cloud computing for a number of reasons. For instance, it promotes simplicity, scalability, and resilience. Let’s take a closer look at how that’s possible.

With this architecture, you can break down complex applications into small independent parts, making the app development cycle simple and easier to manage. Not to mention, separating the app configuration and base code also makes developing and maintaining the app easier. Along the same lines, keeping the core application separate from the backing services allows the codebase to evolve and expand at its own pace.

Plus, it’s easier (and also faster) to scale up or down individual parts of one application instead of a whole monolithic app. Similarly, updating the app is easier since you have to update just the part (or microservice) that needs to be changed instead of deploying a new, updated version of the whole app again.

Embracing microservices also adds resilience and makes the app more reliable. If one component in a microservices architecture fails, the whole application won’t crash. It also promotes IaC (Infrastructure as Code) which, in turn, paves the way for automated deployment (which we’ll get to in just a bit). And finally, the microservice architecture relies on stateless processes and components that interact only through APIs, which isolates each microservice from the others, leading to better security and efficiency.

To make sure that your application follows the loosely coupled architecture, you need to avoid making tightly coupled dependencies between the different parts. For instance, two microservices shouldn’t depend on the same database. If they do, you won’t be able to update and operate them independently.

Everything as Code
While it’s important to use microservices to benefit from modern applications, it’s also important to adopt automation practices. The purpose of this is to optimize the app development process and benefit both developers and users. For this, the ultimate goal is to achieve EaC – Everything as Code. Consider EaC a step beyond IaC: it covers the app code base, the infrastructure, and the platform.

There are numerous benefits of this approach in terms of both hardware and software. For instance, it helps to implement version control at various levels and improves interdepartmental collaboration. It also facilitates the modularity of different components and enhances security via timely updates that help prevent vulnerabilities.
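
As a rough illustration of the idea, the sketch below keeps a desired configuration as plain data that could live in version control, and computes the changes needed to reach it. The service names, replica counts, and the plan_changes helper are all hypothetical; real tooling works on far richer state.

```python
# Hypothetical desired state, as it might be stored in version control.
DESIRED = {"orders-api": 3, "billing-worker": 2}


def plan_changes(current: dict[str, int], desired: dict[str, int]) -> list[str]:
    """Return the actions needed to move from current to desired replica counts."""
    actions = []
    for name in sorted(desired):
        have = current.get(name)
        want = desired[name]
        if have is None:
            actions.append(f"create {name} with {want} replicas")
        elif have != want:
            actions.append(f"scale {name} from {have} to {want}")
    # Anything running that is no longer declared gets removed.
    for name in sorted(set(current) - set(desired)):
        actions.append(f"delete {name}")
    return actions


if __name__ == "__main__":
    running = {"orders-api": 1, "legacy-cron": 1}
    for action in plan_changes(running, DESIRED):
        print(action)
```

Because the desired state is just data, every change to it can be reviewed, versioned, and rolled back like any other code change.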

One key aspect of cloud-native data systems is the ability to implement automation at different levels using CI/CD tools. By adopting DevOps and agile principles, you can enjoy a number of benefits such as lower operational costs, better security, more flexibility, scalability, and fast cycle development.

Security, in particular, is very important. Manual handling often leaves cloud-native platforms open to attack, but implementing security best practices via automation can really improve security. Plus, SecDevOps in CI/CD allows you to perform security testing in the early stages of the SDLC so that you can deal with vulnerabilities early on in the development phase.

API-first approach
Developers usually focus on code-first development rather than API-first development, but code-first is not the best fit for modern apps. For a cloud-native data system, you should encourage your developers to adopt an API-first approach and build software on top of that. Doing so will help save a lot of time and effort when laying the groundwork for modern, distributed apps.

As we mentioned earlier, cloud-native data systems should follow the microservice architecture, where the services of an app are separated, and each service is executed as an autonomous application. As a result, individual microservices rely on APIs to communicate and interact with each other.

Keeping in mind the popularity of microservice architecture and modern applications, the importance of APIs is clear. Plus, an API-first approach allows developers to reap all the benefits of the microservices pattern. Apps that follow an API-first approach can be considered an ecosystem of interlocking services, where calls from applications and calls made by a user interface are considered API consumers.

There are numerous advantages of this approach. For instance, it makes a system highly scalable and reduces the chances of failure. It also cuts down development costs, improves the development experience, and increases speed-to-market by speeding up the development process. And in addition to facilitating communication between the user and the app via APIs, it also facilitates the automation and communication of internal processes.
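
To make the API-first idea a little more concrete, here is a minimal sketch in which the contract is written before any implementation, and the consumer depends only on that contract. The OrdersApi interface, its request and response types, and the in-memory implementation are invented for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class OrderRequest:
    customer_id: str
    amount_cents: int


@dataclass(frozen=True)
class OrderResponse:
    order_id: str
    accepted: bool


class OrdersApi(Protocol):
    """The contract, agreed on before any implementation is written."""

    def create_order(self, request: OrderRequest) -> OrderResponse: ...


class InMemoryOrders:
    """One possible implementation; callers only depend on OrdersApi."""

    def __init__(self) -> None:
        self._count = 0

    def create_order(self, request: OrderRequest) -> OrderResponse:
        if request.amount_cents <= 0:
            return OrderResponse(order_id="", accepted=False)
        self._count += 1
        return OrderResponse(order_id=f"order-{self._count}", accepted=True)


def checkout(api: OrdersApi, customer_id: str, amount_cents: int) -> str:
    """An API consumer written purely against the contract."""
    response = api.create_order(OrderRequest(customer_id, amount_cents))
    return response.order_id if response.accepted else "rejected"
```

The implementation behind OrdersApi can later be swapped for a remote service without touching checkout, which is the main payoff of agreeing on the API first.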

Cloud native design principles

Cloud-native apps typically follow the principles defined in the 12-factor app framework and are built around security, resilience (and availability), elasticity, and performance (which includes scalability). Let’s take a closer look at these cloud native design principles.

Scalability
The idea behind scalability is to make it possible to add extra capacity to both the application and related services to handle the increase in demand and load. In particular, each application tier, how it can be scaled, and how bottlenecks can be avoided should be considered when designing for scalability.

There are three key areas to consider in this context: capacity, load, and data.

In terms of capacity, think about whether you’ll need to scale individual layers and if you can do so without affecting the app’s availability. You also need to consider how quickly you’ll need to scale services, and if you can scale down the app outside of business hours without affecting operations.

When it comes to data, think about whether you can scale keeping in mind the constraints of your services like transaction throughput and database size. Figure out how you can partition data to further boost scalability while staying within your platform constraints. Similarly, you need to figure out how you can use your platform resources effectively and efficiently.
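
One common way to partition data is to hash a key and take the result modulo the number of partitions. The sketch below assumes string keys and a fixed partition count; the function names are made up for this example.

```python
import hashlib


def partition_for(key: str, partitions: int) -> int:
    """Map a key to a partition using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def spread(keys: list[str], partitions: int) -> dict[int, list[str]]:
    """Group keys by the partition they map to."""
    groups: dict[int, list[str]] = {p: [] for p in range(partitions)}
    for key in keys:
        groups[partition_for(key, partitions)].append(key)
    return groups
```

Keep in mind that with plain modulo hashing, changing the number of partitions moves most keys to a different partition, so the count is worth choosing with future growth in mind.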

And in terms of load, you need to determine how you can improve design to avoid bottlenecks and how you can use asynchronous operations to help with load balancing at peak traffic times. You also need to explore how you use the different rate-leveling and load-balancing features provided by your chosen platform.

One way to ensure scalability is to create automated processes that can scale, repair, and deploy the system as and when needed. You can set up the system such that it generates meaningful logs (and thus events) that you can then use as hooks for different automated activities. The resulting system should be able to automatically provision infrastructure such as machine instances, build, test, and deploy different stages in the CI/CD pipeline, and handle dynamic scalability and health monitoring and backup.
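
A scaling decision like this can be reduced to a small rule. The sketch below is one simple, proportional rule and not the policy of any specific platform; the parameter names and the default limits are assumptions.

```python
import math


def desired_replicas(current: int, observed_load: float, target_load: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Scale the replica count in proportion to observed vs. target load per replica."""
    if current < 1 or target_load <= 0:
        raise ValueError("current must be >= 1 and target_load must be positive")
    wanted = math.ceil(current * observed_load / target_load)
    # Clamp to the configured bounds so a metric spike cannot scale without limit.
    return max(min_replicas, min(max_replicas, wanted))
```

For example, four replicas running at 90% load against a 60% target would be scaled to six, while the same four replicas at 15% load would be scaled down to one.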

Many believe that cloud-native systems should be stateless, but this is quite difficult to achieve in real-world applications. However, since managing states is difficult to do in distributed applications, it’s better to use stateless components wherever you can. This is because stateless components make it easier to load balance, scale, repair, and roll back.
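
Here is a small sketch of what a stateless component can look like: the handler keeps nothing between calls, and all session state lives in a store passed in from outside. The dict used below is only a stand-in for an external store such as a database or cache, and the handler name is hypothetical.

```python
from collections.abc import MutableMapping


def handle_add_to_cart(store: MutableMapping[str, list[str]],
                       session_id: str, item: str) -> list[str]:
    """Stateless handler: all session state lives in the external store."""
    cart = list(store.get(session_id, []))
    cart.append(item)
    store[session_id] = cart
    return cart


if __name__ == "__main__":
    shared_store: dict[str, list[str]] = {}  # stand-in for an external store
    # Two calls that could be served by two different replicas.
    handle_add_to_cart(shared_store, "s1", "book")
    print(handle_add_to_cart(shared_store, "s1", "pen"))
```

Since the handler holds no state of its own, any replica can serve any request, and a failed replica can simply be replaced.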

Availability
Availability refers to the ability of the system to be useful for the consumer despite faults in the underlying OS, hardware or network dependencies, or the app itself. Important principles include performance, uptime, disaster recovery, and replication.

When it comes to performance, you need to define the acceptable levels of performance, how they can be measured, and the actions or events that should be triggered when the performance falls below the acceptable levels. You also need to determine the parts of the app most likely to cause issues, and if a queue-centric design or auto-scale can help with that. Plus, you need to figure out if making some parts of the cloud-native system asynchronous can help improve performance.

Uptime guarantee is also important to consider. In particular, you need to define the SLAs that a product should meet and if it’s possible for your chosen cloud service to meet them. Meanwhile, in terms of disaster recovery, you need to determine how you can rebuild the cloud-native system in case of failure and how much data you can afford to lose in such a scenario. You also need to determine how you’re going to handle backups and in-flight queues and messages in case of failure, and figure out where you’re going to store the VM images and if you have a backup for that.

And finally, in terms of replication, you need to identify the parts of the system that are at high risk of failure and the parts that will be impacted the most by failure. Also, determine whether you need data replication and how you can prevent the replication of corrupt data.
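
One simple guard against replicating corrupt data is to record a checksum when the data is written and verify it before copying. The sketch below assumes the checksum is stored alongside the payload; the list standing in for a replica and the function names are illustrative only.

```python
import hashlib


def checksum(payload: bytes) -> str:
    """Fingerprint of the payload as it was originally written."""
    return hashlib.sha256(payload).hexdigest()


def replicate(payload: bytes, expected_checksum: str, replica: list[bytes]) -> bool:
    """Copy payload to the replica only if it still matches its recorded checksum."""
    if checksum(payload) != expected_checksum:
        return False  # refuse to propagate data that changed unexpectedly
    replica.append(payload)
    return True
```

A check like this catches data that was damaged after it was written. It does not catch logically wrong data that the application wrote in the first place, so it complements backups and validation and does not replace them.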

Security
Security in cloud-native data systems is quite a broad topic and involves quite a number of things. But most importantly, you need to figure out:

The local jurisdiction and laws where the data is held, including the countries where metrics and failover data are held.
How you can secure the link between the cloud and corporate network if you have a hybrid-cloud app.
If there are any requirements that should be met for federated security.
How to control access to the cloud provider’s admin portal, handle password changes, and restrict access to databases.
How you’ll deal with the vendor and OS security updates and patches.

Manageability
Manageability refers to the ability to understand the system’s performance and health and manage operations. In terms of the cloud, we need to consider two principles – deployment and monitoring.

When it comes to deployment, you need to ask yourself a few things. For instance, think about how you’re going to automate the deployment and how you can patch or redeploy without causing any disruptions to the live systems. Also, think about how you’ll check if a deployment was successful and how to roll back in case it was unsuccessful. Similarly, deployment also involves determining the number of environments you’ll need and how much storage and availability they require.
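
The check-then-roll-back flow can be sketched in a few lines. This is a simplified model in which the rollout step and the health check are placeholders supplied by the caller; the function name and version strings are hypothetical.

```python
from typing import Callable


def deploy_with_rollback(current_version: str, new_version: str,
                         is_healthy: Callable[[str], bool]) -> str:
    """Switch to new_version and keep it only if the health check passes."""
    active = new_version            # stand-in for the real rollout step
    if not is_healthy(active):
        active = current_version    # roll back to the last known good version
    return active
```

In practice the health check would probe the running service, and the rollout would usually shift traffic gradually so that a bad release affects as few users as possible.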

Meanwhile, when it comes to monitoring, you need to plan how you’ll monitor the app (are you going to use off-the-shelf services or develop one from scratch?) and where you’ll physically store the monitoring data. You also need to determine the amount of data your monitoring plan will produce and how you can access metrics logs. Similarly, ask yourself if you can afford to lose some of the logging data and if you’ll need to alter monitoring levels at runtime.

Feasibility
Finally, feasibility includes the ability to maintain and deliver the system despite time and budget constraints. Some things you need to consider for this principle are:

Is it possible to meet the SLAs? For instance, is there a cloud provider that guarantees the uptime you need to provide to your customer?
Do you have the necessary experience and skills in-house to build the cloud app, or will you need to hand it over to a third party?
What trade-offs can you accept and how much can you spend on the operational costs, keeping in mind the complex pricing of cloud providers?

Features of a good cloud-native data platform

You now know the principles and architecture considerations you should keep in mind when making a cloud-native platform. Let’s now look at some more features that a good platform should offer.

Benefits of a well-designed cloud-native platform (Source)

Cost-efficiency
It is true that there’s a big difference between the cost of fully managed cloud services and on-premises/self-managed services. However, the elasticity of the former and the pay-per-use model followed by most cloud platforms make it possible to run the right size without any resource (and, in turn, cost) wastage.

This also means you don’t need to worry about spending extra to pay for unused resources or even deal with capacity planning. Plus, because of the multi-tenancy of cloud platforms, it’s possible for service providers to price their service at a much lower cost as compared to self-managed services.

Pay for what you use
As mentioned above, most cloud platforms follow a pay-per-use model, which means you’ll only have to pay for the resources you use instead of the resources provisioned. These resources can be both high-level (like API get and put requests) and low-level (like memory or CPU usage). So, unlike with on-premises deployments, you don’t need to pay for licensed cores that you might not use at all.

Elasticity & scalability
A good cloud-native platform also includes services that can be scaled up or down with a simple API call or a single click. It’ll be even better if the platform can scale the services automatically depending on the defined policies. Plus, because of pre-managed capacity planning and elastic scaling, only the most extreme cases will expose the scalability limits.

Availability
An efficient cloud-native platform is also defined by its high availability and is designed to handle most failures. Most platforms offer a service level agreement of a minimum of 99.95%, which translates to roughly 4.4 hours of downtime in a year, but in reality, you can expect a higher availability.
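
The downtime budget implied by an availability percentage is simple arithmetic: multiply the hours in a year by the fraction of time the service may be unavailable. The sketch below assumes a 365-day year; for 99.95% it gives about 4.38 hours.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def yearly_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be down while still meeting the SLA."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for sla in (99.9, 99.95, 99.99):
        print(f"{sla}% -> {yearly_downtime_hours(sla):.2f} hours per year")
```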

Multi-tenancy
Multi-tenancy has two benefits – manageability and economy of scale – that most cloud-native services benefit from. You can provide the best user experience to your customers with services like S3, which expose the service as queries or requests instead of CPUs, and all the tenants are so well isolated that users don’t know that other tenants are also served by the same physical system.

Yes, in some cases, users do have to buy dedicated computing resources like memory and CPUs (as is the case with AWS Aurora), but the underlying infrastructure like storage and network is still shared.

Performance optimization
Finally, to be able to serve different kinds of customer workloads, your system should be scalable in multiple dimensions. The constraints should be aligned and optimized across the whole infrastructure, including the hardware, OS, and application. Plus, in the case of a managed system, there should be a tight feedback mechanism with production. The system should also have the ability to analyze and learn from different scalability and performance-related incidents and roll out improvements to optimize performance.


Originally published at https://memphis.dev by Idan Asulin, Co-founder & CTO at Memphis.dev.

Stream Processing vs. Batch Processing: What to Know

Idan Asulin — Mon, 28 Nov 2022 11:01:27 +0000

Big data is at the center of all business decisions these days. It refers to large volumes of data generated through different sources and this data then provides the foundation for business decisions. The concept of data has been there for centuries but only now do we have enough computational resources to process and use that data. There are different ways through which we can process data. The two popular ways used for data processing are batch processing and stream processing. Let’s discuss each process in detail and understand their differences.

What is batch processing?

Batch processing is a method of processing large volumes of data in batches at a scheduled time. Data is collected over a period of time; at a set interval it is processed, and the output is sent to other systems or stored in a data warehouse. The size of the data in a batch is known in advance.

In batch processing, input data comes from one or more sources. Batch jobs run on a schedule, and how frequently they run depends on the organisation’s data infrastructure. Data is extracted from the input sources and transformed to prepare it for analytics or to feed it into a machine learning model. After transformation, the data is loaded into a data warehouse. The ETL procedure in batch processing is pre-defined and doesn’t require any user interaction. Batch processing can also be triggered once data reaches a particular volume. The whole process is automated using workflow orchestration tools like Airflow, Prefect, Flyte, Dagster, etc.
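
The extract, transform, and load steps described above can be sketched as follows. This is a toy example using in-memory data: the records, the field names, and the dict standing in for a data warehouse are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical raw records collected since the last run.
RAW_ORDERS = [
    {"customer": "alice", "amount": "20.50"},
    {"customer": "bob", "amount": "5.00"},
    {"customer": "alice", "amount": "not-a-number"},
    {"customer": "alice", "amount": "4.50"},
]


def extract(source: list[dict]) -> list[dict]:
    """Read the whole accumulated batch at once."""
    return list(source)


def transform(rows: list[dict]) -> dict[str, float]:
    """Drop malformed rows and total the amounts per customer."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        try:
            totals[row["customer"]] += float(row["amount"])
        except (KeyError, ValueError):
            continue  # skip rows that cannot be parsed
    return dict(totals)


def load(totals: dict[str, float], warehouse: dict[str, float]) -> None:
    """Write the results to the target (a dict stands in for a warehouse)."""
    warehouse.update(totals)


def run_batch(source: list[dict], warehouse: dict[str, float]) -> None:
    """One complete batch run: extract, then transform, then load."""
    load(transform(extract(source)), warehouse)
```

An orchestrator would call something like run_batch on a schedule, or once the accumulated data crosses a volume threshold, with no user interaction in between.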

Batch processing is a commonly used approach in designing data management infrastructure. It is a cost-effective solution when dealing with large volumes of data. Jobs can be prioritized so that time-sensitive jobs are scheduled earlier, which gives an additional advantage in managing resources. Batch jobs can be executed offline to reduce the load on machines. All the processes are automated, which improves data quality.

It permits organizations to deal with huge volumes of data rapidly. Since many records can be handled at once, batch processing speeds up handling time and delivers data so that organizations can perform analysis on it.

Batch processing: Use-cases

Batch processing is used when we need to process large volumes of data to provide quick analytics results. The data processed is usually collected over a period of time and there’s no real-time data analytics required. It extracts data from data sources using complex scripts and efficiently manages resources to process that data. Batch processing is particularly useful in the following use cases:

Anomaly detection: In anomaly detection, historical data is used to detect outliers. In such algorithms, batch processing is used to extract and transform large volumes of data to detect anomalies.

Customer segmentation: It is used to run targeted campaigns and provides services to customers by processing historical data.

Payroll systems: Data related to employee salaries are collected and processed as a batch at the end of each month.

Banking system: Bank statements of customers are calculated at the end of each month or yearly in batches based on their subscriptions.

Billing services: The billing services use batch processing to generate invoices for customers at the end of each month.

Batch processing: Challenges

Batch processing comes with a few challenges that need to be addressed to design a scalable solution. Let’s look at a few challenges associated with batch processing.

Batch processes require human support for monitoring which makes the work mundane for humans and increases the cost of operations for organizations.
Debugging in batch systems is difficult. If a job fails then other jobs have to wait and it takes more time than expected.
Batch jobs run at a specific time so any change in data is delayed until the next batch is executed.
There’s often a delay in the availability of data to targeted systems like dashboards or machine learning models due to delays in batch processing.
Running multiple jobs at the same time often needs more efficient resource monitoring and management by the team.
The recovery from failure needs collaboration from different teams and it takes a long time.
Large volumes of data are processed in multiple passes and need more time to deliver the results.

What is stream processing?

Stream processing refers to extracting, processing, and delivering data in real time, so users get insights as events happen. In its simplest form, each record is handled as a stateless operation. A data stream is generated continuously and is usually referred to as data in motion; stream processing is used to process that data.

In stream processing, data is processed as soon as it arrives: the data stream is ingested into the system, and processing logic is applied to it. The processed data is delivered to real-time dashboards, machine learning models, data warehouses, and other systems. Stream processing is also used to generate alerts in case any errors are detected. Stream processing systems are optimized to process large volumes of data with minimal delay, hence providing low latency.
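
The per-event flow described above can be sketched with a Python generator. The event fields (source, ts, status), the error threshold, and the sensor_feed source are assumptions made for this example; a real stream would be unbounded.

```python
from typing import Iterable, Iterator


def process_stream(events: Iterable[dict], error_threshold: int = 500) -> Iterator[dict]:
    """Handle each event as soon as it arrives and flag errors immediately."""
    for event in events:
        enriched = dict(event)
        enriched["alert"] = event.get("status", 0) >= error_threshold
        yield enriched  # delivered downstream without waiting for a batch


def sensor_feed() -> Iterator[dict]:
    """Hypothetical unbounded source; here it yields three events and stops."""
    yield {"source": "web-1", "ts": 1, "status": 200}
    yield {"source": "web-2", "ts": 2, "status": 503}
    yield {"source": "web-1", "ts": 3, "status": 404}
```

Each event is processed and passed on individually, so an alert for the failing request is available before the next event has even been read.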

It is nearly impossible to regulate or enforce the data structure, or to control the volume and frequency of the data generated in the modern world, because data is produced by a vast number of sources, both internal and external, including hardware sensors, servers, mobile devices, applications, and web browsers. Applications that examine and interpret data streams must process data packets sequentially, one at a time. To allow applications to operate on data streams, each data packet contains its source and a timestamp.

Stream processing enables organizations to analyze time-series data and identify patterns in it. For instance, data coming from websites is monitored to generate insights about users. In stream processing, the size of the data is unknown and assumed to be infinite. Processing latency is typically just a few milliseconds, so output is delivered fast. This is beneficial for systems that require continuous data processing and need to take immediate action.

Companies will gain a competitive edge in their marketplace by being able to quickly gather, analyze, and act on current data. Organizations can respond to market changes, consumer needs, and business possibilities more quickly by using stream processing. This responsiveness can be a differentiating trait as the speed of business accelerates with digitization.

Stream processing: Use-cases

Stream processing is an ideal choice for systems when an immediate response is required when new data arrives rather than waiting to process the data at a specific time interval.

In cases when analytics results are needed in real time, stream processing is essential. Using platforms like Spark Streaming, Memphis.dev, and Kafka, you can design data streams that feed data into analytics tools as soon as it is generated and get almost immediate analytics insights. Let’s discuss a few use-cases where stream processing is an ideal choice:

Fraud detection: In today’s digital age, online frauds are detected and fraudulent transactions are stopped in real-time. Stream processing is used to process data in real time and detect anomalies.
Sentiment analysis: Stream processing is used to ingest data in sentiment analysis systems. These systems are designed for data driven marketing in real time.
Log monitoring: Logs from multiple applications are monitored and errors are detected in real-time. Stream processing is used to continuously process the incoming logs data and results are delivered.
Customer satisfaction: Stream processing is used in analyzing customer behavior, digital experience monitoring, and observing the customer journey to improve services. Customer feedback is a useful measure for evaluating an organization’s strengths and areas for improvement. A company’s reputation will improve the quicker it responds to consumer complaints and offers a solution. This speed pays off when it comes to online evaluations and word-of-mouth marketing, which can be the determining factor for drawing in new prospects and turning them into customers.

Stream processing: Challenges

Stream processing is a solution to modern data infrastructure but it has its own challenges. Let’s discuss where stream processing should be improved for robust solutions.

Scalability in stream processing is challenging when errors happen and the pipeline malfunctions. The rate at which data is received can also increase, which requires more resources to be added to keep up.
In the real world, data is not always consistent and durable. Incoming data can be inconsistent or modified, which makes it difficult for the data stream to process it.
The order of the data in the data stream must be determined, and it is crucial in many applications. It’s critical that every line be in the correct order when developers examine an aggregated log view to troubleshoot a problem. The order in which a data packet is generated and the order in which it arrives at its destination frequently differ. The clocks and timestamps of the devices generating the data frequently differ. Applications must be cognizant of their assumptions on ACID transactions when evaluating data streams.
Stream processing pipelines must ensure fault tolerance. Can your system prevent disruptions from a single point of failure with data flowing from various sources, locations, and in different forms and volumes? Can it maintain high availability and durability while storing streams of data? To build a robust solution, a data architect must answer such questions.
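
The ordering challenge above is often handled by buffering events briefly and releasing them in timestamp order. Here is a minimal sketch of that idea, assuming integer timestamps and a known upper bound on how late an event can arrive; the reorder function and the max_delay parameter are hypothetical names.

```python
import heapq
from typing import Iterable, Iterator


def reorder(events: Iterable[tuple[int, str]], max_delay: int) -> Iterator[tuple[int, str]]:
    """Emit (timestamp, payload) events in timestamp order.

    An event is held back until a later event shows that at least
    max_delay time units have passed, so late arrivals within that
    bound can still be slotted into place.
    """
    buffer: list[tuple[int, str]] = []
    latest = None
    for event in events:
        heapq.heappush(buffer, event)
        latest = event[0] if latest is None else max(latest, event[0])
        while buffer and buffer[0][0] <= latest - max_delay:
            yield heapq.heappop(buffer)
    while buffer:  # end of input: flush whatever is left
        yield heapq.heappop(buffer)
```

The trade-off is latency: a larger max_delay tolerates later arrivals but holds every event back longer, and an event that arrives later than the bound will still come out of order.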

Batch processing vs Stream processing using Memphis

Batch processing extracts the data at a specific time and then applies transformations to it. The differences between the schemas of incoming data sources aren’t that significant and can be resolved in the data pipeline. A batch data pipeline travels between various data teams, which makes collaboration difficult. The tools available for batch processing are hard to learn, and deployment is difficult as well.

In contrast, stream processing systems have more than one data source, each with its own schema and its own requirements. Data is transformed and analyzed for each source in parallel. Multiple target systems request data simultaneously, and it’s hard to troubleshoot if something goes wrong.

However, by resolving the challenges associated with stream processing, an efficient data pipeline can be designed. One of the platforms that provides low-code solutions for stream processing is Memphis.dev. Memphis.dev is the only low-code real-time data processing platform that offers a full ecosystem for in-app streaming use cases. It uses a produce-consume paradigm that supports modern in-app streaming pipelines and async communication, removing frictions of management, cost, resources, language barriers, and time for data-oriented teams, in contrast to other message brokers and queues that require extensive coding and time.

It provides support for maintaining and defining schemas, collecting data from multiple sources, and taking actions based on events. It integrates with a variety of other third party tools as well. Memphis.dev gives stream processing an advantage over batch processing.

Stream vs Batch processing comparison

In light of the above discussion, let’s compare batch processing and stream processing.

Conclusion

There is no technique that is always preferable in data processing. Depending on your project, batch and stream processing each offer advantages and disadvantages. Companies continue to lean on stream processing in an effort to maintain their agility. Batch processing is a favorable choice for companies with legacy systems. The choice of data processing technique depends on the internal data ecosystem of the company. Moreover, each project has different requirements, so both techniques can be used in parallel. In the real world, data teams stay flexible and adapt to new tools and techniques to improve their data pipelines.


Originally published at memphis.dev by Idan Asulin, Co-founder & CTO at Memphis.dev.