The Ops Community ⚙️

Eyal Estrin

Originally published at eyal-estrin.Medium

Introduction to Day 2 Serverless Operations – Part 1

In April 2023, I published a blog post called "Introduction to Day 2 Kubernetes", discussing the challenges of managing Kubernetes workloads in mature environments, once applications were already running in production.

In the software lifecycle there are usually three distinct stages:

  • Day 0 – Planning and design
  • Day 1 – Configuration and deployment
  • Day 2 – Operations

Serverless is a cloud-native application development and delivery model in which developers build and run code without having to provision, configure, or manage server infrastructure themselves. Many cloud-native services are considered serverless, from compute (such as Function as a Service) to storage (such as object storage), databases, and more.

In this series of blog posts, I will review the most common Day 2 serverless operations.

Part 1 will focus on common operations for Function as a Service (FaaS), and part 2 will focus on application integration services.

Configuration and Revision Management

At this stage, you set the function's runtime version to be deployed, so that you can revert to a previous version if there are problems with the deployment or with your application.
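The revert-to-previous-version idea can be sketched in a provider-agnostic way. This is a minimal illustration, not any cloud provider's API: the `FunctionRevisions` class and its method names are hypothetical, and it assumes revisions are immutable once published and that "rollback" simply means pointing traffic at the previous one.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionRevisions:
    """Tracks published revisions of one function and which one serves traffic."""

    name: str
    revisions: list[str] = field(default_factory=list)  # oldest first
    live_index: int = -1  # index of the revision currently serving traffic

    def publish(self, revision_id: str) -> None:
        """Record a new revision and route traffic to it."""
        self.revisions.append(revision_id)
        self.live_index = len(self.revisions) - 1

    def rollback(self) -> str:
        """Point traffic at the previous revision and return its id."""
        if self.live_index <= 0:
            raise RuntimeError("no earlier revision to roll back to")
        self.live_index -= 1
        return self.revisions[self.live_index]

    @property
    def live(self) -> str:
        """Id of the revision currently serving traffic."""
        return self.revisions[self.live_index]
```

For example, after publishing "v1" and then "v2", calling `rollback()` makes "v1" the live revision again while "v2" stays in the history for later inspection.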

Runtime engine updates

The base assumption at this stage is that the function was already configured and had its initial deployment, but as time goes by, newer versions of the function's runtime engine will be released.

Although the recommendation is to use the latest stable version of the runtime engine, changing between major versions may require code adjustments and rigorous testing.
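One way to decide how much testing an upgrade needs is to classify it as a major or minor version change. The sketch below is only an illustration: the helper names are hypothetical, the runtime identifiers in the usage note are examples, and it assumes the first number in the identifier is the major version.

```python
import re


def parse_version(runtime: str) -> tuple[int, ...]:
    """Extract the numeric parts of an identifier such as 'python3.9'."""
    return tuple(int(n) for n in re.findall(r"\d+", runtime))


def classify_upgrade(current: str, target: str) -> str:
    """Return 'none', 'minor', or 'major' for a proposed runtime change."""
    cur, new = parse_version(current), parse_version(target)
    if not cur or not new:
        raise ValueError("runtime identifier contains no version number")
    if new <= cur:
        return "none"  # same or older version: nothing to upgrade
    if new[0] != cur[0]:
        return "major"  # assumption: first number is the major version
    return "minor"
```

With these assumptions, `classify_upgrade("nodejs18.x", "nodejs20.x")` returns `"major"` and `classify_upgrade("python3.9", "python3.12")` returns `"minor"`. A "minor" result is not a guarantee of compatibility, so the upgrade should still be tested.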

Security, Networking, and Access Control

At this stage, you configure network and security settings to protect your functions, before exposing them to clients.

This includes reviewing network access control lists, the deployment location (inside or outside your cloud virtual network, depending on the resources the function needs to reach), and identity and access management (permissions scoped to the cloud resources the function needs access to, such as storage or databases).
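As an illustration of reviewing identity and access management settings, the following sketch flags permission statements that grant more than a function is likely to need. The policy layout (a `statements` list with `actions` and `resources`) is a simplified, hypothetical format and not any provider's real schema; the wildcard rules are assumptions as well.

```python
def find_overbroad_statements(policy: dict) -> list[dict]:
    """Return statements that use wildcard actions or wildcard resources."""
    flagged = []
    for stmt in policy.get("statements", []):
        actions = stmt.get("actions", [])
        resources = stmt.get("resources", [])
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        wildcard_resource = "*" in resources
        if wildcard_action or wildcard_resource:
            flagged.append(stmt)
    return flagged
```

A check like this could run as part of a periodic review, with any flagged statement narrowed to the specific storage bucket or database the function actually uses.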

Audit and Compliance

At this stage, you need to make sure your functions automatically send their audit logs to a central system. Combined with threat intelligence services that regularly review the audit logs, this lets you get alerted on security-related events (such as anomalous behavior).
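To make the flow concrete, here is a minimal sketch of structured audit records plus a very simple review rule. It is not a substitute for a managed threat detection service: the record fields, the function names, and the "too many denied actions" rule are all assumptions chosen for illustration.

```python
import json
from collections import Counter


def audit_record(function: str, principal: str, action: str, allowed: bool) -> str:
    """Serialize one audit event as a JSON line for a central log system."""
    return json.dumps(
        {"function": function, "principal": principal,
         "action": action, "allowed": allowed},
        sort_keys=True,
    )


def principals_to_alert(log_lines: list[str], max_denied: int = 3) -> list[str]:
    """Return principals whose denied-action count exceeds max_denied."""
    events = (json.loads(line) for line in log_lines)
    denied = Counter(e["principal"] for e in events if not e["allowed"])
    return sorted(p for p, count in denied.items() if count > max_denied)
```

Writing each event as one JSON line keeps the records easy to ship to a central system and easy to query once they arrive.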

Monitoring, Logging, Observability and Alerting

Continuously track application health, performance, and security events using tools for real-time insights. This includes setting up dashboards and alerts to detect anomalies and issues before they impact users.
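A common alerting rule is an error-rate threshold over a sliding window of recent invocations. The sketch below assumes that rule; the `ErrorRateAlert` class, the window size, and the threshold are hypothetical values, and in practice this logic usually lives in the monitoring service rather than in your own code.

```python
from collections import deque


class ErrorRateAlert:
    """Fires when the failure ratio over the last `window` calls is too high."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        self.outcomes = deque(maxlen=window)  # oldest outcomes drop off automatically
        self.threshold = threshold

    def record(self, success: bool) -> bool:
        """Record one invocation; return True if the alert should fire."""
        self.outcomes.append(success)
        failures = self.outcomes.count(False)
        return failures / len(self.outcomes) > self.threshold
```

Because the window is bounded, a burst of old failures stops triggering the alert once enough successful invocations have pushed it out of the window.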

Error Reporting, Troubleshooting, Diagnostics and Debugging

Any running function will generate errors at some point, and you will need to troubleshoot or debug issues with running (or failed) functions. For this purpose, collect errors and diagnostic logs from your functions and store them in a central service.

Implement error-handling strategies within your code (e.g., retries with exponential backoff) to minimize user impact during failures.
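Retries with exponential backoff can be sketched as follows. This is a minimal version under stated assumptions: the helper name `call_with_backoff` and its default values are hypothetical, random jitter is added to spread out retries (a common addition the text does not mention), and real code should retry only on errors known to be transient.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, retry_on=(Exception,), sleep=time.sleep):
    """Call operation(), retrying failures with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # random jitter below the ceiling
```

The `sleep` parameter exists only so the waiting can be replaced in tests. Keep in mind that time spent sleeping inside a function still counts toward its execution time, so the delays should stay well below the function timeout.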

Scaling, Resource Management, Performance Tuning and Optimization

Analyze function performance metrics (duration/memory usage) to identify bottlenecks and adjust concurrency settings or provisioned capacity as needed for optimal resource utilization.
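As a rough sketch of this kind of analysis, the code below computes a duration percentile and suggests a direction for the memory setting. The function names, the 20% headroom, and the "less than half the configured memory" rule are assumptions for illustration, not provider guidance.

```python
import math


def percentile(values, pct: int):
    """Nearest-rank percentile, e.g. pct=95 for p95 duration."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def memory_recommendation(used_mb, configured_mb: int, headroom: float = 0.2) -> str:
    """Suggest 'increase', 'decrease', or 'keep' for the memory setting."""
    needed = max(used_mb) * (1 + headroom)  # peak usage plus safety margin
    if needed > configured_mb:
        return "increase"
    if needed < configured_mb * 0.5:  # assumption: under half means oversized
        return "decrease"
    return "keep"
```

Tracking p95 or p99 duration rather than the average helps surface slow outliers such as cold starts. Any change to memory should be re-measured, since on some platforms it also affects the CPU allocated to the function.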

Summary

In this blog post, I presented the most common Day 2 serverless operations when using Function as a Service to build modern applications.

Transitioning from traditional to serverless development can be challenging, but I encourage readers to keep practicing and gaining hands-on experience. Moving beyond the initial deployment to focus on ongoing operations and maintenance is crucial, and I hope the topics covered here will prove valuable for managing serverless environments in daily work.

In the second part of this series, we will take a deep dive into serverless application integration services, so stay tuned.

About the author

Eyal Estrin is a cloud and information security architect, an AWS Community Builder, and the author of the books Cloud Security Handbook and Security for Cloud Native Applications, with more than 25 years in the IT industry.

You can connect with him on social media (https://linktr.ee/eyalestrin).

Opinions are his own and not the views of his employer.

Top comments (1)

Areeba Nishat

Great intro to a topic that's often overlooked! Day 2 operations are where the real complexity begins—monitoring, debugging, cost optimization, and managing deployments at scale in a serverless environment. I appreciated the focus on observability tools and the importance of structured logging and tracing.

Looking forward to Part 2—hoping it dives deeper into incident response strategies and best practices for handling cold starts and throttling in production. Anyone else here already implementing automated health checks or anomaly detection in their serverless stack?