Eyal Estrin

Posted on • Originally published at Medium

Securing Chaos at Scale Without Slowing Down

For a long time, I have wanted to write a blog post about how the software development and cybersecurity worlds have matured over the years.

In this blog post, I will share insights from organizations I have come across, covering both the SDLC and cybersecurity domains.

Although maturity models traditionally use five levels, I decided to keep things simple and use three (from ad-hoc to fully optimized). You can look at each level and figure out where your organization sits on the scale.

Level 1 – Reactive

The initial level is considered the lowest maturity.

The infrastructure is deployed manually (usually on physical or virtual machines), and if applications have a UI, everything is configured in a click-ops style.

Code is written to meet immediate deadlines.

Standard workflows are just beginning to be documented.

There are almost no design considerations. Everything is ad-hoc, probably monolithic applications, and most likely undocumented architecture.

There is basic cyber hygiene. Protections are localized, perimeter-focused, and highly reactive. Security patches are deployed manually. Passwords are static or manually rotated at very long intervals. MFA is rarely used. Auditing is either not configured at all, or the logs are stored locally, never sent to a central logging mechanism, and rarely reviewed.

Resources are deployed with almost no cost considerations (for example, over-provisioned static resources and unmonitored cloud spend).

When looking at resiliency, in most cases, resources are deployed without redundancy (i.e., with single points of failure), and at best, there are manual backups.

Level 2 – Structured and Managed

The second level is considered much more mature than the initial one.

Deployment is done based on containers, and very commonly on top of Kubernetes.

Automation is built into areas such as deployment, testing, and configuration.

Engineering processes and CI/CD pipelines are standardized across the company.

Teams follow documented, repeatable workflows.

In terms of design considerations, we commonly see modular, service-oriented architecture and structured API design.

Risk assessments are routine. Security policies are formalized, and access controls are structured. At this level, it is common to see SCA (Software Composition Analysis) and SBOM (Software Bill of Materials) generation implemented as part of CI/CD pipelines, automated patching, and the replacement of static passwords with passwordless authentication such as passkeys.
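
To make the SCA step more concrete, here is a minimal sketch of a pipeline gate that compares the components listed in an SBOM against a list of known-vulnerable versions. It assumes a CycloneDX-style JSON document with a `components` array of `name` and `version` entries. The advisory set, the package names, and the `find_vulnerable_components` helper are all hypothetical; a real pipeline would query a vulnerability database instead of a hard-coded set.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical advisory data: (package name, version) pairs treated as vulnerable.
KNOWN_VULNERABLE: Set[Tuple[str, str]] = {
    ("example-lib", "1.0.0"),
    ("demo-parser", "2.3.1"),
}


def find_vulnerable_components(sbom: Dict) -> List[str]:
    """Return 'name@version' for each SBOM component found in the advisory set."""
    findings = []
    for component in sbom.get("components", []):
        key = (component.get("name"), component.get("version"))
        if key in KNOWN_VULNERABLE:
            findings.append(f"{key[0]}@{key[1]}")
    return findings


# Illustrative SBOM fragment (made-up components).
sbom = {
    "components": [
        {"name": "example-lib", "version": "1.0.0"},
        {"name": "example-lib", "version": "1.2.0"},
        {"name": "safe-lib", "version": "4.0.0"},
    ]
}

findings = find_vulnerable_components(sbom)
```

A pipeline step could fail the build whenever `findings` is non-empty.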

It is common to see design-stage secure coding reviews and manual STRIDE mapping.

When thinking about cost management, it is common to see the implementation of tagging strategies, scheduled resource scaling, and cost allocation.
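
A tagging strategy only helps cost allocation if it is enforced. The sketch below shows one possible check, assuming resources are represented as a mapping from resource name to its tags. The required tag keys and the resource names are hypothetical examples, not a recommended standard.

```python
from typing import Dict, List

# Hypothetical tagging policy: every resource must carry these tag keys.
REQUIRED_TAGS = ("owner", "cost-center", "environment")


def missing_tags(resource_tags: Dict[str, str]) -> List[str]:
    """Return the required tag keys that are absent or empty on a resource."""
    return [tag for tag in REQUIRED_TAGS if not resource_tags.get(tag)]


# Illustrative inventory (made-up resources).
resources = {
    "vm-web-01": {"owner": "team-a", "cost-center": "1001", "environment": "prod"},
    "vm-batch-07": {"owner": "team-b"},
}

# Map each non-compliant resource to the tags it is missing.
non_compliant = {
    name: missing_tags(tags)
    for name, tags in resources.items()
    if missing_tags(tags)
}
```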

When looking at resiliency, resources are deployed across multiple availability zones (multi-AZ), and disaster recovery plans are documented.

At this stage, we begin to see sustainability considerations, such as right-sizing underutilized compute instances.
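
Right-sizing usually starts with finding instances whose utilization stays low. As a rough sketch, the function below flags instances whose average CPU utilization falls under a threshold. The 20% cut-off, the instance names, and the sample values are assumptions for illustration; real decisions would also consider memory, I/O, and peak load.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical cut-off: average CPU below this percentage counts as underutilized.
CPU_THRESHOLD_PERCENT = 20.0


def underutilized_instances(
    cpu_samples: Dict[str, List[float]],
    threshold: float = CPU_THRESHOLD_PERCENT,
) -> List[str]:
    """Return the names of instances whose mean CPU utilization is below the threshold."""
    return sorted(
        name
        for name, samples in cpu_samples.items()
        if samples and mean(samples) < threshold  # skip instances with no data
    )


# Illustrative utilization samples in percent (made-up values).
cpu_samples = {
    "app-01": [12.0, 8.5, 15.0],
    "db-01": [65.0, 70.2, 58.9],
    "new-01": [],
}

candidates = underutilized_instances(cpu_samples)
```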

Level 3 – Optimized and Proactive

This is considered a fully optimized level.

It is common to see cloud-native applications, use of microservices, event-driven patterns, and use of Serverless technologies.

We begin to see the implementation of GenAI technologies (such as AI agents, skills, etc.) and heavy use of non-human identities (for app-to-app communication or for AI agents).

In terms of security, we see continuous threat hunting, automated incident response, and zero-trust verification for every entity.

It is common to see automated continuous Threat-Modeling-as-Code embedded in repositories. Threat modeling is dynamic, automatically updating when architectural changes are pushed to Git.
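
One way to picture threat modeling as code is to declare data flows as data and derive STRIDE categories from simple rules, so the model is re-evaluated whenever the declaration changes in the repository. The sketch below is only an illustration: the `DataFlow` class, the two rules, and the component names are hypothetical, and real tools cover all six STRIDE categories with far richer rule sets.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFlow:
    """A declared data flow between two components (hypothetical model)."""
    source: str
    target: str
    crosses_trust_boundary: bool
    encrypted: bool
    authenticated: bool


def stride_findings(flow: DataFlow) -> List[str]:
    """Map a data flow to STRIDE categories using two simplified rules."""
    findings = []
    if flow.crosses_trust_boundary and not flow.authenticated:
        findings.append("Spoofing")
    if flow.crosses_trust_boundary and not flow.encrypted:
        findings.extend(["Tampering", "Information Disclosure"])
    return findings


# Illustrative architecture declaration (made-up components).
flows = [
    DataFlow("browser", "api-gateway", True, True, False),
    DataFlow("api-gateway", "orders-db", False, False, True),
]

report = {f"{f.source} -> {f.target}": stride_findings(f) for f in flows}
```

Running a check like this in CI on every push is what keeps the threat model in step with the architecture.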

When thinking about cost management, it is common to see real-time anomaly detection and automated cost optimization policies.
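
Cost anomaly detection can be as simple as comparing today's spend with recent history. The sketch below uses a z-score over a trailing window. This is one common statistical approach, not necessarily what any particular cloud provider implements, and the threshold of 3 standard deviations and the spend figures are assumptions.

```python
from statistics import mean, stdev
from typing import List


def is_cost_anomaly(history: List[float], today: float, z_threshold: float = 3.0) -> bool:
    """Flag today's spend if it deviates from the historical mean by more than z_threshold sigmas."""
    if len(history) < 2:
        return False  # not enough data to estimate variability
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is unusual.
        return today != history[0]
    return abs(today - mean(history)) / sigma > z_threshold


# Illustrative daily spend in arbitrary currency units (made-up values).
history = [100.0, 102.0, 98.0, 101.0, 99.0]

spike_detected = is_cost_anomaly(history, today=140.0)
normal_day = is_cost_anomaly(history, today=101.0)
```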

When looking at resiliency, we begin to see the use of chaos engineering and active-active multi-region self-healing architectures.
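
The idea behind a chaos experiment can be sketched in a few lines: deliberately inject a failure into one region and verify that requests are still served by another. The example below is a toy model, not a real chaos tool. The `make_region` and `call_with_failover` helpers and the region names are hypothetical, and the failure is injected deterministically so the outcome is repeatable.

```python
from typing import Callable, List


class RegionDown(Exception):
    """Raised when a (simulated) region cannot serve a request."""


def make_region(name: str, healthy: bool) -> Callable[[], str]:
    """Build a simulated regional endpoint; an unhealthy one always fails."""
    def handler() -> str:
        if not healthy:
            raise RegionDown(name)
        return f"served by {name}"
    return handler


def call_with_failover(regions: List[Callable[[], str]]) -> str:
    """Try each region in order and return the first successful response."""
    for region in regions:
        try:
            return region()
        except RegionDown:
            continue  # fall through to the next region
    raise RuntimeError("all regions failed")


# Chaos experiment: inject a failure into the first region and check
# that the request is still served by the second one.
regions = [
    make_region("region-a", healthy=False),
    make_region("region-b", healthy=True),
]
result = call_with_failover(regions)
```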

In terms of sustainability, we see carbon-aware architectures, with heavy batch jobs scheduled during peak renewable energy hours.
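
Carbon-aware scheduling can be illustrated with a small sketch that picks the start hour for a batch job from an hourly forecast. It assumes that low grid carbon intensity is a reasonable proxy for high renewable availability, and that a forecast is available as a list of values (for example, in gCO2/kWh). The `best_start_hour` helper and the forecast numbers are hypothetical.

```python
from typing import List


def best_start_hour(forecast: List[float], job_hours: int) -> int:
    """Return the start index of the contiguous window with the lowest total carbon intensity."""
    if job_hours <= 0 or job_hours > len(forecast):
        raise ValueError("job_hours must be between 1 and the forecast length")
    window_sums = [
        sum(forecast[start:start + job_hours])
        for start in range(len(forecast) - job_hours + 1)
    ]
    # index() returns the earliest window when several share the minimum.
    return window_sums.index(min(window_sums))


# Illustrative hourly carbon-intensity forecast (made-up values).
forecast = [420, 390, 310, 180, 150, 170, 300, 410]

start = best_start_hour(forecast, job_hours=3)
```

With this forecast, a three-hour job would be started at the hour with index `start`, covering the lowest-intensity stretch of the day.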

Summary

In this blog post, I have reviewed multiple pillars of modern architectural designs.

Perhaps your organization scores high in some aspects (such as automation or security) and low in others (such as cost management or sustainability).

Regardless of where your organization sits on the scale, there is always a next level it can reach in each pillar.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

About the Author

Eyal Estrin is a cloud and information security architect and AWS Community Builder, with more than 25 years in the industry. He is the author of Cloud Security Handbook and Security for Cloud Native Applications.
