Modern cloud environments, and production environments in particular, reduce the need for human access.
It makes sense for developers to have access to Dev or Test environments, but in a properly designed production environment, everything should be automated, from deployment and observability to self-healing. In most cases, no human access is required.
Production environments serve customers, require zero downtime, and in most cases contain customers' data.
There are cases, such as emergency scenarios, where human access is required.
In mature organizations, this type of access is handled by the site reliability engineering (SRE) team.
The term break-glass is an analogy to breaking a glass to pull a fire alarm, which is supposed to happen only in case of emergency.
In the following blog post, I will review the different alternatives each of the hyperscale cloud providers gives their customers to handle break-glass scenarios.
Ground rules for using break-glass accounts
Before talking about how each of the hyperscale cloud providers handles break-glass, it is important to be clear: break-glass accounts should be used in emergency cases only, and their use should follow these ground rules:
- Authentication – All access through the break-glass mechanism must be authenticated, preferably against a central identity provider rather than local accounts
- Authorization – All access must be authorized using role-based access control (RBAC), following the principle of least privilege
- MFA – Since most break-glass scenarios require highly privileged access, it is recommended to enforce multi-factor authentication (MFA) for any interactive access
- Just-in-time access – All access through break-glass mechanisms must be granted temporarily and must be revoked after a predefined amount of time or when the emergency is declared over
- Approval process – Access through a break-glass mechanism should be manually approved
- Auditing – All access through break-glass mechanisms must be audited and kept as evidence for further investigation
- Documented process – Organizations must have a documented and tested process for requesting, approving, using, and revoking break-glass accounts
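The ground rules above can be sketched as a small data model. This is a minimal illustration in Python, not a vendor mechanism: the class `BreakGlassGrant`, its fields, and the one-hour default are all hypothetical names and values chosen for the example. It shows a grant that needs a second person to approve it, expires on its own, can be revoked early, and records every step.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class BreakGlassGrant:
    """One temporary, approved, audited break-glass grant (hypothetical model)."""

    requester: str
    role: str
    reason: str
    max_duration: timedelta = timedelta(hours=1)  # assumed default, tune per policy
    approved_by: Optional[str] = None
    granted_at: Optional[datetime] = None
    revoked: bool = False
    audit_log: List[str] = field(default_factory=list)

    def _audit(self, now: datetime, message: str) -> None:
        # Auditing: every state change is appended with a timestamp.
        self.audit_log.append(f"{now.isoformat()} {message}")

    def approve(self, approver: str, now: datetime) -> None:
        # Approval process: the requester cannot approve their own access.
        if approver == self.requester:
            raise ValueError("requester cannot approve their own access")
        self.approved_by = approver
        self.granted_at = now
        self._audit(now, f"approved by {approver} for {self.requester}: {self.reason}")

    def revoke(self, now: datetime) -> None:
        # Called when the emergency is declared over.
        self.revoked = True
        self._audit(now, "revoked")

    def is_active(self, now: datetime) -> bool:
        # Just-in-time access: active only after approval and before expiry.
        if self.granted_at is None or self.revoked:
            return False
        return now < self.granted_at + self.max_duration


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = BreakGlassGrant(requester="alice", role="prod-admin", reason="incident")
grant.approve("bob", start)
print(grant.is_active(start + timedelta(minutes=30)))
```

In a real environment the approval, expiry, and audit steps would be enforced by the cloud provider's own services; the model only makes the required states explicit.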
Handling break-glass scenarios in AWS
Below is a list of best practices provided by AWS for handling break-glass scenarios:
Identity Management
Identities in AWS are managed using AWS Identity and Access Management (IAM).
When working with AWS Organizations, customers have the option for central identity management for the entire AWS Organization using AWS IAM Identity Center – a single-sign-on (SSO) and federated identity management service (working with Microsoft Entra ID, Google Workspace, and more).
Since there might be a failure with a remote identity provider (IdP) or with AWS IAM Identity Center, AWS recommends creating two IAM users on the root of the AWS Organizations tree, and an IAM break-glass role on each of the accounts in the organization, to allow access in case of emergency.
The break-glass IAM accounts need to have console access, as explained in the documentation.
Authentication Management
When creating IAM accounts, enforce the use of a strong password policy, as explained in the documentation.
Passwords for the break-glass IAM accounts must be stored in a secured vault, and once the work on the break-glass accounts is over, the passwords must be replaced immediately to avoid reuse.
AWS recommends enforcing the use of MFA for any privileged access, as explained in the documentation.
Access Management
AWS recommends creating a break-glass IAM role, as explained in the documentation.
Access using break-glass IAM accounts must be temporary, as explained in the documentation.
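As an illustration of the break-glass role, here is a minimal Python sketch that builds the role's trust policy as a plain dictionary. It assumes the two break-glass IAM users are the only principals allowed to assume the role and that MFA is required; the account ID and user names are hypothetical. The condition key `aws:MultiFactorAuthPresent` and the action `sts:AssumeRole` are standard IAM policy elements.

```python
import json
from typing import Dict, List


def break_glass_trust_policy(user_arns: List[str]) -> Dict:
    """Trust policy: only the listed IAM users may assume the role, and only with MFA."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": sorted(user_arns)},
                "Action": "sts:AssumeRole",
                # Deny-by-omission: sessions without MFA do not match this statement.
                "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
            }
        ],
    }


# Hypothetical account ID and user names, for illustration only.
policy = break_glass_trust_policy(
    [
        "arn:aws:iam::111122223333:user/break-glass-1",
        "arn:aws:iam::111122223333:user/break-glass-2",
    ]
)
print(json.dumps(policy, indent=2))
```

Temporary access then comes from the assumed-role session itself: the credentials expire when the session ends, and the role's maximum session duration can be kept short (IAM allows values between 1 and 12 hours).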
Auditing
Management events (control-plane API calls) within the AWS environment are recorded in the AWS CloudTrail event history by default, and stored for 90 days.
As a best practice, it is recommended to send all CloudTrail logs from the entire AWS Organization to a central S3 bucket, as explained in the documentation.
Since audit trail logs contain sensitive information, it is recommended to encrypt all data at rest using customer-managed encryption keys (as explained in the documentation) and limit access to the log files to the SOC team for investigation purposes.
Audit events recorded by AWS CloudTrail can be analyzed for suspicious activity using Amazon GuardDuty, as explained in the documentation.
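To make the investigation step more concrete, here is a minimal Python sketch that scans CloudTrail records for activity by break-glass identities. It assumes the records have already been downloaded from the central bucket and parsed into dictionaries; the ARNs and sample records are hypothetical, while `userIdentity`, `sessionContext.sessionIssuer`, `eventName`, `eventTime`, and `sourceIPAddress` are fields of the CloudTrail record format.

```python
from typing import Dict, Iterable, List, Set


def break_glass_events(records: Iterable[Dict], watched_arns: Set[str]) -> List[Dict]:
    """Return a short summary of every record made by a watched user or role."""
    hits = []
    for record in records:
        identity = record.get("userIdentity", {})
        # For assumed-role sessions, the role itself appears as the session issuer.
        issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
        if identity.get("arn") in watched_arns or issuer.get("arn") in watched_arns:
            hits.append(
                {
                    "time": record.get("eventTime"),
                    "event": record.get("eventName"),
                    "source_ip": record.get("sourceIPAddress"),
                    "actor": identity.get("arn"),
                }
            )
    return hits


# Hypothetical identities and sample records, trimmed to the fields used above.
WATCHED = {
    "arn:aws:iam::111122223333:user/break-glass-1",
    "arn:aws:iam::444455556666:role/BreakGlassRole",
}
sample_records = [
    {
        "eventTime": "2024-01-01T12:00:00Z",
        "eventName": "ConsoleLogin",
        "sourceIPAddress": "203.0.113.10",
        "userIdentity": {
            "type": "IAMUser",
            "arn": "arn:aws:iam::111122223333:user/break-glass-1",
        },
    },
    {
        "eventTime": "2024-01-01T12:05:00Z",
        "eventName": "StopInstances",
        "sourceIPAddress": "203.0.113.10",
        "userIdentity": {
            "type": "AssumedRole",
            "arn": "arn:aws:sts::444455556666:assumed-role/BreakGlassRole/incident",
            "sessionContext": {
                "sessionIssuer": {"arn": "arn:aws:iam::444455556666:role/BreakGlassRole"}
            },
        },
    },
    {
        "eventTime": "2024-01-01T12:06:00Z",
        "eventName": "PutObject",
        "sourceIPAddress": "198.51.100.7",
        "userIdentity": {
            "type": "IAMUser",
            "arn": "arn:aws:iam::111122223333:user/deploy-bot",
        },
    },
]

for hit in break_glass_events(sample_records, WATCHED):
    print(hit)
```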
Resource Access
To allow secured access to EC2 instances, AWS recommends using EC2 Instance Connect or AWS Systems Manager Session Manager.
To allow secured access to Amazon EKS nodes, AWS recommends using AWS Systems Manager Agent (SSM Agent).
To allow secured access to Amazon ECS container instances, AWS recommends using AWS Systems Manager, and for debugging purposes, AWS recommends using Amazon ECS Exec.
To allow secured access to Amazon RDS, AWS recommends using AWS Systems Manager Session Manager.
Handling break-glass scenarios in Azure
Below is a list of best practices provided by Microsoft for handling break-glass scenarios:
Identity Management
Although identities in Azure are managed using Microsoft Entra ID (formerly Azure AD), Microsoft recommends creating two cloud-only accounts that use the *.onmicrosoft.com domain, to allow access in case of emergency or when logging in with federated identities from the on-premises Active Directory fails, as explained in the documentation.
Authentication Management
Microsoft recommends enabling password-less login for the break-glass accounts using a FIDO2 security key, as explained in the documentation.
To prevent a tenant-wide account lockout, Microsoft does not recommend enforcing the regular MFA policies on emergency or break-glass accounts, and recommends excluding the break-glass accounts from Conditional Access policies, as explained in the documentation.
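The exclusion from Conditional Access policies can be checked mechanically. Below is a minimal Python sketch, assuming the policies were already exported as dictionaries shaped like the `conditionalAccessPolicy` resource in Microsoft Graph (`displayName`, `state`, `conditions.users.excludeUsers`). The object IDs and policy names are hypothetical, and exclusions made through groups or roles are not considered here.

```python
from typing import Dict, Iterable, List, Set


def policies_missing_exclusion(
    policies: Iterable[Dict], break_glass_ids: Set[str]
) -> List[str]:
    """Names of enabled policies that do not exclude every break-glass account."""
    missing = []
    for policy in policies:
        # Disabled and report-only policies cannot lock an account out.
        if policy.get("state") != "enabled":
            continue
        users = policy.get("conditions", {}).get("users", {})
        excluded = set(users.get("excludeUsers", []))
        if not break_glass_ids <= excluded:
            missing.append(policy.get("displayName", "<unnamed>"))
    return missing


# Hypothetical object IDs and policies, for illustration only.
BREAK_GLASS_IDS = {"id-break-glass-1", "id-break-glass-2"}
sample_policies = [
    {
        "displayName": "Require MFA for all users",
        "state": "enabled",
        "conditions": {
            "users": {
                "includeUsers": ["All"],
                "excludeUsers": ["id-break-glass-1", "id-break-glass-2"],
            }
        },
    },
    {
        "displayName": "Block legacy authentication",
        "state": "enabled",
        "conditions": {
            "users": {"includeUsers": ["All"], "excludeUsers": ["id-break-glass-1"]}
        },
    },
    {
        "displayName": "Draft policy",
        "state": "disabled",
        "conditions": {"users": {"includeUsers": ["All"], "excludeUsers": []}},
    },
]

print(policies_missing_exclusion(sample_policies, BREAK_GLASS_IDS))
```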
Access Management
Microsoft allows customers to manage privileged access to resources using Microsoft Entra Privileged Identity Management (PIM) and recommends assigning the break-glass accounts permanent access to the Global Administrator role, as explained in the documentation.
Microsoft Entra PIM allows customers to control requests for privileged access, as explained in the documentation.
Auditing
Activity logs within the Azure environment are logged into Azure Monitor by default, and stored for 90 days.
As a best practice, it is recommended to enable diagnostic settings for the "audit" and "allLogs" category groups and send the logs from the entire Azure tenant to a central Log Analytics workspace, as explained in the documentation.
Since audit trail logs contain sensitive information, it is recommended to encrypt all data at rest using customer-managed encryption keys (as explained in the documentation) and limit access to the log files to the SOC team for investigation purposes.
Audit logs stored inside a Log Analytics workspace can be queried for further investigation using Microsoft Sentinel, as explained in the documentation.
Microsoft recommends creating an alert when break-glass accounts perform sign-in attempts, as explained in the documentation.
Resource Access
To allow secured access to virtual machines (using SSH or RDP), Microsoft recommends using Azure Bastion.
To allow secured access to the Azure Kubernetes Service (AKS) API server, Microsoft recommends using Azure Bastion, as explained in the documentation.
To allow secured access to Azure SQL, Microsoft recommends creating an Azure Private Endpoint and connecting to the Azure SQL using Azure Bastion, as explained in the documentation.
Another alternative to allow secured access to resources in private networks is to use Microsoft Entra Private Access, as explained in the documentation.
Handling break-glass scenarios in Google Cloud
Below is a list of best practices provided by Google for handling break-glass scenarios:
Identity and Access Management
Identities in GCP are managed using Google Workspace or using Google Cloud Identity.
Access to resources inside GCP is managed using IAM Roles.
Google recommends creating a dedicated Google group for the break-glass IAM role, and configuring temporary access to this Google group as explained in the documentation.
The temporary access is done using IAM conditions, and it allows customers to implement Just-in-Time access, as explained in the documentation.
For break-glass access, add dedicated Google identities to the mentioned Google group, to gain temporary access to resources.
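As a rough illustration of the temporary access described above, here is a minimal Python sketch that builds an IAM policy binding with a time-based condition. The group address, role, and title are hypothetical; the `request.time < timestamp("...")` expression follows the IAM Conditions syntax for date/time attributes, and the start time is assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def temporary_binding(
    group_email: str, role: str, start: datetime, duration: timedelta
) -> Dict:
    """IAM binding that grants `role` to a group until `start + duration`."""
    # `start` is assumed to be timezone-aware; the expiry is rendered in UTC.
    expiry = (start + duration).astimezone(timezone.utc)
    stamp = expiry.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "role": role,
        "members": [f"group:{group_email}"],
        "condition": {
            "title": "break-glass-temporary-access",
            "description": f"Expires at {stamp}",
            # The binding stops applying once request.time passes the expiry.
            "expression": f'request.time < timestamp("{stamp}")',
        },
    }


# Hypothetical group and role, for illustration only.
binding = temporary_binding(
    "break-glass@example.com",
    "roles/compute.admin",
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    timedelta(hours=2),
)
print(binding["condition"]["expression"])
```

A binding like this would be added to the allow policy of the relevant project or folder; policies that contain conditional bindings need to be written with policy version 3.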
Authentication Management
Google recommends enforcing the use of MFA for any privileged access, as explained in the documentation.
Auditing
Admin Activity logs (configuration changes) within the GCP environment are written to Google Cloud Audit Logs by default, and stored for 400 days.
It is recommended to manually enable data access audit logs to get more insights about break-glass account activity, as explained in the documentation.
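For reference, Data Access audit logs are switched on through the `auditConfigs` section of an IAM policy. The following minimal Python sketch builds such an entry as a dictionary; enabling all three log types for `allServices` is an assumption made for the example and may produce more log volume than an organization wants.

```python
from typing import Dict


def data_access_audit_config(service: str = "allServices") -> Dict:
    """auditConfigs entry that enables all Data Access log types for a service."""
    return {
        "service": service,
        "auditLogConfigs": [
            {"logType": "ADMIN_READ"},  # reads of configuration and metadata
            {"logType": "DATA_READ"},   # reads of user-provided data
            {"logType": "DATA_WRITE"},  # writes of user-provided data
        ],
    }


# Fragment to merge into the project's or organization's IAM policy.
policy_fragment = {"auditConfigs": [data_access_audit_config()]}
print(policy_fragment)
```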
As a best practice, it is recommended to send all Cloud Audit logs from the entire GCP Organization to a central Google Cloud Storage bucket, as explained in the documentation.
Since audit trail logs contain sensitive information, it is recommended to encrypt all data at rest using customer-managed encryption keys (as explained in the documentation) and limit access to the log files to the SOC team for investigation purposes.
Audit logs stored inside Google Cloud Audit Logs can be sent to the Google Security Command Center for further investigation, as explained in the documentation.
Resource Access
To allow secured access to Google Compute Engine instances, Google recommends using Identity-Aware Proxy (IAP), as explained in the documentation.
To allow secured access to Google App Engine instances, Google recommends using Identity-Aware Proxy, as explained in the documentation.
To allow secured access to Google Cloud Run services, Google recommends using Identity-Aware Proxy, as explained in the documentation.
To allow secured access to Google Kubernetes Engine (GKE) instances, Google recommends using Identity-Aware Proxy, as explained in the documentation.
Summary
In this blog post, we have reviewed what break-glass accounts are, and how AWS, Azure, and GCP recommend securing them (from authentication and authorization to auditing and secure access to cloud resources).
I recommend any organization that manages cloud production environments follow the vendors' security best practices and keep the production environment secured.
About the Author
Eyal Estrin is a cloud and information security architect, the owner of the blog Security & Cloud 24/7 and the author of the book Cloud Security Handbook, with more than 20 years in the IT industry.
Eyal has been an AWS Community Builder since 2020.
You can connect with him on Twitter
Opinions are his own and not the views of his employer.