Tony Morris

Posted on May 26, 2022 • Originally published at blog.morriscloud.com on May 26, 2022

Don't Manage Terraform Enterprise With Terraform

Don't do it. I know you want to, but you shouldn't.

First, some facts.

Terraform Enterprise is a wonderful product. It's the self-hosted distribution of Terraform Cloud for organizations that want the privacy and scale of an enterprise-grade installation.
There exists a Terraform Cloud/Enterprise Provider that can easily template and manage how an organization creates Workspaces (and other TFC/TFE resources).

Given the generally painful user experience of entering dozens of Workspace Variables into TFE, it makes sense that nearly everyone I've worked with has stated a desire to use the tfe provider to manage Workspaces. I'm here to tell you why this ends up being a rougher idea than you hoped for.

Pricing

Terraform Enterprise is priced by the number of Workspaces you have.

They're not cheap.

If you start dedicating Workspaces to creating and managing other Workspaces, you're effectively shorting yourself on your own licenses.

On the other hand, if you're in Terraform Cloud, you're not paying per-Workspace, so feel free to use this method if the following hiccups don't pertain to you.

For what it's worth, if HashiCorp had a concept of "Configuration Workspaces" that didn't count against your Workspace total, then this would obviously not be an issue.

Multi-Step Updates

Let's say you can get over the pricing issue. Talking through how the Configuration Workspaces would be architected and deployed brings up some other potential issues.

First, a quick overview of how we tried this out.

The Setup

Workspace Repository

The most important piece of functionality here is the Terraform module that uses the tfe provider to create the Workload Workspaces.

Everything about the Workload Workspace is contained within this Workspace Repository, including the Workspace Variables.

Another name for this repository could be a "Configuration Repository," as it configures the implementation of the Workload Repository.
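
As a rough sketch of what such a module might look like, the configuration below uses the tfe provider's tfe_workspace and tfe_variable resources to create one Workload Workspace and one of its Workspace Variables. The workspace name, repository identifier, input variable names, and values are all hypothetical placeholders, not our actual configuration, and the provider is assumed to read its API token from the TFE_TOKEN environment variable.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

# Hypothetical inputs; adjust to your own installation.
variable "tfe_hostname" {
  type        = string
  description = "Hostname of the Terraform Enterprise installation."
}

variable "organization" {
  type        = string
  description = "TFE organization that owns the Workload Workspaces."
}

variable "oauth_token_id" {
  type        = string
  description = "ID of the VCS OAuth token already configured in TFE."
}

# The API token is assumed to come from the TFE_TOKEN environment variable.
provider "tfe" {
  hostname = var.tfe_hostname
}

# The Workload Workspace, pointed at the (hypothetical) Workload Repository.
resource "tfe_workspace" "workload" {
  name         = "example-workload"
  organization = var.organization

  vcs_repo {
    identifier     = "example-org/workload-repository"
    oauth_token_id = var.oauth_token_id
  }
}

# One Workspace Variable that the Workload Repository is assumed to expect.
resource "tfe_variable" "instance_type" {
  workspace_id = tfe_workspace.workload.id
  key          = "instance_type"
  value        = "t3.micro"
  category     = "terraform"
  description  = "EC2 instance type used by the workload."
}
```

Every additional Workspace Variable becomes another tfe_variable resource in this repository, which is where the back-and-forth described later comes from.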

Workload Repository

The Workload Repository defines the Terraform resources to create the workloads that you are configuring. For us, this is a bunch of resources from the aws provider, but it could be anything.

Configuration Workspace

The Configuration Workspace in Terraform Enterprise is pointed to the Workspace Repository. When it executes a Run, it generates Terraform Enterprise resources, such as the Workload Workspaces and the requisite Workspace Variables in each one.

Workload Workspace

The Workload Workspace is created by the Configuration Workspace and pointed to the Workload Repository. When it executes a Run, it generates Workload-specific resources. In our case, these are generally AWS resources such as EC2 instances, EBS volumes, etc.

The Problem

The biggest problem with this setup is the back-and-forth you have to do in order to make changes.

Let's say you want to add a resource to the Workload Repository. This is straightforward, and it doesn't cause many issues. You would just commit your changes to the Workload Repository, and the Workload Workspace would pick those up.

What if, however, that introduces a new variable to the repository? The changes you would have to make in order to get it through the system look something like this:

Add the Workspace Variable resource to the Workspace Repository.
Push the Run through the Configuration Workspace to add the TFE Workspace Variable to the Workload Workspace.
Add the variable to the Workload Repository.
Finally, push the Run through the Workload Workspace.
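
The two code changes in the steps above can be sketched roughly as follows. The two halves live in different repositories and are shown together only for comparison. The variable name volume_type, its value, the workload_workspace_id input, and the EBS volume are hypothetical examples. The ordering matters because a Workload Run whose configuration declares a variable with no default, and whose Workspace has no value for it, would be expected to fail.

```hcl
# --- Workspace Repository (applied by the Configuration Workspace) ---

# Hypothetical input: the ID of the existing Workload Workspace.
variable "workload_workspace_id" {
  type        = string
  description = "ID of the Workload Workspace that receives the new variable."
}

# Steps 1 and 2: declare the new Workspace Variable, then push a Run
# through the Configuration Workspace so it exists in the Workload Workspace.
resource "tfe_variable" "volume_type" {
  workspace_id = var.workload_workspace_id
  key          = "volume_type"
  value        = "gp3"
  category     = "terraform"
}

# --- Workload Repository (applied by the Workload Workspace) ---

# Step 3: declare the variable that the Workspace now supplies.
variable "volume_type" {
  type        = string
  description = "EBS volume type for the data volume."
}

# Step 4: the Workload Run can now consume the value.
resource "aws_ebs_volume" "data" {
  availability_zone = "us-east-1a"
  size              = 100
  type              = var.volume_type
}
```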

Four steps for a simple variable addition is a tough process to roll out to any team.

What Should You Do Instead?

Let's be honest. I'm certain there are some really creative ways to work around these limitations when managing Terraform at scale. Feel free to use them and tell me what they are! Hit me up on Twitter if you find a really neat solution!

For our use cases, we are building out a Control Plane that abstracts the business-level functions from TFE itself. So, instead of thinking "I need to create these TFE Workspaces with these Workspace Variables," we are now thinking "This team needs to create this application cluster."

This abstraction is not unique. I've talked to many people in the community who do a similar thing.

This solution works really well for us because we have a number of backend systems that we need to integrate during an "application cluster spin-up." These include SaaS tools such as PagerDuty, Splunk, New Relic, and many others. Because Terraform Enterprise has a robust API, our Control Plane is much more straightforward to implement.

Top comments (2)

Jon • Jun 7 '22

Hi Tony, thanks for sharing.
The Control Plane approach is indeed a common one that I've seen in many places.
May I ask what stack you used to build it?
Did you start from scratch or use an existing orchestration tool or workflow engine?

Tony Morris • Jun 13 '22

Hi @jon, thanks for the question!

I have a full-stack development team currently building out the Control Plane application (and an underlying platform layer) with a number of technologies:

Angular for the front end UI
TypeScript for the back end API
AWS Step Functions for the orchestration
AWS Lambda for the functions themselves
AWS SAM for the serverless deployment model
Terraform for the AWS resource creation (outside of the main Step Functions)

I think that covers all the tech used within the application stack. It's all pretty custom to our business cases, so we started with the customizable AWS Step Functions with Lambda to handle most of the work.