The Ops Community

Alex Eversmeyer

CI/CD: Branch-based Terraform Deployment

This is cross-posted from my Dev.to blog!

For my Skyboy project, I chose to use Terraform to provision the application's infrastructure on Amazon Web Services (AWS), both because Terraform is already familiar to me and because I wanted to practice coding a more complex modular configuration. This decision led to several challenges and lots of good learning!

Terraform Modules

With a very simple set of resources, it might be appropriate to limit a Terraform configuration to one directory and the usual set of files (main.tf, providers.tf, variables.tf, and so on). This project, however, would require several different categories of resources: a VPC; an ECS cluster, service, and task definition; some IAM roles and permissions; and a load balancer.

I broke up these categories into a directory structure like this:

terraform/
  - containers/
      - task-definitions/
      - main.tf
      - ...
  - iam/
      - main.tf
      - ...
  - loadbalancing/
      - main.tf
      - ...
  - vpc/
      - main.tf
      - ...
  - main.tf
  - providers.tf
  - ...

where the ... represents the other files needed within each module (variables.tf and/or outputs.tf, among others).

To keep myself from getting too confused as my configuration grew, I added a comment at the top of every Terraform file, such as loadbalancing/main.tf, with the path and file name.
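For example, the first line of the load balancing module's main file would read:

```hcl
# terraform/loadbalancing/main.tf
```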

The VPC and IAM modules were straightforward and didn't require many inputs or variables. Things got more interesting as I started setting up my load balancer and ECS resources. These modules needed certain pieces of information from other modules - for example, the load balancer has to know about the VPC subnets, and the ECS task definition looks for the ARN of its IAM task and execution role(s).

Setting an output for subnet IDs in the VPC module's outputs.tf file:

output "lb_subnets" {
  value = [for subnet in aws_subnet.skyboy_public_subnet : subnet.id]
}

allows the list of subnet IDs to be passed to the Containers module in the root main.tf file:

module "containers" {
  source          = "./containers"  # path to the Containers module

  service_subnets = module.vpc.lb_subnets
}

which then gets passed to an ECS service within the Containers module in the main.tf file:

resource "aws_ecs_service" "skyboy_service" {
  network_configuration {
    subnets = var.service_subnets
  }
}

with the additional requirement that var.service_subnets is defined within the variables.tf file in the Containers module as well. It can get a little tricky to keep track of what's been defined in which files; thankfully, my IDE of choice for this project (PyCharm) has a great Terraform plugin that detects the presence or absence of variable definitions between files and modules, which helped to keep things straight.
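That declaration in the Containers module's variables.tf might look like this (the description text is mine):

```hcl
# terraform/containers/variables.tf
variable "service_subnets" {
  description = "Subnet IDs passed in from the VPC module for the ECS service"
  type        = list(string)
}
```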

Deployment Considerations

As I was preparing to deploy my project, I created an AWS organization that oversees a development account and a production account. That meant I would need to figure out how to deploy the Terraform configuration to the appropriate account so that, once I had infrastructure spun up in production, I could spin up a new stack to test changes and not worry about any conflicts that might take the application down.

Problems to solve included:

  • storing dev and prod state files in separate locations;
  • using the correct AWS account credentials;
  • having a way to easily tear down provisioned infrastructure;
  • passing the correct Docker image URI to Terraform;
  • and creating the correct load balancer listeners.

(The dev account does not use a Route 53 Hosted Zone with a registered domain for DNS routing to the load balancer, so that account only needs a listener on port 80; making an HTTP request to the load balancer endpoint is sufficient to ensure the infrastructure is set up correctly. The prod account, on the other hand, needs two listeners: one to redirect HTTP traffic on port 80 to HTTPS on port 443, and another to forward HTTPS traffic to the load balancer target group. Requests to the application's domain can verify that the domain's certificate is valid and then trigger the application to launch.)
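The two prod listeners could be sketched like this; the resource names, the load balancer and target group references, and the certificate variable are my assumptions, not the actual configuration:

```hcl
# Hypothetical sketch of the prod listeners (names are assumptions)
resource "aws_lb_listener" "http_redirect" {
  load_balancer_arn = aws_lb.skyboy_lb.arn
  port              = 80
  protocol          = "HTTP"

  # Redirect all HTTP traffic to HTTPS
  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

resource "aws_lb_listener" "https_forward" {
  load_balancer_arn = aws_lb.skyboy_lb.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = var.certificate_arn

  # Forward HTTPS traffic to the ECS target group
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.skyboy_tg.arn
  }
}
```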

The final consideration was that I wanted to do all of this with as little code repetition as possible.

Since I had already set up a reusable GitHub Actions workflow for building and pushing the application image, I chose to stay consistent and do the same for Terraform.

Branch-based Actions

I created three YAML files in the repository's .github/workflows directory:

apply_terraform.yml
dev_apply_tf.yml
main_apply_tf.yml

The first file, apply_terraform.yml, is the reusable workflow. In the on: section, which defines the workflow's trigger(s), instead of a repository event (push, pull_request, etc.), I used workflow_call, which indicates that this workflow can be called by another workflow. Within workflow_call, I defined the inputs and secrets that would be passed in by the calling workflow.
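Based on the inputs and secrets the calling workflows pass in, the trigger section would look something like this sketch:

```yaml
# Sketch of apply_terraform.yml's trigger section
on:
  workflow_call:
    inputs:
      workspace_name:
        required: true
        type: string
      listeners:
        required: true
        type: string
    secrets:
      image_uri:
        required: true
      tf_token:
        required: true
```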

The jobs: section looks like any other GitHub Actions workflow, with one exception: where repository secrets might otherwise be called, the code instead references the secrets that are passed in via the workflow_call. At one point, this led to several minutes of frustration as I attempted to pass a Terraform Cloud token directly into the reusable workflow but kept getting errors and aborted workflow runs. The solution, oddly, was to call the repository secret in the branch-based workflow and pass it into the reusable workflow.

The two branch-based workflows are identical in structure and are both quite short (as workflows go):

name: Call apply-terraform from dev branch

on:
  push:
    branches:
      - dev
    paths:
      - 'terraform/**'

jobs:
  apply-tf:
    uses: ./.github/workflows/apply_terraform.yml
    with:
      workspace_name: 'skyboy-dev'
      listeners: 'devlisteners'
    secrets:
      image_uri: ${{ secrets.DEV_IMAGE_URI }}
      tf_token: ${{ secrets.TERRAFORM_TOKEN }}

The workflow is triggered by a push, in this case to the dev branch - but only if the changes touch the terraform/ directory of the repository. I don't want changes to the application itself, which lives in the same repository, to trigger Terraform runs. (See the Wrap-up for more thoughts on this.)

The single workflow job uses the reusable workflow, and passes in certain inputs and repository secrets defined in the GitHub web console.

Solving Those Problems

So, how does all that help me solve my multi-account deployment problems?

After checking out the repository's code, the next step in the reusable workflow is to run a short bash script:

- name: Run tf_files script
  env:
    WORKSPACE: ${{ inputs.workspace_name }}
    IMAGE: ${{ secrets.image_uri }}
    LISTENERS: ${{ inputs.listeners }}
  run: ./.github/scripts/tf_files.sh

The script can easily access the environment variables that this step sets up, and it performs some basic file manipulations using templates:

  • sets the Terraform Cloud workspace name in backends.tf, so that the dev and main branch states are stored separately. Every terraform apply happens remotely on Terraform Cloud, giving me the opportunity to store credentials within each workspace, and to tear down the deployed infrastructure easily;
  • inserts the correct Docker image URI from the correct Elastic Container Repository into a task definition;
  • and appends the correct listener(s) to the load balancing module's main.tf file.
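A hypothetical sketch of tf_files.sh follows; the template file names and placeholder tokens are my assumptions, and the steps are wrapped in a function here for readability (the actual script would simply run the commands in order):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of tf_files.sh; file names and placeholders are assumptions.
set -euo pipefail

render_tf_files() {
  # Point the Terraform Cloud backend at the branch's workspace
  sed "s/WORKSPACE_NAME/${WORKSPACE}/" terraform/backends.tf.tpl \
    > terraform/backends.tf

  # Substitute the branch's ECR image URI into the task definition template
  sed "s|IMAGE_URI|${IMAGE}|" \
    terraform/containers/task-definitions/app.json.tpl \
    > terraform/containers/task-definitions/app.json

  # Append the branch-appropriate listener resources to the load balancing module
  cat "terraform/templates/${LISTENERS}.tf" >> terraform/loadbalancing/main.tf
}
```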

After the script has completed on the GitHub runner, the workflow logs in to Terraform Cloud, runs terraform init, fmt, and validate, and finally apply.
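Those final steps might be sketched like this; the step layout and flags are mine, since the post doesn't show the exact workflow:

```yaml
# Sketch of the remaining workflow steps (step names and flags are assumptions)
- uses: hashicorp/setup-terraform@v2
  with:
    cli_config_credentials_token: ${{ secrets.tf_token }}

- name: Terraform init, fmt, validate, and apply
  working-directory: terraform
  run: |
    terraform init
    terraform fmt -check
    terraform validate
    terraform apply -auto-approve
```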

Wrap-up

Addressing the path-based push trigger: why not separate the application and infrastructure into separate repositories? Answer: because that would be too easy! I recognize that having the two in the same repository might not be a best practice, and if the application grows, I may separate them out. The current setup does allow me to keep the entire project in one IDE window, and makes it easier for anyone interested to see all the work that has gone into launching the Skyboy app.

I'm pleased that I was able to set up a modular Terraform configuration for my app. Despite seeming simple in retrospect, adding a script and performing file manipulations during the GitHub Action workflow was another good complexity challenge to overcome, and will still be applicable if I break the infrastructure off into a separate repository.

This write-up is only intended to convey my thought process and an outline of my solutions, and doesn't present enough detail to function as a guided tutorial. Feel free to get in touch if you're attempting something similar and would like clarification about anything I did. I'll do my best to help!

