POV: Designing Infrastructure Deployments with GitHub Actions
It started simple — I had some Terraform code to set up my Google Cloud project. A few buckets, a Cloud SQL instance, maybe a Kubernetes cluster.
Running it locally was easy at first: just `terraform plan` and `terraform apply`. Until it wasn't.
It became harder when I needed to test before production. I had to manage multiple environments (dev, staging, prod). Sometimes I'd forget to switch credentials and almost deploy to the wrong project. Teammates started doing the same, and everyone's setup was slightly different. We were spending more time preparing the environment than deploying infrastructure.
And to make matters worse, I had to run `terraform plan` manually every so often, just to check whether someone had changed something in the console and the state had drifted.
That's when I realized — it's not just about running Terraform. It's about making the process reproducible and safe.
At some point I got tired of re-configuring my local environment.
Every time I wanted to deploy, I had to check:
- Terraform version
- Google Cloud SDK version
- Proper authentication
That's when the idea came — what if the deployment environment itself was versioned and shared?
So I built a Docker image that could do everything: run Terraform and `gcloud` commands, because not everything in Google Cloud is available in Terraform. For example, enabling APIs or configuring Pub/Sub push endpoints often still needs `gcloud`.
The image became a reliable, reproducible environment that anyone on the team could use. You could clone the repo, run a single command, and deploy without worrying about installing or configuring anything manually.
At first, it was built just for local use — our "deployment tool in a box." But then I realized: if it works locally, why not use it in CI?
That's when it evolved into something more — a universal deploy environment, shared between local and automated runs.
Example of a deployment command:
docker run --rm \
  -v $(pwd)/cd:/workspace/cd \
  -v ~/.secrets/gcp-service-account.json:/workspace/creds/gcp-service-account.json \
  -e GCP_CREDS_FILE="/workspace/creds/gcp-service-account.json" \
  -e ACTION="plan" \
  usabilitydynamics/udx-worker-tooling:latest
It also needed to handle authentication cleanly.
At first, we all used service account keys; it's the simplest way to get Terraform or `gcloud` talking to Google Cloud. You download a JSON key, mount it into the container, and the Docker image picks it up through an environment variable. It works, but it's not great for the long term.
The problem is, those keys don't expire. They sit on laptops, in environment variables, sometimes even in repos. And after a few months, nobody remembers which key belongs to what. Keys don't have descriptions, so it's hard to tell if one is still in use or safe to delete.
So I decided to switch to short-lived tokens for GitHub Actions deployments. They behave just like regular credentials, except they expire automatically.
In CI, GitHub Actions can request one dynamically using Workload Identity. That identity is linked to the same service account we use locally — so permissions stay consistent across both workflows.
Example of an authorization step with Workload Identity:
- name: Authorize Google
  uses: google-github-actions/auth@v3
  with:
    workload_identity_provider: ${{ env.DEPLOY_AUTH_PROVIDER }}
    service_account: ${{ env.DEPLOY_CLOUD_ACCOUNT }}
Output
Run google-github-actions/auth@v3
Created credentials file at "/home/runner/work/aws-cache-invalidation/aws-cache-invalidation/gha-creds-93969fc5c6d1c48e.json"
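One detail worth noting: the job has to be allowed to request an OIDC token, or the auth step fails. That means declaring it in the workflow:

```yaml
permissions:
  id-token: write   # lets the job request a GitHub OIDC token for Workload Identity
  contents: read    # keep normal read access for checkout
```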
The best part: once you move CI to Workload Identity, you can safely delete all static keys from the service account. Even if your security team rotates or removes every key, CI deployments keep running — because they don't rely on keys at all.
Locally, we still use service account keys when needed, but they're temporary and can be recreated anytime. During security reviews, we can confidently remove all keys, knowing that production deployments stay safe and keyless.
That's what I wanted from the start: same service account, same permissions, no leftover secrets.
Once the container and authentication were stable, I started designing the workflow itself.
Most configuration now lives inside GitHub environments, using a combination of shared variables and secrets.
- Shared values (like regions, image names) come from repository-level vars/secrets.
- Environment-specific ones (like project IDs or service accounts) are defined per GitHub environment.
- Some values are dynamic — using prefixes or suffixes based on the branch or environment name.
It's clean, auditable, and flexible. You can see exactly what each environment runs with, without digging through YAML.
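As a rough sketch of how that split looks in a job (the variable names here are illustrative, not the exact ones we use):

```yaml
jobs:
  terraform:
    runs-on: ubuntu-latest
    environment: staging                                       # pulls in this environment's vars and secrets
    env:
      DEPLOY_REGION: ${{ vars.DEPLOY_REGION }}                 # shared, repository-level variable
      DEPLOY_CLOUD_PROJECT: ${{ vars.DEPLOY_CLOUD_PROJECT }}   # defined per GitHub environment
      DEPLOY_CLOUD_ACCOUNT: ${{ vars.DEPLOY_CLOUD_ACCOUNT }}   # defined per GitHub environment
      STATE_PREFIX: infra-${{ github.ref_name }}               # dynamic, derived from the branch name
```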
I didn't want GitHub Actions to just "run Terraform." I wanted it to act like a smart operator — a system that knows when to plan, when to apply, and when to stop.
So I started with a config job. It doesn't deploy anything — it just understands the situation.
It checks:
- Which branch triggered the run
- Which environment that branch maps to
- Whether this is a PR (plan) or a push to main (apply)
- What credentials and variables are available
- If any configuration files are missing
If something is missing, it tries to generate defaults — and if that's not possible, it fails early with a clear explanation. No more guessing why Terraform broke halfway through.
Configuration Summary Output:
📋 Infrastructure Deployment Configuration Summary
──────────────────────────────────────────────
🏷️ Version: 0.9.12 (source: GitVersion)
📦 Image: udx-worker-tooling:latest
🗂️ Terraform Path: ./cd/terraform
🔐 Auth Mode: Workload Identity (short-lived token)
🌿 Branch: main
🌍 Target Environment: production
🚀 Trigger: push
Execution Plan
──────────────────────────────────────────────
🧩 Config Job: ✅ Completed
📄 Terraform Plan: ✅ Will run
⚙️ Terraform Apply: ✅ Will run (auto-approved)
📬 Slack Notifications: ✅ Enabled
🔑 Service Account Key: ⛔ Not used (Workload Identity)
☁️ Cloud APIs Check: ✅ Enabled
Summary
──────────────────────────────────────────────
🔒 Security Upload: ✅ Enabled
🧠 Environment Detected: production (matched)
👤 Triggered by: github-actions[bot]
──────────────────────────────────────────────
The config job exports outputs that define what happens next. That's the key — later jobs don't have to decide anything; they just follow the plan.
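A trimmed sketch of that config job (the detection logic is simplified and the output names are illustrative):

```yaml
jobs:
  config:
    runs-on: ubuntu-latest
    outputs:
      environment: ${{ steps.detect.outputs.environment }}
      action: ${{ steps.detect.outputs.action }}
    steps:
      - uses: actions/checkout@v4
      - id: detect
        run: |
          # Decide plan vs apply from the trigger, and map the branch to an environment
          if [ "$GITHUB_EVENT_NAME" = "pull_request" ]; then
            echo "action=plan" >> "$GITHUB_OUTPUT"
          else
            echo "action=apply" >> "$GITHUB_OUTPUT"
          fi
          case "$GITHUB_REF_NAME" in
            main) echo "environment=production" >> "$GITHUB_OUTPUT" ;;
            *)    echo "environment=staging"    >> "$GITHUB_OUTPUT" ;;
          esac
```

Downstream jobs read `needs.config.outputs.*` and simply do what they're told.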
Once config finishes, the workflow moves to the terraform job. That's where the deploy image does the heavy lifting.
This job:
- Authenticates to Google Cloud
- Verifies the backend bucket for Terraform state
- Runs both Terraform and `gcloud` where needed
- Executes `terraform init`, `plan`, and, if allowed, `apply`
Everything runs inside the same Docker image. If it works locally, it'll work in CI exactly the same way.
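In workflow terms, the job is mostly a thin wrapper around that image. A simplified sketch, assuming `DEPLOY_AUTH_PROVIDER` and `DEPLOY_CLOUD_ACCOUNT` are set as workflow-level variables (paths are illustrative):

```yaml
  terraform:
    needs: config
    runs-on: ubuntu-latest
    environment: ${{ needs.config.outputs.environment }}
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4

      - name: Authorize Google
        uses: google-github-actions/auth@v3
        with:
          workload_identity_provider: ${{ env.DEPLOY_AUTH_PROVIDER }}
          service_account: ${{ env.DEPLOY_CLOUD_ACCOUNT }}

      - name: Run Terraform in the deploy image
        run: |
          # Same image as local runs; the credentials file created by the auth step
          # sits inside the workspace, so mounting the workspace is enough.
          docker run --rm \
            -v "$PWD:/workspace" \
            -e GCP_CREDS_FILE="/workspace/$(basename "$GOOGLE_APPLICATION_CREDENTIALS")" \
            -e ACTION="${{ needs.config.outputs.action }}" \
            usabilitydynamics/udx-worker-tooling:latest
```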
Example of terraform job output:
──────────────────────────────────────────────
▶ Preparing environment...
📋 Plan-only mode enabled
✅ Terraform initialized successfully
🌍 Project: client-udx
📦 Environment: staging
📁 Config loaded: /workspace/cd/configs/worker.yaml
🔑 Authenticated via service account key
🏗️ Running terraform plan...
──────────────────────────────────────────────
Plan: 0 to add, 2 to change, 0 to destroy
✔️ Plan complete, no errors detected
──────────────────────────────────────────────
Logging and communication became the next focus. Terraform and `gcloud` outputs can be noisy, so I added structured checkpoints and summaries.
Example: Log Output (Plan-Only)
▶ Determine Environment
🌿 Branch: feature/update-storage → main
✅ Environment detected: staging
⚙️ Mode: plan-only
▶ Authenticate to Google Cloud
✅ Authenticated via Workload Identity
🌍 Project: client-udx
▶ Terraform Plan
✅ Initialized successfully
🔧 Plan complete — Add: 0 | Change: 2 | Destroy: 0
▶ Notify
✅ Slack message sent (plan success - staging)
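The section headers in logs like the one above can be produced with GitHub's log-grouping commands around each phase. A minimal sketch (in the real workflow the command runs inside the deploy image, but the grouping works the same way):

```yaml
      - name: Terraform Plan
        run: |
          echo "::group::▶ Terraform Plan"
          terraform -chdir=./cd/terraform plan -input=false
          echo "::endgroup::"
```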
Readable logs aren't decoration — they build trust. When something fails, I want to know what step failed, why, and what it means — not just "Exit 1."
When the deployment finishes, the workflow communicates.
Inside GitHub, I use annotations — short summaries like:
- Environment detected
- Plan-only mode
- Configuration merged
- Plan results
Example: Workflow Annotations
ℹ️ infrastructure / terraform
Plan-only mode enabled — infrastructure will NOT be modified
ℹ️ infrastructure / terraform
Environment-specific config file found for environment: production
ℹ️ infrastructure / terraform
Successfully merged 3 files into `.tmp/merged-production-infra.yaml`
ℹ️ infrastructure / terraform
Found environment-specific files for 'production'
ℹ️ infrastructure / terraform
Environment files:
./infra/configs/production/gcp-storage.yaml
./infra/configs/production/gcp-monitoring.yaml
./infra/configs/production/sql-instance.yaml
ℹ️ infrastructure / terraform
Using 3 files from environment directory `./infra/configs/production` for 'production'
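These annotations are emitted with GitHub's workflow commands; any step can produce one with a single `echo`. For example:

```yaml
      - name: Summarize plan mode
        run: |
          # Each line shows up as an annotation on the workflow summary page
          echo "::notice::Plan-only mode enabled - infrastructure will NOT be modified"
          echo "::notice::Environment-specific config file found for environment: production"
```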
Outside GitHub, the workflow sends Slack notifications — and I standardized them to make every message clear and predictable.
Success Example:
✅ Infrastructure Deployment Succeeded
client-udx
Environment: production
Changes: Add: 0 | Change: 2 | Destroy: 0
Status: Success
View Workflow Logs
Failure Example:
❌ Infrastructure Deployment Failed
client-udx
Environment: production
Changes: Add: 0 | Change: 2 | Destroy: 0
Status: Failed
Reason: Missing configuration files or invalid variables
View Workflow Logs
Same structure, same context — only the outcome changes. That consistency makes notifications useful, not noisy.
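Under the hood, the notification step can be as simple as posting a templated payload to a Slack incoming webhook. A sketch, with an illustrative secret name and a much simpler message than the real one:

```yaml
      - name: Notify Slack
        if: always()   # report success and failure alike
        run: |
          curl -sf -X POST "${{ secrets.SLACK_WEBHOOK_URL }}" \
            -H 'Content-Type: application/json' \
            -d "{\"text\": \"Infrastructure Deployment ${{ job.status }} | Environment: ${{ needs.config.outputs.environment }}\"}"
```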
Over time, I added more structure and safety.
For example, control gates — rules that decide when jobs can run. Production deployments can't be triggered manually; they only run when a PR is reviewed, approved, and merged.
That way, I can still plan from a feature branch, but only the reviewed code can apply changes to production. It's not just automation — it's governance built in.
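Mechanically, that gate is just an `if:` condition on the apply job plus a protected GitHub environment; the required-reviewer and branch rules live on the environment itself, not in YAML. A sketch:

```yaml
  apply:
    needs: config
    # Only a push to main (a reviewed, merged PR) reaches this job
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production   # protection rules (required reviewers, branch filters) are configured here
    steps:
      - run: echo "apply allowed for ${{ github.sha }}"
```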
Another improvement was reusability. When you write enough workflows, you start repeating the same patterns.
So I began building composite actions — step templates that bundle complex logic into a single, reusable block. For example, one handles Slack notifications, another prepares Terraform environments, and another manages Google Cloud authentication.
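A composite action is just an `action.yml` that bundles steps behind a small interface. A minimal sketch of the authentication one (inputs and file path are illustrative):

```yaml
# .github/actions/gcp-auth/action.yml
name: Authorize Google
description: Authenticate to Google Cloud via Workload Identity
inputs:
  workload_identity_provider:
    description: Workload Identity provider resource name
    required: true
  service_account:
    description: Service account email to impersonate
    required: true
  project_id:
    description: Default project for gcloud commands
    required: true
runs:
  using: composite
  steps:
    - uses: google-github-actions/auth@v3
      with:
        workload_identity_provider: ${{ inputs.workload_identity_provider }}
        service_account: ${{ inputs.service_account }}
    # gcloud is preinstalled on GitHub-hosted runners
    - shell: bash
      run: gcloud config set project "${{ inputs.project_id }}"
```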
Then I moved to workflow templates: complete workflows that define the standard pipeline structure. Every Terraform repo can inherit the same pattern: config → plan → apply → notify. You just drop one YAML file in a repo and it works.
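With reusable workflows, "dropping one YAML file in a repo" looks roughly like this; the central repo name and the input are illustrative:

```yaml
# .github/workflows/infrastructure.yml in each Terraform repo
name: Infrastructure
on:
  pull_request:
  push:
    branches: [main]

jobs:
  pipeline:
    # Hypothetical central repo holding the shared config → plan → apply → notify pipeline
    uses: usabilitydynamics/github-workflows/.github/workflows/terraform.yml@main
    with:
      terraform_path: ./cd/terraform
    secrets: inherit
```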
The key idea was simplicity — making complex delivery logic easy to reuse, understand, and maintain. Hard stuff, wrapped in a light interface.
Eventually, the workflow became more than automation — it became orchestration.
When I push code now, I see a complete story:
- Config detects the environment and validates inputs.
- Terraform and `gcloud` run in a clean, shared environment.
- Control gates decide what's allowed.
- Notifications summarize the result.
It's not just code execution — it's controlled delivery. No manual setup, no hidden state, no confusion.
And it all started because I got tired of switching credentials and Terraform versions locally. That small decision — to package the deploy logic into a single reproducible container — ended up defining how my team delivers infrastructure today.
I don't just automate Terraform anymore. I design workflows that make infrastructure delivery predictable, explainable, and reusable.