Using Terraform 0.13+ with AWS wildcard certificates

Note: this is a technical article relating to the gory details of how Kopi infrastructure is managed. If it doesn't look interesting to you - skip it.

Why create such complicated certificates?

The Kopi website and app are hosted on AWS CloudFront, and the HTTPS encryption certificate is managed via AWS Certificate Manager (ACM).

CloudFront only supports one ACM certificate per distribution, as per the documentation. To use an HTTPS/SSL protected CloudFront distribution with multiple subdomains, you need a cert that is valid for both the root domain (kopi.cloud) and the wildcard subdomains (*.kopi.cloud).

For yet more complexity, Kopi runs on multiple domains. I don't want to run a CloudFront distribution per domain - so the certificate needs to be valid for many domains.

Upgrading Kopi to Terraform 0.13+

Kopi uses an "Infrastructure as Code" approach to configuring the AWS resources necessary to run the service. My tool of choice for this is Terraform.

Recently I decided to upgrade Kopi to the latest 0.14 release of Terraform and I had some trouble with certificate resources, so I thought I might document the solution here (yes, it's content marketing).

Watch out for the zig-zag upgrade path!

Usually, I upgrade Terraform by first upgrading the AWS provider, then upgrading the Terraform version. I always bump one major version at a time and do a full plan/apply cycle between each step.

Previously, Kopi used version 0.12 of Terraform and the 2.x AWS provider. I skipped 0.13 because every Terraform upgrade seems to cause problems. So if I don't need the features, I batch up Terraform upgrades until I have sufficient time and motivation to deal with the inevitable issues.

The 0.13 upgrade caused an issue because HashiCorp made a breaking change to the aws_acm_certificate resource, changing the type of the domain_validation_options field from a List to a Set. A Set has no ordering, so its elements can no longer be referenced by index. The documented solution is to use the for_each functionality, which is only available with Terraform 0.13.

So if you use aws_acm_certificate resources, be aware that you'll probably need to upgrade to at least TF 0.13 before you can upgrade the AWS provider to 3.x.
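
To make that ordering explicit, the version constraints can be pinned in the configuration itself. This is a minimal sketch rather than Kopi's actual configuration: the exact constraint strings are assumptions, chosen only to match the "0.13+ and 3.x" combination described above.

```hcl
terraform {
  # Assumption: 0.13 is the minimum Terraform version for the approach in this article.
  required_version = ">= 0.13"

  required_providers {
    aws = {
      # The explicit source address syntax is itself a Terraform 0.13+ feature.
      source = "hashicorp/aws"
      # Assumption: any 3.x release of the AWS provider.
      version = "~> 3.0"
    }
  }
}
```

With constraints like these in place, running an older Terraform binary against the configuration fails early with a version error, instead of failing later on the certificate resources.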

The old 0.12 Terraform code

Note the hard-coded indexes in the aws_route53_record resources (lines 23-25 and 32-34). I'm not a big fan of Terraform's for_each syntax, but at least it's better than hard-coding references like this.

locals {
  kopi_cloud_dns_name = "kopi.cloud"
  kopimail_net_name   = "kopimail.net"
}

resource "aws_acm_certificate" "website-prd-cloudfront-acm-certificate-v2" {
  tags = {
    Name = "website-prd-cloudfront-acm-certificate-v2"
  }
  domain_name = "*.${local.kopi_cloud_dns_name}"
  subject_alternative_names = [
    "*.${local.kopimail_net_name}",
    local.kopi_cloud_dns_name,
    local.kopimail_net_name,
  ]
  validation_method = "DNS"
}

resource "aws_route53_record" "kopi-cloud-root-acm-validation-record-v2" {
  zone_id = data.aws_route53_zone.kopi-cloud-root-route53-zone.id
  ttl     = 60
  
  name    = aws_acm_certificate.website-prd-cloudfront-acm-certificate-v2.domain_validation_options[2].resource_record_name
  type    = aws_acm_certificate.website-prd-cloudfront-acm-certificate-v2.domain_validation_options[2].resource_record_type
  records = [aws_acm_certificate.website-prd-cloudfront-acm-certificate-v2.domain_validation_options[2].resource_record_value]
}

resource "aws_route53_record" "kopi-mail-root-acm-validation_record-v2" {
  zone_id = data.aws_route53_zone.kopimail-net-root-route53-zone.id
  ttl     = 60

  name    = aws_acm_certificate.website-prd-cloudfront-acm-certificate-v2.domain_validation_options[3].resource_record_name
  type    = aws_acm_certificate.website-prd-cloudfront-acm-certificate-v2.domain_validation_options[3].resource_record_type
  records = [aws_acm_certificate.website-prd-cloudfront-acm-certificate-v2.domain_validation_options[3].resource_record_value]
}

resource "aws_acm_certificate_validation" "website-prd-cloudfront-acm-certificate-validation-v2" {
  certificate_arn = aws_acm_certificate.website-prd-cloudfront-acm-certificate-v2.arn
  validation_record_fqdns = [
    aws_route53_record.kopi-cloud-root-acm-validation-record-v2.fqdn,
    aws_route53_record.kopi-mail-root-acm-validation_record-v2.fqdn,
  ]
}

The new 0.14 Terraform code

Note the if conditions in the aws_route53_record resources that ensure TF only tries to create a DNS record for the root domain (e.g. line 59).

The conditional filters out the entry for the wildcard domain. ACM still sees both names as "validated", because it generates identical validation options for the root and wildcard entries (see the comments for more details).

The conditional also filters out records for the "other" domains: there's no point putting a kopimail.net record in the kopi.cloud Route 53 zone.
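
Before the full listing, the filter can be tried in isolation. The sketch below swaps the real domain_validation_options for a hypothetical local list, so it can be evaluated without touching AWS. The local and output names are invented for illustration, and the record names and values are made-up placeholders, not real ACM output.

```hcl
locals {
  # Hypothetical stand-in for domain_validation_options.
  # Root and wildcard entries deliberately share the same placeholder record.
  example_dvos = [
    {
      domain_name           = "*.kopi.cloud"
      resource_record_name  = "_placeholder-a.kopi.cloud."
      resource_record_type  = "CNAME"
      resource_record_value = "_placeholder-a.example."
    },
    {
      domain_name           = "kopi.cloud"
      resource_record_name  = "_placeholder-a.kopi.cloud."
      resource_record_type  = "CNAME"
      resource_record_value = "_placeholder-a.example."
    },
    {
      domain_name           = "kopimail.net"
      resource_record_name  = "_placeholder-b.kopimail.net."
      resource_record_type  = "CNAME"
      resource_record_value = "_placeholder-b.example."
    },
  ]

  # Same shape as the for_each expression in the article's code:
  # build a map keyed by domain name, keeping only the kopi.cloud root entry.
  example_kopi_cloud_records = {
    for dvo in local.example_dvos : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
    if dvo.domain_name == "kopi.cloud"
  }
}

output "example_kopi_cloud_records" {
  value = local.example_kopi_cloud_records
}
```

The resulting map has a single key, "kopi.cloud", so a resource using it as for_each would create exactly one DNS record. The wildcard and kopimail.net entries are dropped by the if clause.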

/* DO NOT UPDATE certificate objects.
When you need to create a new cert (or especially cert_validation) - you're
better off making a new certificate with a new "Vx" version.  This way you can
be sure your new cert is actually working, *before* changing over all the things
that rely on the cert.

Remember that certs get issued and don't expire for a year, so the cert will
be valid for a while.  I'm not sure if they get revoked if the DNS goes away - 
but it won't matter if you don't rush, have parallel validated certs and
only change over usage of the certs when things are stable.  Obviously, leave
the old cert to sit for a while before deleting it, just to be safe.

I've had a lot of problems around the DNS validation records in particular,
and given how long creating DNS takes sometimes, and that it will fail on apply
if you have duplicate record entries (it won't detect that at plan time) and the
possibility of needing to wait for DNS propagation - it's best to do cert stuff
in the simplest possible steps that are easy to verify.  Add a calendar entry
to delete the cert though - don't want to get surprised by an unexpected
dependency that you forgot to change and the cert expires.
*/

/* NOTE: you should refer to the aws_acm_certificate_validation resource, not
to the certificate resource directly.
That allows TF to enforce waiting for the cert to be valid (status = "ISSUED")
before creating the dependent resource.  Probably not important for a CloudFront
distribution but might be more important for stuff like ELBs, etc?
*/
resource "aws_acm_certificate" "website-prd-cloudfront-acm-certificate-v3" {
  tags = {
    Name = "website-prd-cloudfront-acm-certificate-v3"
  }
  domain_name = "*.${local.kopi_cloud_dns_name}"
  /* I believe both root (`k.c`) and wildcard (`*.k.c`) entries are necessary
  for SSL to work properly, but it causes problems because AWS will generate a
  "domain validation option" for each of them, and the route53 records they
  require both have the same CNAME name and value.
  That causes the Terraform route53 object to freak out when it tries to create
  two records with the same name. */
  subject_alternative_names = [
    "*.${local.kopimail_net_name}",
    local.kopi_cloud_dns_name,
    local.kopimail_net_name,
  ]
  validation_method = "DNS"
}

resource "aws_route53_record" "kopi-cloud-root-acm-validation-record-v3" {
  zone_id = data.aws_route53_zone.kopi-cloud-root-route53-zone.id
  ttl     = 60

  for_each = {
    for dvo in aws_acm_certificate.website-prd-cloudfront-acm-certificate-v3.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
    /* doesn't really matter if it's `k.c` or `*.k.c` - coz they're identical
    see the comment on the aws_acm_certificate SANs for more info. */
    if dvo.domain_name == local.kopi_cloud_dns_name
  }

  name    = each.value.name
  type    = each.value.type
  records = [each.value.record]
}

resource "aws_route53_record" "kopimail-net-root-acm-validation-record-v3" {
  zone_id = data.aws_route53_zone.kopimail-net-root-route53-zone.id
  ttl     = 60

  for_each = {
    for dvo in aws_acm_certificate.website-prd-cloudfront-acm-certificate-v3.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
    if dvo.domain_name == local.kopimail_net_name
  }

  name    = each.value.name
  type    = each.value.type
  records = [each.value.record]
}


/* https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/acm_certificate_validation
"This resource implements a part of the validation workflow.
It does not represent a real-world entity in AWS, therefore changing or deleting
this resource on its own has no immediate effect."

From observation, ACM attempts to validate certs asynchronously - you don't
*need* cert_validation resources to make it work. I saw both the v3 AND
v4 (temporary while I was figuring it out) certificates go to status = ISSUED
when I got the route53 resource correct *before* I created this validation
resource at all.
*/
resource "aws_acm_certificate_validation" "website-prd-cf-acm-cert-validation-v3" {
  certificate_arn = aws_acm_certificate.website-prd-cloudfront-acm-certificate-v3.arn

  validation_record_fqdns = concat(
    [
      for record in aws_route53_record.kopi-cloud-root-acm-validation-record-v3 : record.fqdn
    ],
    [
      for record in aws_route53_record.kopimail-net-root-acm-validation-record-v3 : record.fqdn
    ]
  )
}
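
The NOTE comment above recommends that dependent resources refer to the validation resource rather than the certificate itself. A minimal sketch of what that reference might look like follows. The local name is hypothetical, and the CloudFront usage is only described in a comment because the distribution itself is not shown in this article.

```hcl
locals {
  # Hypothetical local: take the ARN from the validation resource, not from
  # aws_acm_certificate directly, so that anything using this value implicitly
  # depends on the validation step having completed.
  website_prd_validated_cert_arn = aws_acm_certificate_validation.website-prd-cf-acm-cert-validation-v3.certificate_arn
}

# Assumed usage inside a CloudFront distribution's viewer_certificate block:
#   acm_certificate_arn = local.website_prd_validated_cert_arn
```

The value is the same ARN as on the certificate resource. The only difference is the dependency edge it creates in Terraform's graph.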