Improving coverage of cloud resources to reduce infrastructure drift

Written by:
Stephane Jourdan

March 23, 2022


Deprecation notice: Drift detection of managed resources

Drift detection of managed resources, including snyk iac describe --only-managed and snyk iac describe --drift, has been deprecated. The end-of-life date for drift detection of managed resources is September 30, 2023.

As developers, we need maximum visibility of what’s actually running in our cloud environments, in order to keep them secure. Infrastructure as code (IaC) helps developers automate their cloud infrastructures, so what’s deployed to the cloud is under control and can easily be audited. But achieving and maintaining 100% IaC coverage of your infrastructure has many challenges.

We’re only as secure as what is actually deployed and running in our cloud environments, and more often than not, a lot of manual actions are still done on a regular basis, by us, other teams, or some authenticated services. Those changes are hidden from IaC and auditing, bringing issues like misconfigurations and security concerns. That’s when drift management becomes important: we want reports of resources that are not yet under IaC control, or that have changed for some reason.

In this article, we will show how Snyk IaC helps developers to discover cloud resources that are not under infrastructure as code (IaC) control (unmanaged resources), or that have drifted from their expected state (managed resources).

Set up the environment

Snyk IaC helps you list the resources it finds as Terraform resources, so you can easily know which part of the cloud service is targeted by the discovery. For example, a single Amazon API Gateway v2 service is made of at least 12 Terraform resources. With discovery information provided by Snyk, you'll be able to promptly decide whether to revert the modification, import a new resource, or simply delete that new change.

To follow along, you can use the Terraform file below to create three AWS resources that we will use in this walkthrough: an IAM user named "user1" with a random suffix, an access key, and an attached policy for read-only access.

At the time of this writing, we used Terraform v1.1.7 with the AWS provider v3.74.2.

Reuse the following HCL configuration:

main.tf

# Random suffix so the IAM user name is unique
resource "random_string" "prefix" {
  length  = 6
  upper   = false
  special = false
}

# The IAM user managed by Terraform
resource "aws_iam_user" "user1" {
  name = "user1-${random_string.prefix.result}"

  tags = {
    Name   = "user1-${random_string.prefix.result}"
    manual = "true"
  }
}

# An access key for that user
resource "aws_iam_access_key" "user1" {
  user = aws_iam_user.user1.name
}

# Read-only access through an AWS managed policy
resource "aws_iam_user_policy_attachment" "user1" {
  user       = aws_iam_user.user1.name
  policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}

Apply that Terraform configuration:

$ terraform init
[...]
$ terraform apply
[...]

Confirm you have a terraform.tfstate at the root of the directory:

$ ls -al terraform.tfstate
-rw-r--r--  1 sjourdan  staff  5049 Mar 16 18:31 terraform.tfstate

Also confirm that the IAM user is successfully created on AWS.
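One way to run that check from a terminal is with the AWS CLI. This is only a sketch: it assumes the AWS CLI is installed and authenticated against the same account, and the helper name list_users_with_prefix is illustrative, not part of Snyk IaC or Terraform.

```shell
# Hypothetical helper: print IAM user names that start with a prefix.
# Assumes the AWS CLI is installed and configured for the target account.
list_users_with_prefix() {
  prefix="${1:-user1-}"
  aws iam list-users \
    --query "Users[?starts_with(UserName, '${prefix}')].UserName" \
    --output text
}

# Example: list_users_with_prefix "user1-"
```

The call should print the full user name, including the random suffix generated by Terraform.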

Getting started with a clean slate

Let's start by listing all the cloud resources that are not under Terraform control:

$ snyk iac describe --only-unmanaged

You'll likely end up with a huge list of resources that are not under Terraform control. It's already great information, but not very actionable for our case. Snyk IaC has a built-in way to ignore resources in bulk, by adding all found resources to the .snyk policy file.

Let's ignore all those existing unmanaged resources, so we can work more precisely in a controlled environment with just the three resources we created above:

$ snyk iac describe --only-unmanaged --json | snyk iac update-exclude-policy

Scan again to confirm that your environment is now ignoring the discovered drifts (you'll have plenty of time later to schedule importing them).

$ snyk iac describe --only-unmanaged

Scanned states (1)
Found 3 resource(s)
 - 100% coverage
Congrats! Your infrastructure is fully in sync.

We're now ready to start from a clean state.

Let's drift with IAM!

We will now create three types of drift to simulate real-life situations:

  1. A modification on the existing IAM user (that we will want to revert)

  2. A manual attachment of a new IAM policy (that we will want to remove)

  3. A new IAM user (that we will want to import)

To do so, navigate to the AWS Console for IAM.

Modify the existing IAM user by adding a tag

  1. On the IAM users page, click on "user1"

  2. Click on the Tags tab

  3. Click on the Edit Tags button

  4. Add a new key ("environment") and a new value ("production")

  5. Click Save
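The same tag can be added without the console. A minimal sketch, assuming a configured AWS CLI; the function name tag_user_environment is hypothetical, and the user name you pass must include the random suffix Terraform generated.

```shell
# Hypothetical helper: add the environment=production tag to an IAM user.
# $1 is the full IAM user name, for example user1-84i30k.
tag_user_environment() {
  aws iam tag-user \
    --user-name "$1" \
    --tags Key=environment,Value=production
}

# Example: tag_user_environment "user1-84i30k"
```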

Attach a powerful policy to the existing IAM user

  1. On the IAM users page, click on "user1"

  2. Click on the Permissions tab

  3. Click on the Add permissions button

  4. Click on Attach existing policies directly

  5. Select Administrator Access

  6. Click Next: Review

  7. Validate by clicking on Add permissions
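For readers who prefer the command line, the attachment can be sketched with the AWS CLI as well. This assumes a configured AWS CLI and credentials allowed to change IAM; attach_admin_policy is an illustrative name only.

```shell
# Hypothetical helper: attach the AWS managed AdministratorAccess policy.
# $1 is the full IAM user name, for example user1-84i30k.
attach_admin_policy() {
  aws iam attach-user-policy \
    --user-name "$1" \
    --policy-arn "arn:aws:iam::aws:policy/AdministratorAccess"
}

# Example: attach_admin_policy "user1-84i30k"
```

Only do this in a sandbox account: the policy grants full access until it is detached.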

Create another IAM user manually

  1. On the IAM users page, click on the Add Users button

  2. Enter "user2" in the User name: field

  3. Select Access key

  4. Click on the Next: Permissions button

  5. Don’t set any permissions or tags

  6. Click on Create user (we don't care about the displayed credentials, so you can discard them).
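A rough command-line equivalent of these console steps, assuming a configured AWS CLI, could look like the following. The function name create_user_with_key is hypothetical.

```shell
# Hypothetical helper: create an IAM user, then an access key for it.
# Note: create-access-key prints the secret access key to stdout.
create_user_with_key() {
  aws iam create-user --user-name "$1" &&
    aws iam create-access-key --user-name "$1"
}

# Example: create_user_with_key "user2" >/dev/null   # discard the credentials
```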

We're now ready to tackle those three types of manual changes using Snyk IaC drift detection.

Managed and unmanaged infrastructure drift

Let's now find out how those changes are detected by Snyk IaC, and start with resources that are simply not managed at all by Terraform.

$ snyk iac describe --only-unmanaged

Scanned states (1)
Found resources not covered by IaC:
  aws_iam_access_key:
    - AKIASBXWQ3AYQETE6OFR
        User: user2
  aws_iam_policy_attachment:
    - user1-84i30k-arn:aws:iam::aws:policy/AdministratorAccess
  aws_iam_user:
    - user2
Found 6 resource(s)
 - 50% coverage
 - 3 resource(s) managed by Terraform
 - 3 resource(s) not managed by Terraform
 - 0 resource(s) found in a Terraform state but missing on the cloud provider

This scan reported, in Terraform resource terms:

  • The manually created IAM "user2", with its IAM access key

  • The IAM policy manually attached to the Terraform-managed "user1" IAM user.

Let's now check for changes only on resources managed by Terraform that are found in the various Terraform states:

$ snyk iac describe --only-managed
Scanned states (1)
Found changed resources:
  From tfstate://terraform.tfstate
    - user1-84i30k (aws_iam_user.user1):
        + tags.environment: <nil> => "production"
Found 5 resource(s)
 - 100% coverage
 - 5 resource(s) managed by Terraform
     - 1/5 resource(s) out of sync with Terraform state
 - 0 resource(s) found in a Terraform state but missing on the cloud provider

This scan reported a very different output, and took significantly longer (36s versus 9s for the "unmanaged" scan mode).

Using this output we learn that the IAM user named "user1-84i30k", which we can find in the HCL (as a resource) under the name "user1", has a tag named "environment" set to "production".

Action plan

The Snyk drift detection tool helped us discover four unexpected differences between our expectations and reality. For the sake of this article, let's say the team decides the following:

  • "user2" IAM user is used in production and should be imported in Terraform.

  • "user2" IAM Access Key should be rotated for security reasons.

  • "user1" should under no circumstances be an Administrator.

  • "user1" new tag is needed by some requirement and should be imported in Terraform.

| What | Resource type | Name | Drift type | Action |
| --- | --- | --- | --- | --- |
| An IAM user | aws_iam_user | user2 | Unmanaged | IMPORT |
| An IAM access key | aws_iam_access_key | AKIASBXWQ3AYQETE6OFR | Unmanaged | ROTATE |
| An attached IAM policy | aws_iam_policy_attachment | arn:aws:iam::aws:policy/AdministratorAccess | Unmanaged | DELETE |
| A tag on an IAM user | aws_iam_user | tags.environment | Managed | IMPORT |

Deployment pipelines are not remediation

We have a great Terraform deployment pipeline in place, so we might expect things to go back to normal the next time terraform apply is triggered.

What will Terraform actually do in this case? Here is the output of a deployment job:

$ terraform apply
Terraform will perform the following actions:

  # aws_iam_user.user1 will be updated in-place
  ~ resource "aws_iam_user" "user1" {
        id            = "user1-84i30k"
        name          = "user1-84i30k"
      ~ tags          = {
          - "environment" = "production" -> null
            # (1 unchanged element hidden)
        }
[...]

Plan: 0 to add, 1 to change, 0 to destroy.

Terraform was never meant to discover manually created or attached resources, and will simply revert the modified ones to the original state (which is not what we want in this situation).

| What | Resource type | Name | Drift type | Action |
| --- | --- | --- | --- | --- |
| An IAM user | aws_iam_user | user2 | Unmanaged | NONE |
| An IAM access key | aws_iam_access_key | AKIASBXWQ3AYQETE6OFR | Unmanaged | NONE |
| An attached IAM policy | aws_iam_policy_attachment | arn:aws:iam::aws:policy/AdministratorAccess | Unmanaged | NONE |
| A tag on an IAM user | aws_iam_user | tags.environment | Managed | REVERT |

In none of these cases do we get the help we expect:

  • The manually created IAM user and its Access Key are not reported (unhelpful)

  • The Administrator policy manually attached to a managed user is not reported (unhelpful)

  • The important tag manually added to a managed user will be reverted (harmful)

A different type of tool is needed for this type of detection and work.
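As an illustration of what such a check could look like next to a deployment pipeline, here is a minimal sketch that reads scan text on standard input and fails when coverage falls below a threshold. It assumes the scan prints a line such as " - 50% coverage", as in the outputs shown in this article. Parsing human-readable output is fragile, and the function names and the default threshold are hypothetical choices, not Snyk features.

```shell
# Print the first coverage percentage found in scan text read from stdin.
extract_coverage() {
  sed -n 's/^ *- *\([0-9][0-9]*\)% coverage.*$/\1/p' | head -n 1
}

# Return non-zero when coverage is below the threshold (default: 100).
check_coverage() {
  threshold="${1:-100}"
  coverage="$(extract_coverage)"
  if [ -z "$coverage" ]; then
    echo "no coverage line found" >&2
    return 2
  fi
  if [ "$coverage" -lt "$threshold" ]; then
    echo "coverage ${coverage}% is below ${threshold}%" >&2
    return 1
  fi
  echo "coverage ${coverage}% meets ${threshold}%"
}

# Example: snyk iac describe --only-unmanaged | check_coverage 80
```

A scheduled job could run this and raise an alert on a non-zero status, leaving the decision to revert, import, or delete to the team.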

Improving our coverage

We start our journey at 50% coverage for the unmanaged resources:

$ snyk iac describe --only-unmanaged

Scanned states (1)
Found resources not covered by IaC:
  aws_iam_access_key:
    - AKIASBXWQ3AYQETE6OFR
        User: user2
  aws_iam_policy_attachment:
    - user1-84i30k-arn:aws:iam::aws:policy/AdministratorAccess
  aws_iam_user:
    - user2
Found 6 resource(s)
 - 50% coverage
 - 3 resource(s) managed by Terraform
 - 3 resource(s) not managed by Terraform
 - 0 resource(s) found in a Terraform state but missing on the cloud provider

Let's improve this based on the team plan.

Delete the IAM policy for "user1"

Let's start with the most urgent and easiest: removing the "Administrator" policy for the managed IAM "user1":

  • Go to IAM > Users > "user1"

  • Click on Permissions > delete "AdministratorAccess"
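The console steps above have a command-line counterpart. The sketch below assumes a configured AWS CLI; detach_admin_policy is an illustrative name, and the user name must include the random suffix reported by the scan.

```shell
# Hypothetical helper: detach the AWS managed AdministratorAccess policy.
# $1 is the full IAM user name, for example user1-84i30k.
detach_admin_policy() {
  aws iam detach-user-policy \
    --user-name "$1" \
    --policy-arn "arn:aws:iam::aws:policy/AdministratorAccess"
}

# Example: detach_admin_policy "user1-84i30k"
```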

$ snyk iac describe --only-unmanaged
Scanned states (1)
Found resources not covered by IaC:
  aws_iam_access_key:
    - AKIASBXWQ3AYQETE6OFR
        User: user2
  aws_iam_user:
    - user2
Found 5 resource(s)
 - 60% coverage
 - 3 resource(s) managed by Terraform
 - 2 resource(s) not managed by Terraform

We're now covering 60% of our AWS resources, up from 50%.
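The percentages in these scans are consistent with coverage being the share of managed resources among all resources found. The following sketch reproduces that arithmetic; it is an assumption based on the figures in this article (with no missing resources), not a description of how Snyk IaC computes it internally.

```shell
# Coverage as an integer percentage: managed / (managed + unmanaged).
coverage_pct() {
  managed="$1"
  unmanaged="$2"
  total=$((managed + unmanaged))
  if [ "$total" -eq 0 ]; then
    echo 0   # nothing found: avoid dividing by zero
    return
  fi
  echo $((managed * 100 / total))
}

coverage_pct 3 3   # 3 managed, 3 unmanaged: prints 50
coverage_pct 3 2   # 3 managed, 2 unmanaged: prints 60
```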

| What | Resource type | Name | Drift type | Action | Status |
| --- | --- | --- | --- | --- | --- |
| An IAM user | aws_iam_user | user2 | Unmanaged | IMPORT | |
| An IAM access key | aws_iam_access_key | AKIASBXWQ3AYQETE6OFR | Unmanaged | ROTATE | |
| An attached IAM policy | aws_iam_policy_attachment | arn:aws:iam::aws:policy/AdministratorAccess | Unmanaged | DELETE | * |
| A tag on an IAM user | aws_iam_user | tags.environment | Managed | ADD | |

Let's continue.

Unblock the Terraform deployment pipeline

The pipeline is currently blocked by this manual change to the tags for aws_iam_user.user1. If any deployment happens, the tags will be reverted to what's in the HCL. So what’s the solution? Use the Snyk IaC drift output to adapt our Terraform configuration.

The information we have is the following:

Found changed resources:
  From tfstate://terraform.tfstate
    - user1-84i30k (aws_iam_user.user1):
        + tags.environment: <nil> => "production"

We know from this output that:

  • We're looking for a resource of type aws_iam_user named "user1"

  • That resource is found in terraform.tfstate (very handy when you have dozens or hundreds of states)

  • There's a new tag key named environment with a value of "production".

Let's update our IAM user resource by simply adding environment = "production", so our resource now looks like this:

resource "aws_iam_user" "user1" {
  name = "user1-${random_string.prefix.result}"

  tags = {
    Name        = "user1-${random_string.prefix.result}"
    manual      = "true"
    environment = "production" # now declared in code, matching the manual change
  }
}

We can now safely unblock our Terraform deployment pipeline:

$ terraform apply
No changes. Your infrastructure matches the configuration.
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

We have fixed our "managed" drifts for now:

$ snyk iac describe --only-managed
Scanned states (1)
Found 3 resource(s)
 - 100% coverage
Congrats! Your infrastructure is fully in sync.

| What | Resource type | Name | Drift type | Action | Status |
| --- | --- | --- | --- | --- | --- |
| An IAM user | aws_iam_user | user2 | Unmanaged | IMPORT | |
| An IAM access key | aws_iam_access_key | AKIASBXWQ3AYQETE6OFR | Unmanaged | ROTATE | |
| An attached IAM policy | aws_iam_policy_attachment | arn:aws:iam::aws:policy/AdministratorAccess | Unmanaged | DELETE | * |
| A tag on an IAM user | aws_iam_user | tags.environment | Managed | ADD | * |

*

Import and rotate IAM user2

Let's now handle the "user2" case. We want to:

  • Import it into Terraform

  • Rotate the key

Let's start by importing the IAM user into Terraform, and here's one simple way to do it.

Start by collecting the information from Snyk IaC:

| Resource type | Name |
| --- | --- |
| aws_iam_user | user2 |

How do we import an aws_iam_user resource? According to the official Terraform documentation, IAM users can be imported using the name, e.g., $ terraform import aws_iam_user.lb loadbalancer.

We can also read that the only required argument is the name. So let's add this basic structure to our HCL file:

resource "aws_iam_user" "user2" {
  name = "user2" # required
}

Let's now import this user into Terraform:

$ terraform import aws_iam_user.user2 user2
aws_iam_user.user2: Importing from ID "user2"...
aws_iam_user.user2: Import prepared!
  Prepared aws_iam_user for import
aws_iam_user.user2: Refreshing state... [id=user2]

Import successful!
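To double-check that the import landed in the state, one option is to look for the resource address in the output of terraform state list. A small sketch, to be run from the directory that holds the state; the helper name in_state is hypothetical.

```shell
# Hypothetical helper: succeed if the given resource address is in the state.
in_state() {
  terraform state list | grep -qxF "$1"
}

# Example:
#   in_state "aws_iam_user.user2" && echo "user2 is now tracked by Terraform"
```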

How did our coverage evolve? Let's find out:

$ snyk iac describe --only-unmanaged
Scanned states (1)
Found resources not covered by IaC:
  aws_iam_access_key:
    - AKIASBXWQ3AYQETE6OFR
        User: user2
Found 5 resource(s)
 - 80% coverage
 - 4 resource(s) managed by Terraform
 - 1 resource(s) not managed by Terraform

We're now at 80% coverage (from 60%) and only one resource is left.

| What | Resource type | Name | Drift type | Action | Status |
| --- | --- | --- | --- | --- | --- |
| An IAM user | aws_iam_user | user2 | Unmanaged | IMPORT | * |
| An IAM access key | aws_iam_access_key | AKIASBXWQ3AYQETE6OFR | Unmanaged | ROTATE | |
| An attached IAM policy | aws_iam_policy_attachment | arn:aws:iam::aws:policy/AdministratorAccess | Unmanaged | DELETE | * |
| A tag on an IAM user | aws_iam_user | tags.environment | Managed | ADD | * |

*

Rotate the key

Let's tackle this now. We want to rotate the key while bringing it under Terraform control. We'll first declare a new key in the HCL so that Terraform creates it (so we can give it to the relevant team, for example), and then delete the old one from AWS.

The Terraform documentation for aws_iam_access_key is very straightforward, so we can simply create one resource that takes the user2 name as an argument:

resource "aws_iam_access_key" "user2" {
  user = aws_iam_user.user2.name
}

As the deployment pipeline was previously unblocked, we can safely apply this using Terraform to create a new key:

$ terraform apply
[...]
Terraform will perform the following actions:

  # aws_iam_access_key.user2 will be created
  + resource "aws_iam_access_key" "user2" {
      + create_date          = (known after apply)
      + encrypted_secret     = (known after apply)
      + id                   = (known after apply)
      + key_fingerprint      = (known after apply)
      + secret               = (sensitive value)
      + ses_smtp_password_v4 = (sensitive value)
      + status               = "Active"
      + user                 = "user2"
    }

Plan: 1 to add, 0 to change, 0 to destroy.

aws_iam_access_key.user2: Creating...
aws_iam_access_key.user2: Creation complete after 1s [id=AKIASBXWQ3AY4KPUNIHZ]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

We still have the old key to remove. Using the information from the Snyk IaC output, we know that its access key ID is AKIASBXWQ3AYQETE6OFR.

The simplest way to remove this key is to:

  • Go to IAM > Users > user2 > Security Credentials

  • Remove the key named AKIASBXWQ3AYQETE6OFR as reported by Snyk IaC by deactivating it, then deleting it.
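The same deactivate-then-delete sequence can be sketched with the AWS CLI, assuming it is configured for the account. The function name remove_access_key is hypothetical.

```shell
# Hypothetical helper: deactivate an IAM access key, then delete it.
# $1 is the IAM user name, $2 is the access key ID reported by the scan.
remove_access_key() {
  user="$1"
  key_id="$2"
  aws iam update-access-key --user-name "$user" --access-key-id "$key_id" --status Inactive &&
    aws iam delete-access-key --user-name "$user" --access-key-id "$key_id"
}

# Example: remove_access_key "user2" "AKIASBXWQ3AYQETE6OFR"
```

Deactivating first leaves room to confirm that nothing still depends on the old key before it is gone for good.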

What does our coverage look like now?

$ snyk iac describe --only-unmanaged
Scanned states (1)
Found 5 resource(s)
 - 100% coverage
Congrats! Your infrastructure is fully in sync.

Congratulations! Everything is back under control, thanks to Snyk IaC drift detection!

| What | Resource type | Name | Drift type | Action | Status |
| --- | --- | --- | --- | --- | --- |
| An IAM user | aws_iam_user | user2 | Unmanaged | IMPORT | * |
| An IAM access key | aws_iam_access_key | AKIASBXWQ3AYQETE6OFR | Unmanaged | ROTATE | * |
| An attached IAM policy | aws_iam_policy_attachment | arn:aws:iam::aws:policy/AdministratorAccess | Unmanaged | DELETE | * |
| A tag on an IAM user | aws_iam_user | tags.environment | Managed | ADD | * |

Wrapping up

In this article, we showed how Snyk IaC drift detection can help discover manually created AWS resources, and how it reports everything in Terraform terms, with the right information to help developers import those resources into their Terraform HCL code. We also saw that automatically reverting changes might not always be the desired outcome, and that a lightweight drift detection alerting system is needed in conjunction with the deployment pipeline.

We firmly believe all infrastructure should be in code, so engineers can have security feedback and visibility into issues as soon as possible.

That’s why Snyk IaC can help teams quickly reintegrate all the resources actually running in their AWS account into Terraform code, increasing IaC coverage and reducing security issues. Snyk IaC drives faster fixes by closing the feedback loop between cloud security and engineering teams and reporting actionable fixes directly to engineers, in engineer-friendly terms.

