On Building Lambda Payloads with Terraform

A deeper dive into how we built our Python build tool for AWS Lambda.

Aurynn Shaw
April 01, 2019

Previously we announced our packaging tool for AWS Lambda. I’d like to dive into the workings of this tool much more deeply.

Working with AWS Lambda and Terraform is a big part of what I do on a day-to-day basis, and both tools are vital parts of the shift to infrastructure-as-code and the complex management of our deployments.

My background as a software developer is predominantly in Python, which means AWS Lambda is often a perfect fit for me, keeping my write-deploy-test loop very short.

But,

well

deploying more complicated programs into the Lambda environment is … not as easy as I’d like it to be, unfortunately. The documentation on building a .zip payload for the Lambda function is clear, but requires a fairly high amount of knowledge of how to set up and develop with Python.

While this is true for me, it’s not something I can take for granted when I’m working with clients on their infracode.

As a result of needing reliable and consistent tooling for our clients and ourselves, I created a module to build those AWS Lambda Python payloads, as part of a Terraform deployment.

This has been great at reducing the friction of the write-deploy-test loop and making it easy to ship Lambda code to others by ensuring they don’t need advanced knowledge in order to build or deploy the code. It just works, and this has been a really great piece of technology as a result.

I’d like to go over how this module works, why it’s doing what it’s doing, and what benefits this is really bringing to the table.

Core Assumptions and “The Why”

But before we touch on how it works, it’s important to dive further into why.

There are existing toolchains that solve this problem - serverless.io jumps immediately to mind. So, why reimplement the world in Terraform?

Well, as amazing as Terraform is and as much as we have chosen Terraform as the Right Tool for the job of infracode management, it doesn’t always play well with others. You can import resources created by other tools into the Terraform state, but this doesn’t make it easy to manage, work with, or observe what’s going on.

At the time of writing, it’s more of an afterthought than a core capability of the tool.

The result of this is that while it’s entirely possible to run multiple infracode tools in the same environment, that lack of observability and the potential of tools stomping on each others’ toes makes deployment potentially more variable than would be comfortable.

Additionally, the fewer tools in use in a given environment, the more team members can achieve a broad mastery of them. This is most relevant on smaller teams: needing to learn Terraform and Serverless and Docker and several other pieces of toolchain software can quickly become overwhelming and lead to a loss of buy-in to the new processes.

For a DevOps process, especially a new one, this is a complete failure.

By directly integrating into Terraform, we reduce tooling overhead and ensure that “it just works” is the expectation and the norm.

Let’s Talk about How

Terraform is incredibly powerful, but it is first and foremost a declarative programming tool, whereas most tools that we work with on a day-to-day basis are imperative.

Declarative tools let the user say what, and imperative tools let the user say how to do what they want.

In practice, what this means is that you tell Python how to create a VPS, whereas you tell Terraform that you want a VPS to exist.

As a declarative tool, Terraform is extremely resistant to programming imperative “how to”-style modules in the language. It really wants to focus on “what”, not “how”.
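To make that concrete, here’s a minimal sketch of the declarative style; the AMI and resource name are illustrative assumptions:

```hcl
# Declarative: we state that this server should exist with these properties.
# Terraform itself works out whether to create it, change it, or do nothing.
resource "aws_instance" "example" {
  ami           = "ami-0abcdef1234567890"
  instance_type = "t2.micro"
}
```

Running terraform apply a second time does nothing, because the desired state already matches reality; an imperative script run twice would happily try to create a second server.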

Fortunately we’re experts and can subvert this expectation and desire. 😈

The Simple Path

The fastest way to integrate the build stage into Terraform is just to build the Lambda deployment package every single time we run terraform apply. This would be a simple null_resource declaration in Terraform, something akin to:

resource "null_resource" "build_stage" {
    # Without triggers a null_resource only runs once; an always-changing
    # trigger forces the provisioner to run on every apply.
    triggers = {
        build_time = "${timestamp()}"
    }

    provisioner "local-exec" {
        command = "/bin/bash my_build_script.sh"
    }
}

This has numerous expectations built in, such as creating a .zip archive in a known place that we can reference for uploading to S3 and creating the Lambda. It works, and it works reliably.
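The rest of the Terraform configuration can then point at that known location. A sketch using the AWS provider resources of the era, where the bucket name, role, and handler are all illustrative assumptions:

```hcl
# Upload the payload built by the null_resource to S3...
resource "aws_s3_bucket_object" "payload" {
  bucket     = "my-lambda-artifacts"
  key        = "my_function.zip"
  source     = "${path.module}/build/my_function.zip"
  depends_on = ["null_resource.build_stage"]
}

# ...and create the Lambda function from that object.
resource "aws_lambda_function" "my_function" {
  function_name = "my_function"
  s3_bucket     = "${aws_s3_bucket_object.payload.bucket}"
  s3_key        = "${aws_s3_bucket_object.payload.key}"
  handler       = "handler.handler"
  runtime       = "python3.7"
  role          = "${aws_iam_role.lambda_role.arn}"
}
```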

But it rebuilds the zip file every single time we re-run Terraform, which leads to extremely messy plan outputs and makes it difficult to see the changes we need to focus on. It can create a versioning mess at the AWS Lambda level as well, with each re-apply creating a new version of the function.

During development of new infrastructure this can introduce subtle errors, and it becomes difficult to use a versioned S3 bucket in any meaningful way for the archive.

So while this works, I think we can do better.

The Complicated Path

The more complicated version grew out of wanting to provide those my_build_script.sh scripts for our clients.

This started as a simple zip.sh which made assumptions about the Python virtualenv. In Python, the default is to install new packages globally, so a virtualenv has to be created to ensure only the packages we need get installed. As a result, zip.sh assumed a virtualenv would exist and always be in the same place, but failed with odd messages when it wasn’t present.

This wasn’t scalable or robust.

It needed to be added and painstakingly set up every time we added a new Lambda, or a new developer. Onboarding was a pain.

It broke. It broke frequently.

This isn’t great by any stretch, so we needed to improve it.

Because of how Python’s virtualenv system works we need a fully isolated environment to build and package the Python function, and setting up a new virtualenv by hand over and over again?

That should be hidden away.

We also need access to several Python interpreters at once, because AWS Lambda supports three separate versions: 2.7, 3.6 and 3.7.

This means we need a tool like pyenv to manage the installation and creation of virtualenv environments.

And because clean Terraform practice says we should do nothing if nothing has changed, instead of building the payload every time we should first test whether we need to build it.

This means we need to detect when changes have happened, and that’s where it turns out that making this robust, isolated and reliable is a really complicated proposition.

How it Works

We ended up taking the complicated path, and after a lot of testing and bugfixing we got it working reliably. It builds payloads, uploads them, and doesn’t rebuild them if it doesn’t need to. It’s clever, fast, and reliable.

Terraform offers the tools that make this possible, those being the null resource with the local provisioners, and the external data source.

First, we use an external data source to gather a couple of facts about our build:

  • Is there an existing archive?
  • If there is an existing archive, does it sha256 to what we think the archive should be?
  • Has our code repository changed?

All of these checks are done by calling a shell script and running some simple tests.
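That program has a small contract: Terraform’s external data source expects a flat JSON object of string values on stdout. Here’s a minimal sketch of such a check script, where the archive path, environment variables, and output keys are all illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical check script for Terraform's external data source.
# All paths and variable names here are illustrative assumptions.
ARCHIVE="${ARCHIVE:-build/payload.zip}"
EXPECTED_SHA="${EXPECTED_SHA:-}"

# Does the archive exist, and what does it hash to?
if [ -f "$ARCHIVE" ]; then
  exists="true"
  actual_sha="$(sha256sum "$ARCHIVE" | cut -d ' ' -f 1)"
else
  exists="false"
  actual_sha=""
fi

# Does the hash match what Terraform expects?
if [ -n "$EXPECTED_SHA" ] && [ "$actual_sha" = "$EXPECTED_SHA" ]; then
  sha_matches="true"
else
  sha_matches="false"
fi

# The external data source requires a flat JSON object of string values.
printf '{"exists": "%s", "sha_matches": "%s"}\n' "$exists" "$sha_matches"
```

On the Terraform side this would be wired up with a data "external" block pointing at the script, and its result map then gates whether the build stage runs.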

Critically, to make this work we needed a stable identifier for the payload .zip itself, which we optionally generate by using a current timestamp in the filename.

If any of these checks indicates a stale payload, the second stage kicks in, which:

  • Tests if the appropriate Python version is present in pyenv,
  • Creates the virtualenv,
  • Installs all the package requirements in the virtualenv,
  • Compiles the Python code to .pyc,
  • Zips everything up, and
  • Places the zip file in the output directory.
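The steps above can be sketched as a shell function; every name, path, and the virtualenv name here is an illustrative assumption rather than the module’s actual interface:

```shell
#!/bin/sh
set -eu

# Hypothetical build stage; names and paths are illustrative assumptions.
build_payload() {
  py_version="$1"   # e.g. "3.7.2"
  src_dir="$2"      # directory containing the function and requirements.txt
  out_zip="$3"      # absolute path for the finished payload

  # 1. Make sure the right interpreter is available via pyenv.
  pyenv versions --bare | grep -qx "$py_version" || pyenv install "$py_version"

  # 2. Create a clean, isolated virtualenv for this build.
  pyenv virtualenv -f "$py_version" lambda-build

  # 3. Install the function's dependencies into the virtualenv.
  PYENV_VERSION=lambda-build pip install -r "$src_dir/requirements.txt"

  # 4. Pre-compile everything to .pyc.
  PYENV_VERSION=lambda-build python -m compileall -q "$src_dir"

  # 5. Zip the source into the output location.
  (cd "$src_dir" && zip -qr "$out_zip" .)
}
```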

Because each stage is managed with its own null_resource we can create full dependency ordering in Terraform, as well as the stable identifiers we need to ensure a clean apply.
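That dependency ordering can be sketched like so, with every resource name an illustrative assumption (including a hypothetical data.external.build_check carrying the checks above):

```hcl
# Stage one: re-created whenever the external checks report a change.
resource "null_resource" "check_stage" {
  triggers = {
    sha_matches = "${data.external.build_check.result["sha_matches"]}"
  }
}

# Stage two: the build, tied to stage one both by depends_on (ordering)
# and by a trigger on its id (so it only re-runs when stage one does).
resource "null_resource" "build_stage" {
  depends_on = ["null_resource.check_stage"]

  triggers = {
    check_id = "${null_resource.check_stage.id}"
  }

  provisioner "local-exec" {
    command = "/bin/bash my_build_script.sh"
  }
}
```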

Overall this has been extremely robust and reliable, to the point where I recommend it for all our clients that are using Terraform and doing AWS Lambda development in Python.

Downsides to the Approach

But no tool is without its downsides, and there are definitely downsides to this approach.

As it stands, the tool requires pyenv and pyenv-virtualenv to be not just installed, but installed correctly and activated in the users’ environment. This has been a point of frustration and has required extra setup knowledge.

Shifting between developer machines does cause the package to rebuild, when it shouldn’t. This is problematic because it can easily create a scenario where two developers will just end up with clashing updates over and over again.

By using a considerable amount of shell script it’s harder to debug than would otherwise be ideal, and this could be seen as a brittle part of the codebase.

The way the stable identifiers work means it’s currently hard to get the package working properly in a continuous integration or continuous delivery environment.

Where To From Here

The above issues are open on GitHub with some proposed ideas, and I’m looking forward to making this toolkit even more robust in the coming weeks.

I’m really quite happy with how it’s turned out. It makes developing Lambdas, even with some setup hassle, so much easier than they were before, while maintaining the ease-of-use that Terraform brings to the table.

The major next step is that we’re not stopping at just Python: the next language on our list is nodejs. There’s an experimental branch that our clients are testing that seems to be stable, so we should be merging that in soon.

Overall this has been a great project, and I’m really happy with how it’s worked out. I think it’s a great contribution to the broader ecosystem around developing toolchains in Terraform, and building better infracode.

Do you have more questions? Want to sit down for a full code review with me? Get in touch at aurynn@eiara.nz!


terraform devops infracode

© Eiara Limited