Eiara Blog

The last 5 posts to the Eiara Blog

Cross-Account Cloudfront Logs with Terraform

Fri 30 November 2018

AWS offers the fairly excellent Cloudfront service, providing a solid caching proxy in front of your resources. It’s exceptionally good for static resources like CSS or Javascript, and even dynamic content that changes infrequently.

I consider it good design practice to ensure that AWS logs are shipped to a central logging account, providing a central location from which to build out logging infrastructure and tooling, instead of spreading it across multiple accounts in the organisation.

A single place to look for new insights, to try to understand what’s happening in the system? Good design, all around.

AWS Cloudfront supports logging its access requests to S3, like most AWS services. It also supports multiple accounts feeding into the same S3 bucket, but it’s not entirely obvious how to do that.

I recently spent some time digging into how to do this from Terraform, and I’d like to share how I solved this problem for multiple simultaneous accounts.

Bucket Policies and IAM

IAM is one of the more complicated but most important aspects of using AWS, and I regularly find myself writing new IAM policies that specifically lock down resource capabilities to ensure any misuse of the attached roles is limited.

However, this runs counter to how AWS Cloudfront distribution logging expects to work.

Rather than writing an S3 bucket policy that allows AWS Cloudfront to write to our target logging bucket, we need to grant s3:GetBucketACL and s3:PutBucketACL to each account that we want to be able to write logs.

This gives us a bucket policy that will look something like this:

data "aws_iam_policy_document" "logging_bucket_policy" {
  statement {
    actions = [
      "s3:GetBucketACL",
      "s3:PutBucketACL",
    ]

    resources = [
      "arn:aws:s3:::my-logging-bucket",
    ]

    principals {
      type = "AWS"

      identifiers = [
        "${data.aws_caller_identity.secondary.account_id}",
        "${data.aws_caller_identity.tertiary.account_id}",
      ]
    }
  }
}
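
The policy above refers to provider aliases and caller-identity data sources that aren’t shown elsewhere in this post. A minimal sketch of what they might look like follows; the region and profile names are placeholders for your own configuration:

provider "aws" {
  alias   = "primary"
  region  = "ap-southeast-2"
  profile = "primary"
}

provider "aws" {
  alias   = "secondary"
  region  = "ap-southeast-2"
  profile = "secondary"
}

provider "aws" {
  alias   = "tertiary"
  region  = "ap-southeast-2"
  profile = "tertiary"
}

# Used in the bucket policy above to resolve each account's ID.
data "aws_caller_identity" "secondary" {
  provider = "aws.secondary"
}

data "aws_caller_identity" "tertiary" {
  provider = "aws.tertiary"
}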

Recognising that I needed to let go and allow AWS Cloudfront to manage the bucket ACLs on its own was the key to getting logs written at all.

S3 Setup

At this point we create our S3 logging bucket:

resource "aws_s3_bucket" "logs" {
  provider = "aws.primary"
  bucket   = "my-logging-bucket"
  acl      = "private"
  policy   = "${data.aws_iam_policy_document.bucket_policy.json}"
}

and we create the S3 buckets to serve our content:

resource "aws_s3_bucket" "server_secondary" {
  provider = "aws.secondary"
  bucket   = "secondary-cloudfront-serve-bucket"

  website {
    index_document = "index.html"
  }
  policy = "${data.aws_iam_policy_document.read_secondary.json}"
}

and

resource "aws_s3_bucket" "server_tertiary" {
  provider = "aws.tertiary"
  bucket   = "tertiary-cloudfront-serve-bucket"

  website {
    index_document = "index.html"
  }
  policy = "${data.aws_iam_policy_document.read_tertiary.json}"
}

But, we haven’t defined the bucket policies for either of those serve buckets yet — let’s do that next.

Bucket Policies and Cloudfront Origins

In order to ensure that access to our S3 buckets only goes through Cloudfront, we create Cloudfront origin access identities and attach bucket policies that grant those identities read access.

Origins

resource "aws_cloudfront_origin_access_identity" "s3_access_secondary" {
  provider = "aws.secondary"
  comment = "secondary identity"
}

and

resource "aws_cloudfront_origin_access_identity" "s3_access_tertiary" {
  provider = "aws.tertiary"
  comment = "tertiary identity"
}

Policies

Next, we can define our bucket policies.

These policies allow the Cloudfront origin access identity to read anything in the corresponding serve bucket, and to list the bucket. This is enough permission for everything we’ll need for a static site.

data "aws_iam_policy_document" "read_secondary" {
  # Cloudfront can read anything
  statement {
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::secondary-cloudfront-serve-bucket/*"]

    principals {
      type        = "AWS"
      identifiers = ["${aws_cloudfront_origin_access_identity.s3_access_secondary.iam_arn}"]
    }
  }

  # Cloudfront can list the bucket
  statement {
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::secondary-cloudfront-serve-bucket"]

    principals {
      type        = "AWS"
      identifiers = ["${aws_cloudfront_origin_access_identity.s3_access_secondary.iam_arn}"]
    }
  }
}

and

data "aws_iam_policy_document" "bucket_policy_read_tertiary" {
  # Cloudfront can read anything
  statement {
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::tertiary-cloudfront-serve-bucket/*"]

    principals {
      type        = "AWS"
      identifiers = ["${aws_cloudfront_origin_access_identity.s3_access_tertiary.iam_arn}"]
    }
  }

  # Cloudfront can list the bucket
  statement {
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::tertiary-cloudfront-serve-bucket"]

    principals {
      type        = "AWS"
      identifiers = ["${aws_cloudfront_origin_access_identity.s3_access_tertiary.iam_arn}"]
    }
  }
}

At this point, we’ve configured the entire chain needed to create Cloudfront distributions that log to our central primary account.

Let’s finally create the distributions.

Distributions

A Cloudfront distribution in Terraform has a lot of configuration options, and I recommend you read the documentation to see which ones you might need.

The examples I’ve posted here are complete, but will need to be modified for your environment.

Once created, these distributions will serve content out of the S3 serve buckets in the secondary and tertiary accounts, while directing their logs, usefully prefixed with logs-secondary and logs-tertiary, into our central primary account.

The distribution in the secondary account will be:

resource "aws_cloudfront_distribution" "s3_distribution_secondary" {

  provider = "aws.secondary"

  origin {
    domain_name = "${aws_s3_bucket.server_secondary.bucket_domain_name}"
    origin_id   = "secondary_origin"
    s3_origin_config {
      origin_access_identity = "${aws_cloudfront_origin_access_identity.s3_access_secondary.cloudfront_access_identity_path}"
    }
  }

  enabled         = true

  logging_config {
    include_cookies = false
    bucket          = "${aws_s3_bucket.logs.bucket_domain_name}"
    prefix          = "logs-secondary"
  }
  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "secondary_origin"

    forwarded_values {
      query_string = false

      cookies {
        forward = "none"
      }
    }
    viewer_protocol_policy = "allow-all"
    min_ttl                = 0
    default_ttl            = 3600
    max_ttl                = 86400
    compress = true
  }
  price_class = "PriceClass_All"
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }
  viewer_certificate {
    cloudfront_default_certificate = true
  }
}

and for tertiary:

resource "aws_cloudfront_distribution" "s3_distribution_tertiary" {

  provider = "aws.tertiary"

  origin {
    domain_name = "${aws_s3_bucket.server_tertiary.bucket_domain_name}"
    origin_id   = "tertiary_origin"
    s3_origin_config {
      origin_access_identity = "${aws_cloudfront_origin_access_identity.s3_access_tertiary.cloudfront_access_identity_path}"
    }
  }

  enabled         = true

  logging_config {
    include_cookies = false
    bucket          = "${aws_s3_bucket.logs.bucket_domain_name}"
    prefix          = "logs-tertiary"
  }
  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "tertiary_origin"

    forwarded_values {
      query_string = false

      cookies {
        forward = "none"
      }
    }
    viewer_protocol_policy = "allow-all"
    min_ttl                = 0
    default_ttl            = 3600
    max_ttl                = 86400
    compress = true
  }
  price_class = "PriceClass_All"
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }
  viewer_certificate {
    cloudfront_default_certificate = true
  }
}

The examples in this post can be found in the Eiara GitHub.


Why DevOps Can't Be a Role

Wed 28 November 2018

Previously I wrote about how you can’t hire a DevOps role, and some very interesting conversations came out of that article.

The core response was that if the skills are present, does it matter what the role is called? If the work is being done, does it matter where those people are? Don’t DevOps teams exist, and we just call them SRE?

The difficulty with hiring a DevOps team, or DevOps Engineers as a role or title, is the problem of context.

The Context of Organisations

As businesses, we already sort people by skills and group them into our various departments. IT skills go into the IT department; different IT skills go into software development.

By sorting skills by pre-existing department, we sort our people into different reporting lines, into groups with their own objectives and goals, goals which may or may not align.

By leaning on existing business structures for this sorting, we constrain how hiring decisions are made. This results in technical staff ending up in one of a couple of departments, Operational IT or Software Development.

On the surface, this seems fine.

History Binds Us

That sense that everything is fine is the mental model we bring when we start hiring for DevOps skills. Naming DevOps as a dedicated role means declaring what skills that role needs, what expertise and background. When we do this, we naturally compare those skills to the skills of the departments and team members we already have, to what we think the work looks like.

My own journey aside, many people in New Zealand have come to DevOps from systems administration backgrounds. When we compare those backgrounds to the existing skills in our organisation and to how we think of the work, we naturally want to put those people into the IT or Operations departments, doing the same systems administration work.

By naming the role we name the skills, by naming the skills we, as humans, need to categorise, and we categorise by what we know.

This critically impairs our culture’s ability to change, because we treat the skills as nothing but an extension of a previous skillset, bringing all the history and baggage along for the ride.

Culture

This impairment condemns a DevOps role to being seen by other parts of the organisation as the same limiting, controlling, and impeding IT department that it has always been.

It prevents the new tools and techniques of DevOps from being useful, as there will be no space to experiment with the new tools and discover new processes. Why would there be? A DevOps role fits into our existing IT strategy, so why would there be a need to change?

When we consider those orthogonal goals, where a DevOps role inside an IT department is required to adhere to a classic IT structure, we see that other business departments have no need to engage with IT or have open conversations.

We see that decisions are made without considering how IT can and should be involved from the get-go, that the shift-left philosophy of DevOps is free to be ignored.

Why involve IT? They just get in the way. They just say “no”. No organisational change has been made to support or include the feedback loop of DevOps.

We’ve just renamed a role, calling a rose by another name, and that can never work.

Other Teams

I’ve talked about DevOps predominantly being placed in the IT or Operations departments, but this effect happens regardless of where roles labelled “DevOps” are placed.

By matching the skills to an existing set of skills within an organisation, we put people into roles that are different only in name. We end up focusing on the least transformative part of DevOps — the technical tooling and technical skills — instead of on how the process and mentality will help us.

DevOps as Culture

Instead, it’s critical to consider DevOps as a cultural ideal or a practice, much like Agile Development.

Operations, or IT, should be a communicating partner in project conversations, helping guide roadmaps and working to ensure ongoing project success. IT should never be seen as the group that just says no, but as a valuable partner.

Developers should be enabled to participate in project direction and client conversations, and QA or Helpdesk should be consulted on their needs.

To achieve this, we must all think about DevOps as a culture, building on the internal ideas of empathy, mutual respect and communication in order to dismantle silos and improve outcomes.

By naming DevOps as a role, we prevent that culture from taking root or growing.

We may use the new words, but our actions are the same as they were before.

To avoid falling into this trap, we instead need to think of DevOps as multiple people across different teams working together to achieve a new practice, without naming any group the “DevOps people”.

This requires making everyone responsible for the success of DevOps, not just one person or one team. By choosing to see DevOps as a practice, we choose to change how we view ourselves, and how skills and communication exist within our business.

Instead of the same actions with new words, we have new actions, words to match them, and a culture that can actually change.


On Writing Good Terraform Modules

Tue 27 November 2018

Terraform is one of the most important infrastructure-as-code tools in modern DevOps tooling, with broad cross-platform cloud provider support, as well as a considerable body of provisioning tools for services like Postgres, monitoring and logging, and numerous others.

It’s an incredible resource.

I use Terraform predominantly with AWS, including in production with our clients. For us, it’s the right tool to manage infrastructure-as-code.

But, and there’s always a but, Terraform is not perfect.

Sharp Edges

As part of good, modern development practice, Terraform supports code reuse in the form of modules, providing a means to abstract away concepts and ideas to make it theoretically easy to share and reuse.

And it’s that “theoretically” that’s important.

Terraform directly implements the AWS APIs, and it is bound by the limitations and design principles of those APIs, bound by the history of design decisions made a decade or more ago.

As a result, it can be quite difficult to build generic modules in Terraform, code that provides a composed building block for more complex infrastructure. I’ve run into this over and over again, from load balancers to autoscaling groups to other edge cases.

These pieces often resist being put in a module by requiring too many specific customisation options and too many environment-specific settings. There’s no way to make them generic.

Building Better Tools

So what I’ve realised is that trying to make these components into generic modules is the problem. How I want to design infrastructure, from IAM roles to VPC layout to how I configure a load balancer, is driven by my understanding of solutions architecture and good security principles.

By trying to make modules generic, I was undermining myself, preventing myself from building reusable infrastructure-as-code that fit my opinions and design principles. By trying to build smaller parts, I was prevented from building bigger and more important pieces easily and effectively.

So, this is what I’ve learned from writing infrastructure-as-code with Terraform.

Be Extremely Opinionated

Your environment isn’t mine, and can’t be mine. How you think and design, while informed by similar best practices and choices, cannot be how I design.

The things you need will not be the things I need, and should not try to be.

The modules that I am now building assert that Lambda functions should always have logging enabled and always read their payloads from S3. That my S3 bucket policies should always be role-based, instead of user-based.

That security groups between resources are based not on IPs, but on access labels that are easily changed or revoked, as sketched below.

These opinions come from a place of should, not can, and that is the driving force of my module designs. By providing not capability but opinions, I make it easier for myself to use these pieces, and to extend them in ways that are reasonable for me and my environments.
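
As an illustration of that last opinion, here’s a minimal sketch of a security group used as an access label; the resource names, the Postgres port and var.vpc_id are placeholders. Anything that should be allowed to reach the database gets the database-clients group attached, and access is revoked by detaching it, never by editing IP ranges.

variable "vpc_id" {}

# The "label": attach this group to anything allowed to talk to the database.
resource "aws_security_group" "database_clients" {
  name   = "database-clients"
  vpc_id = "${var.vpc_id}"
}

# The database accepts traffic from holders of the label, not from IP ranges.
resource "aws_security_group" "database" {
  name   = "database"
  vpc_id = "${var.vpc_id}"

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = ["${aws_security_group.database_clients.id}"]
  }
}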

Build Complex Primitives

The primitives that Terraform provides are extremely low-level, being nearly direct implementations of the AWS API. This makes it easy to fall into the idea that our goal should be to build small, generic, higher-level primitives out of these very low-level ones.

This has been a source of pain for me, numerous times.

Instead, build more complex components. Should a VPC always have a bastion, and public and private subnets? Then your VPC module should bake that in, and not provide a means to turn it off.

Should an autoscale group always have metrics? Then your module should define those, and configure them appropriately.

Should ELBs always log? And Cloudwatch Events? Cloudtrail? Your modules should define these things, the S3 buckets and the read policies, because these are your opinions.
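
As a sketch of what baking opinions in can look like, here’s the skeleton of a hypothetical VPC module that always creates the public/private split and the bastion, with no variable to turn them off. The CIDRs and instance details are placeholders, and routing, NAT and security groups are omitted for brevity.

variable "name" {}
variable "bastion_ami" {}
variable "bastion_key_name" {}

resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16"

  tags {
    Name = "${var.name}"
  }
}

# Always present: a public subnet for the bastion and load balancers.
resource "aws_subnet" "public" {
  vpc_id                  = "${aws_vpc.this.id}"
  cidr_block              = "10.0.0.0/24"
  map_public_ip_on_launch = true
}

# Always present: a private subnet for everything else.
resource "aws_subnet" "private" {
  vpc_id     = "${aws_vpc.this.id}"
  cidr_block = "10.0.1.0/24"
}

# Always present: the bastion, because that's the opinion.
resource "aws_instance" "bastion" {
  ami           = "${var.bastion_ami}"
  instance_type = "t2.micro"
  key_name      = "${var.bastion_key_name}"
  subnet_id     = "${aws_subnet.public.id}"
}

output "private_subnet_id" {
  value = "${aws_subnet.private.id}"
}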

A Single Repo is Useful

There are some advanced tricks you can pull with Terraform if all your helper modules are in the same repository.

The overrides feature, when combined with symlinks, allows for complex layering of modules by re-opening and merging in changes to existing definitions. I’ve already used this to define an ECS helper module, and an ELBv2-configured helper module based on the original ECS helper.

I can now add better logging and new concepts to the base module and know that they will flow forward into the derived modules as expected.

Layering my opinions in this way makes it easier to define production versus non-production modules, giving production more logging, more connections, and more capabilities, without breaking the core interface contract exported by the module.
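
The layering itself isn’t shown in this post, so here’s a hedged sketch of the shape it can take, with hypothetical file and resource names. The base ECS helper’s files are symlinked into a second module directory, and a file ending in _override.tf re-opens a definition from the base so that Terraform merges its arguments over the symlinked original.

# modules/ecs/service.tf: the base ECS helper's service definition.
resource "aws_ecs_service" "service" {
  name            = "${var.name}"
  cluster         = "${var.cluster_id}"
  task_definition = "${var.task_definition}"
  desired_count   = 1
}

# modules/ecs_elbv2/service.tf is a symlink back to the base:
#   ln -s ../ecs/service.tf modules/ecs_elbv2/service.tf
#
# modules/ecs_elbv2/elbv2_override.tf then re-opens the same resource and
# layers an ELBv2 target group onto it, leaving the base module untouched.
resource "aws_ecs_service" "service" {
  load_balancer {
    target_group_arn = "${var.target_group_arn}"
    container_name   = "${var.container_name}"
    container_port   = "${var.container_port}"
  }
}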

Use Zero Counts

As a very strict declarative language, Terraform lacks a lot of the niceties of imperative programming, like if statements and loops.

There is some limited capability to create multiple resources of a single type using the count argument. This can be extremely useful for enabling and disabling aspects of a module as needed.

This feature is limited and needs to be used with care, especially when resources refer to other resources that may not have been created.
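
A minimal sketch of the pattern, with illustrative variable and resource names: a ternary on a variable drives count, and any reference to the possibly-absent resource goes through the splat form with a fallback value, rather than indexing the resource directly.

variable "name" {}

variable "enable_logging" {
  default = true
}

# Created once when enabled, not at all when disabled.
resource "aws_cloudwatch_log_group" "this" {
  count = "${var.enable_logging ? 1 : 0}"
  name  = "${var.name}"
}

# Guard references against the zero-count case with a fallback element,
# so this output doesn't error when the log group doesn't exist.
output "log_group_name" {
  value = "${element(concat(aws_cloudwatch_log_group.this.*.name, list("")), 0)}"
}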

Try New Things

Terraform can be very limiting in some ways but there’s a lot of power present in the tool, a great many capabilities that you can use.

Our terraform_lambda_zip module, for instance, makes extensive use of external data sources and local provisioners, both implemented as shell scripts, to handle building a complete, stable archive for use with S3.

While this is perhaps beyond the normal scope of Terraform, I was able to encode my opinions on how a Python Lambda function should be built directly into the deployment tooling. This module can now be spun out into a CI/CD process, making it straightforward to produce compatible, repeatable builds, with all of my opinions and goals intact.
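
The module itself isn’t reproduced here, but a stripped-down sketch of the external data source approach might look like the following; build.sh and the artifact bucket variable are hypothetical names, and the script is expected to read its query as JSON on stdin and print a JSON object of strings (here, the archive’s path and content hash) on stdout.

variable "artifact_bucket" {}

# Shells out to a build script that produces the Lambda archive and reports
# where it put it, plus a content hash we can use for a stable S3 key.
data "external" "lambda_archive" {
  program = ["bash", "${path.module}/build.sh"]

  query = {
    source_dir = "${path.module}/src"
  }
}

# Upload the archive under a hash-based key so repeated builds are repeatable.
resource "aws_s3_bucket_object" "payload" {
  bucket = "${var.artifact_bucket}"
  key    = "lambda/${data.external.lambda_archive.result["hash"]}.zip"
  source = "${data.external.lambda_archive.result["path"]}"
}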

Throw it Away if you Need To

I’ve been taught to be hesitant to discard code or ideas that aren’t working for me, or aren’t keeping up with my new design ideals and necessities.

Fortunately, Terraform has made it relatively easy to rebuild what I’ve built before, to build new abstractions with new goals in mind.

If the design you tried doesn’t work, that’s okay. Rewrite it piece by piece, try again in your test environment, and migrate as you make better designs. Because Terraform is meant to be an iterative tool, this workflow is well-supported and catered to.

Better Infrastructure Today

This is how I approach designing Terraform modules today. How does this encode my opinions? How should this work, to support my goals?

How can I use this in CI/CD, or with remote state, or any number of excellent powers that Terraform gives me? How should I make my interfaces easy to use by other programmers?

What are the implications of my design choices?

By asking ourselves these questions as we design, we build what works for us, works for our environments, and, crucially, works to make tomorrow easier than today.


You Can’t Hire a DevOps Role

Thu 24 May 2018

DevOps has become more and more popular in the tech industry over the last decade, giving us powerful mentalities around rapid rebuilds and reliability through automation, and shifting us away from the idea that meeting our SLAs meant long change management cycles.

We’ve had these ideas for a decade, and they’ve grown with us as the Cloud has grown. We have a generation of software developers and reliability engineers who have never thought outside of DevOps or Agile, who live and breathe the Cloud.

Unfortunately, with the popularity of DevOps we’ve also seen a marked rise in misunderstanding of what DevOps is, and one of the most common examples is job listings for a DevOps Engineer role.

DevOps Isn’t A Role

To understand why a DevOps role is a fundamental misunderstanding of what DevOps is, we need to look at the underlying goals of DevOps. Our view is that DevOps is cultural, a mindset of dismantling barriers between developers and systems administration, between IT and the business as a whole.

Because DevOps positions the skills of deployment and reliability and maintenance as integral to the software development process, it requires that developers understand and have internalised those concepts, that their entire approach to developing software includes those concerns.

By treating DevOps as a separate role, we’re not hiring developers who understand these requirements, who care about more than just the code. Instead, we’re doing what we do today: insulating our developers, as we always have. We’re retaining and reinforcing the barriers that DevOps is meant to break down, and ignoring everything new that we could be doing.

This isn’t hiring DevOps skillsets. It’s hiring traditional systems administrators, and it’s not practicing DevOps.

DevOps is a Practice

Instead, we need to approach DevOps for what it is, an organisational pattern for delivery of projects and products.

We need to see DevOps for what it is, an extension of the Agile methodology, of using rapid failure and rapid recovery as the core axioms of building reliable and sustainable products and services.

We need to use DevOps as what it is, a technique to break down the organisational silos that unnecessarily separate developers from operations, one that uses empathy and collaboration to drive a broader understanding of reliability concerns.

By treating it as a role and not a practice, we actively exclude those who understand DevOps and would bring the greatest value to our teams, because their skills of collaboration and shared mindsets will be disregarded as they are slotted into the same unnecessary and harmful silos that we have today.

By treating it as a role, we render the technologies of DevOps useless. Without the culture of collaboration, communication and the gradient of skills, how can developers know what is truly required to run the service? Without the culture, how can operations know the tradeoffs and security concerns that drove development?

By treating it as a role, how can it break down barriers?

Be Different

Don’t try to hire DevOps as a role. Hire developers who understand that running software in production is an integral part of writing software. Hire systems people who understand that they must be a part of every development conversation, that there is no barrier between “developer” and “operations”.

Hire for the collaborators, hire for the communicators, hire for those who want to go further. Hire for the practice of DevOps and Agile.

But don’t try to hire as a role.


Our Philosophy of DevOps

Wed 23 May 2018

At Eiara, we’re often asked what DevOps is, what we bring to the table and what problems we help solve.

There are lots of great answers about what DevOps enables on a technical level, but much less focus is given to what we feel is the true value of DevOps mindsets and mentality: the cultural understanding and shifts involved.

Why Does DevOps Exist

Before we can look at the cultural changes, we need to examine DevOps as a technical practice. The idea grew out of the Agile mindset, building on failing fast and iterating towards a solution, knowing that we cannot know everything in advance. Building on rapid failure as its core axiom, DevOps moves to reduce the effects of human error in IT through reliance on strong automation in the development, deployment, and testing of services.

As a cultural practice, DevOps states that the classical silos separating software developers from sysadmins are intrinsically harmful, arguing that these skillsets instead exist on a spectrum of knowledge.

By making the argument that software development is intrinsically and irrevocably linked to running the software, and vice versa, DevOps insists that effective, failure-resistant services are exceptionally difficult to run without dismantling those silos.

Broad Applicability

This philosophy provides the understanding not just of how our technical teams must work with each other, but also of how technical teams must work with the organisation as a whole.

All software exists in response to needs: the needs of our customers, of our internal departments, of ourselves. But just as the false silos of Developers and Operations must be dismantled through DevOps, the same ideals and culture demand consideration of the needs of the business, to ensure we are asking the right questions, and that we know we can ask the right questions.

DevOps transcends automation of technological needs. Instead, it grants us the power of introspection. By telling us we have overlooked the needs of our peers, it encourages us to ask who else we are missing, what other needs are unmet, who remains excluded by our behaviour.

It tells us that we are missing out on the skills of others, of the help and value that they can bring to our achievements.

It tells us that we must be inclusive, and to do otherwise is to maintain those harmful silos.

Fundamental Skills

It follows, then, that the fundamental skills of DevOps are not infrastructure-as-code or software development. While these skills are necessary, they are not fundamental; they are not useful on their own.

Instead, the fundamental skills of DevOps can only be Empathy, Communication and Respect.

By drawing on the Philosophy of DevOps, the reasoning becomes clear. When we treat silos as harmful, we must ask our staff to reconsider how they have done work, and in what regard they hold others. We now ask our staff to care about others’ opinions, and to communicate effectively and with empathy and compassion.

We ask for a more holistic view, where no organisational skill is less valuable than another.

This can be a difficult transition, as it asks staff to reconsider their actions against a new set of judging criteria for competence and capability. It tells us that how we have acted is no longer acceptable, and that changes must certainly be made.

Collaboration

This is the philosophy of DevOps at Eiara. This is what DevOps is to us, to live and breathe not the technology, not the tools of Continuous Integration or AWS or Azure or Docker or Cloud, but to live for the new culture of communication, of respect, of collaboration that is required for DevOps.

DevOps was never about the technology. It can never be about the technology.

DevOps is, and always will be, about the people.