Eiara Blog

The last 5 posts to the Eiara Blog

You Can’t Hire a DevOps Role

Thu 24 May 2018

DevOps has become more and more popular in the tech industry over the last decade, giving us powerful mentalities around rapid rebuilds, reliability through automation, and a shift away from the idea that meeting our SLAs means long change management cycles.

We’ve had these ideas for a decade, and they’ve grown with us as the Cloud has grown. We have a generation of software developers and reliability engineers who have never thought outside of DevOps or Agile, who live and breathe the Cloud.

Unfortunately, with the popularity of DevOps we’ve also seen a marked rise in misunderstanding of what DevOps is, and one of the most common examples we’ve seen is job listings for a DevOps Engineer role.

DevOps Isn’t A Role

To understand why a DevOps role is a fundamental misunderstanding of what DevOps is, we need to look at the underlying goals of DevOps. Our view is that DevOps is cultural, a mindset of dismantling barriers between developers and systems administration, between IT and the business as a whole.

Because DevOps positions the skills of deployment and reliability and maintenance as integral to the software development process, it requires that developers understand and have internalised those concepts, that their entire approach to developing software includes those concerns.

By treating DevOps as a separate role, we’re not hiring developers who understand these requirements, who care about more than just the code. Instead, we’re doing what we’ve always done: insulating our developers. We’re retaining and reinforcing the barriers that DevOps is meant to break down, and ignoring everything new that we could be doing.

This isn’t hiring DevOps skillsets. It’s hiring traditional systems administrators, and it’s not practising DevOps.

DevOps is a Practice

Instead, we need to approach DevOps for what it is, an organisational pattern for delivery of projects and products.

We need to see DevOps for what it is, an extension of the Agile methodology, of using rapid failure and rapid recovery as the core axioms of building reliable and sustainable products and services.

We need to use DevOps as what it is, a technique to break down organisational silos that unnecessarily separate developers from operations, using that empathy and collaboration to drive a broader understanding of reliability concerns.

By treating it as a role and not a practice we actively exclude those who understand DevOps and would bring the greatest value to our teams, because their skills of collaboration and shared mindsets will be disregarded as they are slotted into the same unnecessary and harmful silos that we have today.

By treating it as a role, the technologies of DevOps are useless. Without the culture of collaboration, communication and the gradient of skills, how can developers know what is truly required to run the service? Without the culture, how can operations know the tradeoffs and security concerns that drove development?

By treating it as a role, how can it break down barriers?

Be Different

Don’t try to hire DevOps as a role. Hire developers who understand that running software in production is an integral part of writing software. Hire systems people who understand that they must be a part of every development conversation, that there is no barrier between “developer” and “operations”.

Hire for the collaborators, hire for the communicators, hire for those who want to go further. Hire for the practice of DevOps and Agile.

But don’t try to hire as a role.


Our Philosophy of DevOps

Wed 23 May 2018

At Eiara, we’re often asked what DevOps is, what we bring to the table and what problems we help solve.

There are lots of great answers about what DevOps enables on a technical level, but much less attention is given to what we feel is the true focus and true value of DevOps mindsets and mentality: the cultural understanding and shifts involved.

Why Does DevOps Exist

Before we can look at the cultural changes, we need to examine DevOps as a technical practice. The idea grew out of the Agile mindset, building on failing fast and iterating towards a solution, knowing that we cannot know everything in advance. Building on rapid failure, the core axiom of DevOps moves to reduce the effects of human error in IT, through reliance on strong automation in development, deployment, and testing of services.

As a cultural practice, DevOps states that the classical silos that separate software developers from sysadmins are intrinsically harmful, arguing that these skillsets instead exist on a spectrum of knowledge.

By making the argument that software development is intrinsically and irrevocably linked to running the software, and vice versa, DevOps insists that effective, failure-resistant services are exceptionally difficult to run without dismantling those silos.

Broad Applicability

This philosophy provides the understanding not just of how our technical teams must work with each other, but also how technical teams must work with an organisation as a whole.

All software exists in response to needs, the needs of our customers, of our internal departments, of ourselves. But just as the false silos of Developers and Operations must be dismantled through DevOps, the same ideals and culture demand consideration of the needs of the business, to ensure we are asking the right questions, that we know we can ask the right questions.

DevOps transcends automation of technological needs. Instead, it grants us the power of introspection. By telling us we have overlooked the needs of our peers, it encourages us to ask who else we are missing, what other needs are unmet, who remains excluded by our behaviour.

It tells us that we are missing out on the skills of others, of the help and value that they can bring to our achievements.

It tells us that we must be inclusive, and to do otherwise is to maintain those harmful silos.

Fundamental Skills

It follows, then, that the fundamental skills of DevOps are not infrastructure-as-code or software development. While these skills are necessary, they are not fundamental; they are not useful on their own.

Instead, the fundamental skills of DevOps can only be Empathy, Communication and Respect.

By drawing on the Philosophy of DevOps, the reasoning becomes clear. When we treat silos as harmful, we must ask our staff to reconsider how they have done work, and in what regard they hold others. We now ask our staff to care about others’ opinions, and to communicate effectively and with empathy and compassion.

We ask for a more holistic view, where no organisational skill is less valuable than another.

This can be a difficult transition, as it asks staff to reconsider their actions against a new set of criteria for judging competence and capability. It tells us that how we have acted is no longer acceptable, and that changes must certainly be made.

Collaboration

This is the philosophy of DevOps at Eiara. This is what DevOps is to us, to live and breathe not the technology, not the tools of Continuous Integration or AWS or Azure or Docker or Cloud, but to live for the new culture of communication, of respect, of collaboration that is required for DevOps.

DevOps was never about the technology. It can never be about the technology.

DevOps is, and always will be, about the people.


Data is the New Oil

Fri 23 March 2018

“Data is the New Oil” as an idea really took off in 2016, touching on the modern value of a business revolving around data collection, on the value of its users and the demographics of its customers. We have framed it as the hidden value we didn’t know we had, waiting for us to unlock it.

However, in light of the revelation that Facebook allowed Cambridge Analytica deep access to its treasure trove of data, explicitly to achieve ends that are deeply concerning to modern democracy and the health of the Internet at large, “data is the new oil” requires considerable re-examination.

For Facebook, this was their business model. It was always their business model. Allowing access to their data was undertaken with the full support of their stated goals, of what they set out to do.

This was their intent, and the consequences belong solely to them.

So when we pursue “data is the new oil”, when we seek to unlock our own hidden value, how will we be different from Facebook? How can we learn from their example?

Data is powerful, offering deep meaning and insight never before accessible, but it comes with new concerns and requirements. An oil spill is a catastrophic consequence, and a necessary consideration of oil’s use.

But to call data the new oil, we must also understand that data is as dangerous as oil. We must consider ideas such as data leaks and data spills, terms that describe catastrophic events that are as much a consequence and consideration of the use of data as a spill is of the use of oil.

When we then pursue the value of data, we must build organisations and policies with this in mind, that assume adversarial usage and intent, that believe that the behaviours of Cambridge Analytica are not aberrations or outliers, that this is the norm. We must understand that Cambridge Analytica is the organisation that got caught, not the only organisation behaving this way.

We must believe that, without a robust organisation, without that consideration for adversarial intent, data spills are inevitable.

We must ask, what results do we want? Do we value the privacy of our customers and users? We must ask, if we truly value them and their privacy, will an agreement not to misbehave ever be enough?

We must ask these questions because there is no way to unleak the data, no way to clean up a data spill. Consequences, regardless of our intent or stated values and goals, will remain consequences.

To use data as our new oil, we must realise the value of data without creating the conditions for privacy breaches. We must consider the data spill and work tirelessly to prevent it. We must analyse our data ourselves, and offer only our own interpretations. We must build our organisations such that our people care about safety and privacy, that the organisation promotes safety and privacy, that it makes violating safety and privacy difficult.

We must do more than say we care, we must build a system that cares, that will care when we are not watching.

And we may think that we are too small to be noticed, that we will not be a target, but there is no such thing as a target too small to notice. In the new world, the value of data grows when it is combined with other data, our data included.

Cambridge Analytica targeted a giant, and succeeded. We will be targeted as well. Will our data adversaries succeed against us?

Cambridge Analytica has shown us the future of “data is the new oil,” and the public concern over that future has shown us the questions that we must ask, the culture that we must strive for.

This is our future.

Are we ready for it?


Generating Python Payloads for AWS Lambda

Sun 04 February 2018

AWS’s Lambda, and serverless architecture in general, have been a huge boon not just to our ability to build experimental web software, but also to a new level of interesting and valuable workflow and automation tooling.

However, one of the things that’s been regularly difficult is easily integrating the build process into our Terraform cloud automation, in a manner that is consistent with the goals and philosophies of Terraform.

To that end, we’ve open-sourced the Terraform module we use to automatically build Python packages for use on AWS Lambda. It’s intended to integrate easily into your existing Terraform infrastructure, and to rebuild the payload archive only when your Python Lambda function changes.
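For context, here is a minimal sketch of the underlying Terraform pattern a module like this builds on, using the archive_file data source so the function is only updated when the source actually changes. The names, paths, runtime and role ARN here are hypothetical, and the module’s actual interface may differ:

data "archive_file" "lambda_payload" {
  # Zip up the function source; the archive is rebuilt when the source changes
  type        = "zip"
  source_dir  = "${path.module}/src"
  output_path = "${path.module}/build/payload.zip"
}

resource "aws_lambda_function" "example" {
  function_name = "example_function"
  filename      = "${data.archive_file.lambda_payload.output_path}"

  # The hash only changes when the payload changes, so Terraform only
  # pushes new code when the Python source actually differs
  source_code_hash = "${data.archive_file.lambda_payload.output_base64sha256}"

  handler = "handler.handle"
  runtime = "python3.6"
  role    = "arn:aws:iam::account-id:role/role-name"
}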

You can get the module here.


Deployment Roles

Fri 13 October 2017

One of the core aspects of the modern DevOps process is the CI/CD pipeline, where newly built deployment artefacts can be easily pushed into our various environments, automating the tedium of deployment and going live.

In terms of convenience, this is great. Complicated deployment procedures are turned into turn-key automation, enabling us to get new software into testing or production; getting feedback on our work has never been easier.

However, when we’re building these systems, security is often a secondary thought, or not considered at all, and it’s all too easy to use highly-privileged credentials to set up these deployment roles.

We recently built out a deployment system for AWS Lambda, using Terraform, and it took a lot of thought to work out what, exactly, our deployment role should be able to do, and what we’re trading away when we limit it like that.

The Need

AWS Lambda is easy to deploy and easy to work with, and there’s a myriad of opinions on how to deploy code to it.

We wanted to be able to use our standard TravisCI-based build process for pushing new versions of a codebase to AWS Lambda, by using the industry-standard tool Terraform to manage the deployment process.

As Terraform requires AWS API access, the fastest way to achieve this goal is just to create standard AWS access keys and insert them into the TravisCI UI. While this is largely safe, in that TravisCI is well-defended and manages its own breach detection, including highly-privileged access keys here is still a risk if someone breaches your source code repository or one of your developers’ workstations.

Because TravisCI has to be trusted, those access keys are available in the clear, and so anyone with access to your source code repository must have access to those keys. This would be the same with any CI provider, because any CI provider would be in the same position of high trust.

So what should we be doing?

Principle of Least Access

In this, and every case, we should be thinking about how little access we need to get the job done.

When we built our system to deploy into AWS Lambda, we asked what we needed and how we should be thinking about deploying functions.

We decided on using a versioned S3 bucket to provide a historic record of function payloads, and a limited execution role to be passed to the Lambda itself.

From those decisions, the smallest set of permissions needed to deploy a function would be:

- Write to an S3 bucket
- List, create and delete functions
- Pass in an IAM role for execution

It’s important to note that our deployment isn’t going to be creating the S3 bucket, or the IAM role needed for execution.

S3 Bucket

The S3 bucket permission is fairly straightforward: we need to be able to upload payload files, and to know where to upload them. This part of our role won’t need to delete or modify existing files, as we’re focussing solely on letting S3 manage the historic record.
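As a point of reference, here is a minimal sketch of what that separately-managed, versioned payload bucket might look like. The bucket name is hypothetical, and matches the example policies later in this post:

# Versioned bucket holding the historic record of function payloads.
# This is managed outside the deployment role itself.
resource "aws_s3_bucket" "deployment_payloads" {
  bucket = "example-deployment-bucket"

  # S3 versioning keeps every payload we upload, so the deployment
  # role never needs delete or overwrite permissions
  versioning {
    enabled = true
  }
}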

List, Create, Modify

The second piece is being able to modify functions. Because the CI system is authoritative when it comes to the deployment of functions, it needs to have the permissions to make these modifications.

However, the CI system should only be authoritative over its own functions. In our design, we implemented this by requiring a CI-specific prefix for the function names, ensuring that functions created through other means couldn’t be touched.

Pass a Role

Finally, a newly-created function needs to have an execution role associated. This role determines what a function is capable of doing, and this is probably the most important aspect of ensuring a consistent security profile when building this sort of CI service.

In order to create a function, the deployment role needs to be able to pass a role. In general, this could be any role, from the most basic AWSLambdaBasicExecutionRole to the core admin role.

In order to ensure we’re not able to assign an admin role to our function, we set up our deployment role to only be able to assign a single, pre-determined execution role, and only that role.

By doing this we can be assured that our functions can’t leak elevated access, and that functions can never do anything more than we expect.
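For completeness, here is a minimal sketch of what that single, pre-determined execution role might look like. It’s created outside the deployment pipeline, and the names here are hypothetical:

# The single execution role our deployment role is allowed to pass.
resource "aws_iam_role" "lambda_execution" {
  name = "lambda-execution-role"

  # Allow the Lambda service to assume this role
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Basic CloudWatch Logs access for the function, and nothing more
resource "aws_iam_role_policy_attachment" "lambda_basic_execution" {
  role       = "${aws_iam_role.lambda_execution.name}"
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}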

Implications

Limiting our deployment role to this extent does come with a major potential downside, in that it introduces a gatekeeping stage every time we need to deploy functions which require different levels of access to our AWS accounts.

This kind of gatekeeping can be one of the major drivers behind creating a shadow IT environment, because it interferes in rapid testing and iteration processes.

This level of deployment role lockdown may not be appropriate for your environment, but it is necessary to consider it, and to have the conversation in your team about its necessity and impact, as well as the impact of not implementing these ideas.

Code Examples

So what does the setup for this deployment role look like in practice? Let’s look at some Terraform code to set it up:

The Policies

This first block of code defines the IAM policies required to create AWS Lambda functions. These policies are the core of what our CI role is able to do.

data "aws_iam_policy_document" "lambda_create" {

  # So that CI can bind the Lambda to the execution role

  statement {
    actions = [
      "iam:PassRole",
      "iam:GetRole",
    ]
    # Allows us to only provide one role to our lambda function
    resources = [
      "arn:aws:iam::account-id:role/role-name",
    ]
  }

  # So that we can create and modify all the Lambdas

  statement {
    actions = [
      "lambda:CreateAlias",
      "lambda:CreateFunction",
      "lambda:GetPolicy",
      "lambda:DeleteFunction",
      "lambda:GetFunction*",
      "lambda:ListFunctions",
      "lambda:ListVersionsByFunction",
      "lambda:PublishVersion",
      "lambda:UpdateAlias",
      "lambda:UpdateFunctionCode",
      "lambda:UpdateFunctionConfiguration",
    ]

    # But only the ones with the prefix
    resources = [
      "arn:aws:lambda:region:account-id:function:cicd_prefix_*",
    ]
  }
}


data "aws_iam_policy_document" "s3" {
  # Allows our CI provider to list the payloads bucket

  statement {
    actions = [
      "s3:ListBucket",
      "s3:GetBucketLocation",
    ]

    resources = [
      "arn:aws:s3:::example-deployment-bucket",
    ]
  }

  statement {
    # So we can update files in our own bucket, but not get or delete them.

    actions = [
      "s3:PutObject",
      "s3:PutObjectAcl",
    ]
    resources = [
      "arn:aws:s3:::example-deployment-bucket/*",
    ]
  }
}

The User

The second part is the user itself, and the group membership. By creating the user in this way and isolating it from the existing roles in AWS, we’re able to strongly control the role’s capabilities, and provide a skeleton for creating new deployment roles in the future.

# Creates an AWS user to hold the CI deployment role

resource "aws_iam_user" "ci" {
  name = "ci_user"
}

resource "aws_iam_group" "ci_group" {
  name = "ci"
  path = "/ci/"
}

resource "aws_iam_group_membership" "ci_group_membership" {
  name = "ci-group-membership"

  users = [
    "${aws_iam_user.ci.name}",
  ]
  group = "${aws_iam_group.ci_group.name}"
}


# Create the policy that permits creating Lambda functions

resource "aws_iam_policy" "allow_create_lambda" {
  name        = "allow_create_lambda"
  path        = "/ci/"
  description = "allows limited lambda source creation and modification"
  policy      = "${data.aws_iam_policy_document.lambda_create.json}"
}

# Create the policy that permits uploading Lambda functions to S3

resource "aws_iam_policy" "allow_ci_S3" {
  name        = "allow_ci_S3"
  path        = "/ci/"
  description = "allows limited S3 access for CI"
  policy      = "${data.aws_iam_policy_document.s3.json}"
}

# Connect the policies to the group that our CI user is a part of

resource "aws_iam_group_policy_attachment" "lambda_attach" {
  group      = "${aws_iam_group.ci_group.name}"
  policy_arn = "${aws_iam_policy.allow_create_lambda.arn}"
}

resource "aws_iam_group_policy_attachment" "s3_allow" {
  group      = "${aws_iam_group.ci_group.name}"
  policy_arn = "${aws_iam_policy.allow_ci_S3.arn}"
}

# Creates credentials that can be used in your CI provider of choice

resource "aws_iam_access_key" "ci_credentials" {
  user = "${aws_iam_user.ci.name}"
}
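
To get the generated key pair out of Terraform and into your CI provider’s settings, one approach is to expose it as outputs; a minimal sketch, with hypothetical output names:

output "ci_access_key_id" {
  value = "${aws_iam_access_key.ci_credentials.id}"
}

output "ci_secret_access_key" {
  # Marked sensitive so it is redacted in apply output, though it is
  # still stored in the Terraform state file (see below)
  value     = "${aws_iam_access_key.ci_credentials.secret}"
  sensitive = true
}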

Implications

Pulling the credentials as we are in aws_iam_access_key does have the implication that these credentials are being written into the Terraform state file, which may be inappropriate for your threat model.

If it is inappropriate, generating the access keys from the AWS console will be a better option, and should be explored instead.