With the rise of DevOps, we've seen a shift from the old world of delivery handoffs, where developers were disconnected from operational concerns, to delivery teams with a full gradient of skills, ranging from development best practices to appropriate SRE1 policies.
Coinciding with the rise of infrastructure-as-code has been a growing awareness that some practices are harmful, the canonical example being "don't check your secrets into source control."
We hear numerous stories of this having happened, from AWS keys compromised or SSL certificates leaked because they were inadvertently checked into GitHub, to trusted self-signed certificates being used to mask malfeasance. From this, one of the patterns we've picked up is to encrypt our secrets, ensuring that they are never stored unencrypted. This is an important step, and we can be assured to some extent that our secrets will remain secret, remain safe.
Security requirements and recommendations often touch on this necessity, pointing to a lack of encryption as a flaw in your information security architecture.
So we encrypt our files. We feel safe. We feel secure. We are secure. Right?
In this case, encryption has been added because our secrets must be encrypted. But, why? Because our security requirements said we must.
But, why? So that the secrets cannot be read by parties unknown.
So how does encryption prevent that?
If an attacker gains access to the ciphertext, say through a GitHub account breach, we feel as though we have security, that they will not be able to read the file or our secrets. But we won't know if they ever succeed in gaining access except through secondary means, such as detecting unauthorised queries.
If an attacker gains access to our deployed servers, servers which must have the secrets in the clear in order to use them, all our encryption is for nothing because anyone on that server can access those secrets, and use them as they see fit.
A core axiom of security planning is the awareness that we will, one day, lose. Once we treat our secrets as though they could be breached and misused at any moment, we start to see that just encrypting our secrets in our GitHub accounts or configuration management data stores is not, and cannot be, enough.
Instead, we need to think about access.
Let's consider a standard three-tier web application deployment, with our load balancers, web application tier and database. Each layer of this design has access to a set of secrets, from the load balancers holding SSL private keys to encrypt traffic to our clients, to the web servers holding passwords to log in to the database.
At each tier, the machines are trusted users of those secrets: not just expected to have them, but required to. The web server can't do its job if it can't connect to the database, after all!
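To make that concrete, here's a minimal sketch of why the trusted machine defeats encryption at rest. It assumes a common deployment pattern, a process handed its database password through an environment variable, and a Linux host; the process name and secret value are illustrative, not from any real system.

```python
import subprocess
import time

# Hypothetical: launch a "web server" process (stood in for by sleep)
# whose database password is supplied through its environment, as many
# deployment tools do.
child = subprocess.Popen(["/bin/sleep", "30"], env={"DB_PASSWORD": "s3cret"})
time.sleep(0.5)  # give the child a moment to exec

# On Linux, anyone allowed to read this process (its own user, or root)
# can recover the secret in the clear from /proc -- no decryption needed.
with open(f"/proc/{child.pid}/environ", "rb") as f:
    exposed = b"DB_PASSWORD=s3cret" in f.read().split(b"\0")

child.terminate()
```

The same holds for config files and process memory: whatever form the secret takes at rest, the running service must hold it in the clear, and so can any sufficiently privileged user on the box.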
From that, a user with access to that machine is trusted to the same degree.
This tells us that we must consider a world where our secrets may be accessed at any time, by people outside of our control, and since the location of access is trusted, we cannot rely on encryption to enforce access control.
So, what if we accepted that? That anyone or anything on that server could reasonably access those secrets?
At that point, we start thinking not of restricting access but of awareness of access. If I know that my web application restarted and decrypted the database password, then I have external awareness of that fact. If my secrets are accessed and there is no correlated application restart, I can now ask if it was a legitimate access, and react appropriately if it wasn't.
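The correlation described above can be sketched in a few lines. The timestamps, the sixty-second window, and the log shape are all illustrative assumptions; a real implementation would read restart events and secret-access events from your audit and application logs.

```python
from datetime import datetime, timedelta

# Illustrative events: when the application restarted, and when the
# database password was decrypted.
restarts = [datetime(2017, 1, 1, 3, 0, 0)]
accesses = [
    datetime(2017, 1, 1, 3, 0, 5),    # expected: just after a restart
    datetime(2017, 1, 1, 14, 30, 0),  # no restart nearby -- worth asking about
]

def suspicious_accesses(accesses, restarts, window=timedelta(seconds=60)):
    """Return secret accesses with no application restart in the
    preceding window -- the accesses we should be asking questions about."""
    return [a for a in accesses
            if not any(timedelta(0) <= a - r <= window for r in restarts)]

flagged = suspicious_accesses(accesses, restarts)
```

The point is not the window size but the framing: legitimate accesses have an explanation we can check for, and everything else becomes a question to answer.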
By changing our focus from performing an action to considering a result, we move from merely encrypting secret material to using encryption technology to become aware of illegitimate access to that material.
The approach of treating access as the fundamental point of instrumentation is one of the underlying ideas behind Vault and Eiara's own Ratchet proposal, where encryption is used to provide a gateway, and accesses are always logged. We are given new ways to frame "is this a valid access?" and "should I continue to trust this server?", questions that are difficult to answer without this consideration.
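In the same spirit, here is a toy gateway that makes access the instrumented event: secrets are only released through a method that unconditionally writes an audit record first. This is not Vault's or Ratchet's actual API, and base64 is a stand-in for real encryption (it is obfuscation only); the class and names are illustrative.

```python
import base64
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

class SecretStore:
    """Toy secret gateway: every read is logged before the secret is
    released, so an unexplained audit line is itself the alarm."""

    def __init__(self):
        self._store = {}

    def put(self, name, value):
        # base64 stands in for real encryption at rest here.
        self._store[name] = base64.b64encode(value.encode())

    def get(self, name, requester):
        # The audit record is written unconditionally, before release.
        logging.info("secret %r accessed by %r", name, requester)
        return base64.b64decode(self._store[name]).decode()

store = SecretStore()
store.put("db_password", "s3cret")
password = store.get("db_password", requester="web-01")
```

Because release and logging are inseparable, "was this access valid?" becomes a question the audit trail can actually answer.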
Site Reliability Engineering ↩