Nobody Should Have Production Access

February 13, 2023

Written by

Fouad Matin

Nobody would ever recommend that most of their engineers keep a connection open to their production database throughout the day.

Practically speaking, though, that’s not much different from where you are if you can’t say for certain why an engineer SSH’ed into production at 1am on Saturday.

Did they get paged from an incident? Or was it just a spurt of curiosity?

Back when I was an engineer at Segment, we talked with a number of companies about their data architecture and how data flowed from point A to B.

I’m going to tell you something you probably already know: way too many people have access to confidential, sensitive data.

Even at companies that claim to be practicing least privilege, a disturbing yet unsurprising number of folks have supersized access.

Why should I care?

We talked to hundreds of engineering and security leaders to understand the common root causes of security and engineering incidents. We heard so many horror stories, in conversations that were part customer discovery and part therapy, that we even started hosting a regular event to talk about them.

One of the core issues we consistently heard is: “We don’t know who has access to what and why, until S#!% hits the fan and we have to take it all away.” That ends in either a complete business disruption or a major security incident.

Faking Least Privilege

Turns out in a lot of these situations, the engineering team is faking least privilege.

The benefits of least privilege access control are obvious. It reduces the risk of human error, the surface area for credential compromise, and the chance of unauthorized changes that could compromise the system. It also makes it easier to audit and monitor the system, and to troubleshoot and fix issues when they occur.

When teams fake least privilege by giving every engineer AdministratorAccess in their production AWS account, they are practically giving people permission to break production.

Starting from Zero

One of the complicated parts of least privilege is keeping track of what access people have, let alone what they should have.

It’s easiest for you (and for IT and Security) to know what you do or don’t have access to on Day 0, when you have nothing. The problem is that no one really knows what “right-sized access” looks like, aside from “just give them what we gave the last person”, and unfortunately that’s usually full access to production.

Minimum Viable Access

People try to implement “right-sized access” with role-based access control (RBAC). Implementing RBAC usually looks like creating read-only and admin roles, with the goal of giving people only read-only access. But the work required for mission-critical operations needs admin access, so sooner or later everyone is an admin again.

The real goal here is to get to a place where at any given time a user has the minimum viable access to accomplish the task that they are presently working on. No more. No less.

The problem in most teams is that it takes forever to get access. People wait so long for it that they don’t want to give it up, even when they don’t need it anymore.

So how do you get to a place of Minimum Viable Access?

Create groups in your identity or single sign-on provider (Google, Okta, etc.) that map to a specific set of permissions for a given role
Give people an easy way to request membership in those groups
Prevent approval bottlenecks: set up peer reviews, or auto-approvals with security and compliance controls, where it makes sense
Make reviews simple for approvers: collect all the info they need to make a decision up front and notify them of requests where they’re already working
Have approvers set an access duration when they approve a request
Automate moving people in and out of those groups based on their access grant duration
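
The steps above can be sketched in a few lines of Python. This is a minimal, in-memory illustration under stated assumptions, not a real integration: `AccessGrant`, `GroupAccess`, the method names, and the `prod-db-readonly` group are all hypothetical, and a real setup would call your identity provider’s group API instead of keeping a list in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessGrant:
    """One approved, time-bound membership in a group (hypothetical model)."""
    user: str
    group: str
    approver: str
    reason: str
    expires_at: datetime


class GroupAccess:
    """In-memory stand-in for group membership in an identity provider."""

    def __init__(self) -> None:
        self.grants: list[AccessGrant] = []

    def approve(self, user: str, group: str, approver: str, reason: str,
                duration: timedelta, now: datetime) -> AccessGrant:
        # Peer review: the requester cannot approve their own request.
        if approver == user:
            raise ValueError("a request must be approved by someone else")
        # The approver sets the duration, so every grant has an end date.
        grant = AccessGrant(user, group, approver, reason, now + duration)
        self.grants.append(grant)
        return grant

    def members(self, group: str, now: datetime) -> set[str]:
        # Only unexpired grants count as membership.
        return {g.user for g in self.grants
                if g.group == group and g.expires_at > now}

    def revoke_expired(self, now: datetime) -> list[AccessGrant]:
        # Meant to run on a schedule so people leave groups automatically.
        expired = [g for g in self.grants if g.expires_at <= now]
        self.grants = [g for g in self.grants if g.expires_at > now]
        return expired


# Example: Bob approves two hours of access for Alice during an incident.
start = datetime(2023, 2, 13, 1, 0, tzinfo=timezone.utc)
access = GroupAccess()
access.approve("alice", "prod-db-readonly", "bob",
               "paged for incident", timedelta(hours=2), start)
```

Because the grant carries the reason and the approver, the 1am question from earlier has an answer on record, and because membership is checked against `expires_at`, access lapses even if the scheduled cleanup runs late.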

Practice of Least Privilege

Two phrases we haven’t mentioned yet are “Zero Trust” (ZT) and “Principle of Least Privilege” (PoLP) because they represent the goals, not necessarily the tools to get there.

In 2014, Google engineers released the seminal BeyondCorp research paper that dropped the idea of a “privileged network” in favor of dynamically verifying multiple factors as part of network requests.

Instead of assuming that “private” networks like an office can be trusted through static controls like IP address allowlisting, we can verify devices and user authentication.
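
To make the contrast concrete, here is a small sketch of the two approaches. It is a deliberate simplification and not how BeyondCorp itself is implemented: `RequestContext`, both function names, and the `10.0.0.0/8` office range are assumptions made up for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical office range used for the static allowlist.
OFFICE_NETWORK = ip_network("10.0.0.0/8")


@dataclass(frozen=True)
class RequestContext:
    """Facts known about a single request (hypothetical model)."""
    source_ip: str
    user_authenticated: bool
    device_verified: bool


def allow_by_network(ctx: RequestContext) -> bool:
    # Static control: anything coming from the "private" network is trusted.
    return ip_address(ctx.source_ip) in OFFICE_NETWORK


def allow_by_verification(ctx: RequestContext) -> bool:
    # Per-request control: network location is ignored, and both the
    # user and the device have to be verified.
    return ctx.user_authenticated and ctx.device_verified
```

With the allowlist, an unverified device plugged into the office network gets in while a verified engineer working remotely does not. The per-request check reverses both outcomes.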

Zero Trust takes this a bit further and says you should also implement least privilege, among other goals like security monitoring and risk-based verification. But what are you supposed to do with that?

It’s pretty hard to put least privilege into practice, so we came up with some principles of our own based on what we’ve seen in the wild:

Everyone’s access should start at zero
Grant minimum viable access that’s easy to revoke
Don’t keep people waiting if they should have access
Changes should be logged
Automate the boring stuff
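
Two of these principles, starting at zero and logging changes, are easy to show in code. The sketch below is illustrative only: `AccessLedger`, its methods, and the `prod:read` permission string are hypothetical names, and a production system would write to durable, append-only storage rather than a Python list.

```python
import json
from datetime import datetime


class AccessLedger:
    """Default-deny permissions where every change leaves a log entry."""

    def __init__(self) -> None:
        # Nobody appears here until access is explicitly granted.
        self.permissions: dict[str, set[str]] = {}
        self.log: list[str] = []

    def can(self, user: str, permission: str) -> bool:
        # Unknown users and unknown permissions are both denied.
        return permission in self.permissions.get(user, set())

    def change(self, action: str, actor: str, user: str,
               permission: str, at: datetime) -> None:
        if action not in ("grant", "revoke"):
            raise ValueError(f"unknown action: {action}")
        held = self.permissions.setdefault(user, set())
        if action == "grant":
            held.add(permission)
        else:
            held.discard(permission)
        # Record who changed what, for whom, and when.
        self.log.append(json.dumps({
            "at": at.isoformat(),
            "action": action,
            "actor": actor,
            "user": user,
            "permission": permission,
        }, sort_keys=True))
```

Routing every grant and revoke through one `change` method is the design choice that matters here: there is no path to modify access that skips the log.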

If you agree with these principles but haven’t found the time to put them into practice, or are worried that doing so will add friction for your team, you’re probably closer than you think. If it’s done correctly, your team gets the access they need to ship faster without the risk of breaking production.

Back at Segment, the security team built a tool called “Access Service” to provide engineers with temporary access to their cloud resources and wrote a great blog post about the process of building it themselves. They also mention some of the areas for future development like policies and dynamic roles.

Most teams don’t have the capacity to build their own internal tool like Segment’s, so we built it for you! With policies and batteries included, Indent can help you get to zero long-lived access without adding friction that slows your team down.
