Why Multi-Account in AWS?
As companies mature in the cloud, they often reach a crossroads where someone says "we need more than the X accounts we have, it's what AWS says to do!" and, inevitably, someone else pushes back with "What we have works, so no need to change it!"
So, why would you bother with multi-account strategy? Let's dig into a whitepaper and see how it can apply to cloud-maturing organizations.
Let's explain multi-account
Multiple accounts can exist in AWS - as you can imagine. Indeed, most companies I've seen have a few accounts already, something like "Non-Prod" and "Prod" and possibly a "Security" or "Operations" account and they all roll up to a "Payer", "Main", or "Master" account. So, most companies operate fine on these 5 accounts, for quite a while.
However, there is a method by which you can grow that account count dramatically - I've seen very cloud-mature organizations with more than 100 accounts, some running into the several hundreds. Why would you even do that? Simply because you want to take advantage of the benefits of having workloads isolated.
What's a workload?
This is one of those terms whose definition varies by organization. In a lot of places I have seen, it is a product - a collection of services under the aegis of a single goal, such as "This is checking out on our e-commerce site" or "This is processing employment applications" or similar things. Another way I have seen it defined is by project, if you're a PMO-focused organization. Projects generally produce things that go through phases and end up in a maintenance and operations phase that is still tied back to specific funding. The last common definition I've seen is the more nebulous "initiative" - these workloads tend to be a bit bigger than what I am used to, but can work well.
By and large, workloads should be identifiable by some method in your organization, whether it's an app in your CMDB, a cost line in a cost center, or a project ID. We'll get to it later, but one of the benefits of multi-account is the financial operations - FinOps - gains: you can tell more quickly whether your product is producing the expected value.
The Whitepaper
In March 2021, AWS produced a Whitepaper called Organizing Your AWS Environment Using Multiple Accounts. It has gone through several iterations, including one most recently (as of writing this) in March 2024. It is a very technically in-depth and complete document for, as you'd expect, discussing how to organize your AWS environment using multiple accounts. It is an oft-cited whitepaper, but it can easily be overwhelming for those early on their AWS maturity path. What this blog post will do is break down the major points and explain how they impact your organization.
One bit of prework that helps is recognizing that AWS has a Well-Architected Framework that includes 6 pillars - best summarized in this AWS blog post.
There are 8 major benefits that AWS outlines in this whitepaper; some are obvious, some are a little more nuanced. Those 8 are:
- Group workloads based on business purpose and ownership
- Apply distinct security controls by environment
- Constrain access to sensitive data
- Promote innovation and agility
- Limit scope of impact from adverse events
- Support multiple IT operating models
- Manage costs
- Distribute AWS Service Quotas and API request rate limits
Let's break them down one-by-one
Group Workloads Based on Business Purpose and Ownership
When you have workloads, you often have both a reason to have the workload and someone responsible for it. By isolating by account, you can break down and control that ownership and keep otherwise segmented workloads as groups.
For example, imagine you have 3 Workloads for each of the following managers: Joey, Nitish, and Sarah and they're grouped into a few products, something like this:
In an immature cloud environment, all of these would be mingled together. But, let's say Nitish gets promoted and is no longer responsible for the products under him. Instead of the accounts rolling up to him, they get split between Joey and Sarah. In multi-account, it is easy to move an account from one part of the organization to another, so you can simply move Nitish's accounts over to Joey and Sarah.
This simple example shows how it operates on people, but it works the exact same way on business units. You could replace Joey, Nitish, and Sarah with Customer Tools, Finance, and Marketing and it'd be the same - accounts and their workloads can move freely between organizations. You can even use both people and business units - however complex you want your hierarchy to be, you can do. The world is your organizational unit oyster.
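To make the "move an account" step concrete, here is a minimal sketch of how it could look in Python. It assumes the client you pass in behaves like boto3's AWS Organizations client (`list_parents` and `move_account`), and the account and OU IDs are made-up placeholders for Nitish's former accounts, not real values.

```python
from typing import Any, Dict, List


def move_account_to_ou(org_client: Any, account_id: str, destination_ou_id: str) -> bool:
    """Move one account under a new parent OU.

    Returns False when the account already sits under the destination.
    """
    # An account has exactly one parent: the root or an OU.
    parents = org_client.list_parents(ChildId=account_id)["Parents"]
    current_parent_id = parents[0]["Id"]
    if current_parent_id == destination_ou_id:
        return False
    org_client.move_account(
        AccountId=account_id,
        SourceParentId=current_parent_id,
        DestinationParentId=destination_ou_id,
    )
    return True


# Hypothetical reassignment after the promotion: account ID -> new OU ID.
# Every ID here is a placeholder invented for illustration.
REASSIGNMENTS: Dict[str, str] = {
    "111111111111": "ou-exmp-joey0001",
    "222222222222": "ou-exmp-sarah001",
}


def reassign_accounts(org_client: Any, reassignments: Dict[str, str]) -> List[str]:
    """Apply every reassignment and return the account IDs that actually moved."""
    return [
        account_id
        for account_id, ou_id in reassignments.items()
        if move_account_to_ou(org_client, account_id, ou_id)
    ]
```

In a real organization you would call something like `reassign_accounts(boto3.client("organizations"), REASSIGNMENTS)` with credentials from the management account. The workloads inside each account are untouched; only the account's place in the hierarchy changes.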
Apply Distinct Security Controls by Environment
In a previous post, I talked about Service Control Policies (and grilled cheese). Without getting into that article, you can apply policies to accounts by their grouping. These policies can do things like prevent accounts from deleting guard rails, prevent them from using new services, and require that certain standards be met to deploy.
Imagine if you had a group of accounts that belonged to high CPU usage teams - like a data science or AI team. You might want to let them use bigger EC2 instances than most other teams in the organization. You can apply policies at the account grouping level that prevent most others from using huge, powerful instances while still giving your Data Science team what they need. Since this is tied to accounts (and account groupings), you don't have to worry about anyone getting past another policy or tool; it's set at the account level.
In addition, if you have regulatory requirements that differ per workload, you can also keep them grouped and enforce certain things. If you have a workload that requires HIPAA, you might want to enforce passwords and keys differently than if you had a workload that demands FIPS. By keeping them in separate accounts, you can use different levels of tools like AWS Config to ensure ONLY the requirements that apply are being used and that you aren't tying yourself down unnecessarily or missing out on requirements.
However, the most common split is between non-production and production workload accounts. Usually, you can see certain things in non-production, such as logs, natively, but in production you'd rely on your observability platform to see logs and metrics, so only certain people would have read access to a production account instead of all developers.
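As a rough illustration of the instance-size idea, the sketch below builds two Service Control Policy documents in Python: one for a general group of accounts and a looser one for the data science group. The deny-unless-allowed pattern on `ec2:RunInstances` with the `ec2:InstanceType` condition key is a commonly used one, but the allow-lists and names here are my own assumptions, not something from the whitepaper.

```python
import json
from typing import Dict, List


def build_instance_type_scp(allowed_patterns: List[str]) -> Dict:
    """Return an SCP document that denies launching any EC2 instance
    whose type matches none of the allowed patterns."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnapprovedInstanceTypes",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    # The deny applies only when the requested type
                    # matches none of these patterns.
                    "StringNotLike": {"ec2:InstanceType": allowed_patterns}
                },
            }
        ],
    }


# Hypothetical allow-lists, chosen only for illustration.
GENERAL_OU_TYPES = ["t3.*", "m5.large"]
DATA_SCIENCE_OU_TYPES = GENERAL_OU_TYPES + ["p3.*", "g5.*"]

general_scp = json.dumps(build_instance_type_scp(GENERAL_OU_TYPES), indent=2)
data_science_scp = json.dumps(build_instance_type_scp(DATA_SCIENCE_OU_TYPES), indent=2)
```

Each JSON string could then be registered with the Organizations `create_policy` call (with `Type="SERVICE_CONTROL_POLICY"`) and attached to its own account grouping with `attach_policy`. One design note: a deny attached to a parent grouping also applies to everything beneath it, so the data science grouping should not sit underneath the grouping that carries the stricter policy.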
Constrain Access to Sensitive Data
In the course of normal business operations, you often either generate or work with data that you don't want plastered all over your company. For instance, if you process agriculture science, you wouldn't want your proprietary genetic data to be available to your customers for free. Conversely, if you were processing medical records, you might be receiving some of the most private data of individuals. This data may have different levels of sensitivity, but it's all data you want to keep under appropriate controls.
If you isolate the data by accounts, you can prevent others who don't need to know from seeing the data. For instance, in your medical records processing, you can keep images and scans in one account while sending the processed data, such as claim status, to another account. This way, people who are simply working on billing don't have to have access to health information and you can keep a separation of duties. With the agriscience example above, you can keep the data isolated away from sales or customer service who don't need the raw genetics information to sell or support your customers, thus reducing potential exploits or exfiltration - you don't want a rogue employee to steal your data if you can prevent it!
Promote Innovation and Agility
People, generally, like to create things, whether they want to bake a cake, build a shed, or start a new business. People often find joy in this and it helps change the world a little bit each time. When people create things in companies, even if they don't work, it helps the company grow and learn.
Having an account to work in means that they are not as constrained by the past decisions a company has made. If a company isn't used to the modern Event Driven Architecture ("EDA"), it can slow down some modern development techniques. Having an isolated account means that they can design EDA and use cloud native tooling to get there, providing a modern take on something as simple as creating an API or a dataset.
In addition, by having their own accounts to deploy in, they are not tied to the speed of the whole organization they are working with, but can deploy with the frequency they need in order to deliver consistent results. This agility (not to be confused with Agile™) helps speed up overall development and increases time for innovation. In turn, this makes development the genesis for innovation rather than waiting for an event like a hackathon.
Limit Scope of Impact from Adverse Events
"Adverse events" is one of those phrases that keeps your risk people - your CISO all the way down - up at night. The scope of impact is sometimes referred to as "blast radius": when determining risk, you also determine which other applications can be affected when something goes wrong.
If you have one big happy family account, and someone gains access to the account, they gain access to all the resources. Not only can they at least list the metadata of most things, they might be able to spin up new resources and cost you money, poison your routes and exfiltrate data, or in the worst case scenario, hold everything you own hostage. If you have several smaller accounts, traversing them gets a lot more difficult and you can quickly isolate an account from the rest of your network before it can reach out and do worse things.
Another thing not usually considered is that people make mistakes. If someone goes to "delete their VM" and deletes a different one, you might find yourself in trouble. If they see a component that's throwing errors and "helpfully" stop it instead of letting that team investigate, that can lead to significant complications. Having each set of workload pieces in isolated accounts means you can only delete or alter your own VMs and other components.
Support Multiple IT Operating Models
Every organization I've ever worked with has had its own IT operating model. I've seen true, pure DevOps, I've seen platform and operations completely separate, and pretty much every flavor in-between. A lot of companies go through "Traditional" operations, then Cloud-specific operations, and eventually DevOps, but the path between all these is long and often winds around several bends and months. The good news is, multi-account supports all of these operating models.
In traditional operations, application teams create applications and operation teams operate them and the platform they run on. There's usually a good mix of self-hosted third party services, like PeopleSoft, that an operations team is running. Even in this model, applications, whether third party or not, can be hosted in individual accounts that an operations team can run while maintaining the overall governance required. Usually, this involves grouping accounts and applying different policies and procedures on a given group.
As you migrate from traditional to DevOps, the control becomes less centralized and more automatic, thus your governance can be applied by the appropriate development teams to their workloads in each account. As it continues to mature through DevOps, you'll have fewer and fewer centralized human teams and more distributed ones, meaning that a multi-account setup is almost required to prevent collisions or unnecessary procedures.
Manage Costs
Finance operations - FinOps - is a hot topic. No one wants to write an expensive check to a vendor without understanding what it is they're paying for. Unlike some vendors, AWS can break costs down pretty succinctly. They offer, out of the box, a cost breakdown by account, which can be directly attributed to a workload in a multi-account setup. For instance, mine looks like this:
This chart is my actual costs broken up by my 6 accounts that all get used for different things. It's pretty easy to see, quickly, that my POC account uses around $30/mo, but Nullsheen costs me practically nothing.
In addition, you can stack tagging of your resources - little bits of data to indicate what they are tied to, like a project number or cost center - and accounts to get very specific breakdowns.
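If you'd rather pull that per-account breakdown programmatically, here is a small sketch. It assumes the response shape returned by Cost Explorer's `get_cost_and_usage` call when grouped by the `LINKED_ACCOUNT` dimension. The account IDs and dollar amounts in the sample are invented for illustration and are not my real numbers.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict

# Request parameters you might pass to get_cost_and_usage (dates are placeholders).
REQUEST = {
    "TimePeriod": {"Start": "2024-01-01", "End": "2024-03-01"},
    "Granularity": "MONTHLY",
    "Metrics": ["UnblendedCost"],
    "GroupBy": [{"Type": "DIMENSION", "Key": "LINKED_ACCOUNT"}],
}


def total_cost_by_account(response: Dict, metric: str = "UnblendedCost") -> Dict[str, Decimal]:
    """Sum the chosen metric per account across every time period in the response."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            account_id = group["Keys"][0]  # first GroupBy key is the account ID
            totals[account_id] += Decimal(group["Metrics"][metric]["Amount"])
    return dict(totals)


# Made-up sample response covering two months and two hypothetical accounts.
SAMPLE_RESPONSE = {
    "ResultsByTime": [
        {
            "TimePeriod": {"Start": "2024-01-01", "End": "2024-02-01"},
            "Groups": [
                {"Keys": ["111111111111"], "Metrics": {"UnblendedCost": {"Amount": "30.10", "Unit": "USD"}}},
                {"Keys": ["222222222222"], "Metrics": {"UnblendedCost": {"Amount": "0.02", "Unit": "USD"}}},
            ],
        },
        {
            "TimePeriod": {"Start": "2024-02-01", "End": "2024-03-01"},
            "Groups": [
                {"Keys": ["111111111111"], "Metrics": {"UnblendedCost": {"Amount": "29.90", "Unit": "USD"}}},
                {"Keys": ["222222222222"], "Metrics": {"UnblendedCost": {"Amount": "0.03", "Unit": "USD"}}},
            ],
        },
    ]
}

totals = total_cost_by_account(SAMPLE_RESPONSE)
```

With real data you would replace `SAMPLE_RESPONSE` with something like `boto3.client("ce").get_cost_and_usage(**REQUEST)`. To stack tags on top of accounts, a second `GroupBy` entry of type `TAG` (for example a hypothetical `CostCenter` key) gives the finer split, at which point each group carries two keys instead of one.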
Another cost benefit of multi-account is that if you have an account that is less production ready, like an innovation account or a sandbox account, you can control or remove resources from that whole account regularly to keep costs down instead of having a runaway resource running for months. This "cleaning the sandbox" is a common task that can be done with a number of free tools and helps keep costs low while allowing people to experiment.
Distribute AWS Service Quotas and API Request Rate Limits
This is the most minor benefit for smaller organizations and one of the most important for massive organizations. AWS, by default, imposes limits on things like the number of instances you can run and the number of API calls you can make. If you are a small organization, you're unlikely to run into these, or maybe only once or twice. If you're a big organization, you will regularly hit caps if you're in one account. However, these limits are per account, so having multiple accounts reduces the need to ask for quota and rate limit increases.
Summary
I've walked you through the main parts of "why" you should use multi-account, and hopefully you can see how it benefits your organization. From here, you can deep dive into the Whitepaper and see the technical implementations, then work with your teams to introduce this best practice from AWS.
Now, armed with this breakdown, go build something good!