It's not a Data Center.

Photo by Thomas Bennie / Unsplash


Confession

Data centers are pretty cool. Racks and racks of compute surrounding you, the hum of uninterruptible power supplies, the wiring so beautifully and elegantly done, the small satisfaction of knowing your card lets you in where many do not get to wander, the too-sterile smell, and even the oddly chunky fire suppressant overhead. They're endlessly fascinating to think about and observe. How many thousands upon thousands of processes are happening right now? How much data is physically stored in things that you could move around (though some of those bigger 4U storage pieces would be scary to do that with)? How many thousands or even millions of people rely on something so small yet dense?

Many data centers I have been in were surrounded by enough concrete, and then some, that they seemed like they could take a direct hit from almost anything. I'd be lying if I said I wasn't mostly in colocated data centers ("colos") where several entities share the space, but that still doesn't diminish how cool they are. Heck, I am convinced most tech people would still take an afternoon to see a cool data center. I always feel like I am a short step away from putting a 42U rack in the basement just to have my own small data center, just for fun.

A data center I will probably never see, though, is an AWS Data Center.

He said it! He said Data Center and AWS!

AWS does run data centers. There are massive buildings with compute racks, miles upon miles of cabling, enough power running to them that they have a distinct draw on the local grids, dark fiber splices straight in, and all kinds of goodies that would be better than a museum trip.

However, just because AWS runs data centers doesn't mean they are data centers to us. AWS adds layers upon layers on top of the hardware, and those layers make what customers get more than a normal data center. So, let's take a bit of a journey.

How secure is your data center?

Many places that run a data center don't run data centers as their primary business and source of income (though, obviously, some do). Many companies that own a data center (and often two, one for backup and disaster recovery) are instead doing things like retailing items to customers, handling their financial transactions, processing appointments, or even just keeping their music playing. Just because they have a data center doesn't mean it's their primary focus.

Amazon, however, sells "Data Center" services to people. What they can spend on securing your data is astronomical compared to what you can spend on an auxiliary piece of your business. Amazon has physical security guards, 24/7 physical coverage, employees whose sole job is to ensure access is appropriate and revoke it when necessary, backup power, backup water, literal red-tape zones that even other Amazon employees need an escort through, security cameras, physical tampering detection on all their compute, and the list goes on for quite a while. It's a significant investment. I am willing to bet that most companies can't or won't invest that much in securing their data center - it's a very expensive overhead!

Cool, I'll just shift all my stuff to them

That's not what this is about.

If you want physical control of a server, or even just to manage virtual machines the way you do today, you're gonna have a bad time. Amazon (like most managed data centers) won't let you go in and plug a keyboard into a blade. You're welcome to run your virtual machines on their hardware - but they're going to charge you more than hosting them yourself would cost.

Don't treat cloud providers as data centers, or you will burn so much cash that your head will spin. Just because you think you want 1000 VMs to run your application doesn't mean you should just turn on that many on AWS. If you must run it on your own isolated, untouchable server, then AWS has options - but none are cheap.

What you really need to think about is how your applications run, how your infrastructure runs, and how your overall business can instead leverage the cloud to be cheap, efficient, and make the hardware issues someone else's (such as Amazon's) problem.

I've heard this before

There's probably a reason that you've heard this a lot, but I want to break it down a bit, so that people leveraging the cloud don't confuse it for a data center, spend all the money, and then get upset and decide that everything should be something they own, secure, and spend capital on.

Where cloud shines the most is in abstracting things away. Starting with something simple, if you have a script that needs to run and do one thing on-prem, with some virtualization, you can spin up a tiny 0.5 GB RAM, 1 vCPU VMware machine. Throw the script and its necessary interpreter on, slap a cron on it, and let it cook. Really, this was the method for a chunk of my career. If you had a few related ones, you might throw half a dozen on the same tiny VM. I once saw hundreds of Python scripts executed off one box, but that's a terrible story for another day.

What happens if that VM locks up? If there's a half dozen scripts, it's probably an email saying that you will "bounce the box" and cron will take over again next cycle. What about patching it? If you patch every 3rd Tuesday night, you might just have an outage window. What if one of the scripts needs version 1 of a library when another needs version 2? Time to spin up another VM for those that need legacy libraries. What if your storage seizes on that VM? Easy, restore the backup on another VM host that doesn't have storage seizing and hope the backup is fresh enough. What if that new junior dev jumps on that VM and dumps everything except their own script? Time for a learning moment and a restoration - hopefully they nuked nothing critical.

That's...a lot of administration for running a script. This is also supposing you had 1 or 2 VMs - imagine having a fleet of 100 or more VMs that go through this. The administrative load just on managing credentials (whether through SSO, firecall IDs, or checking them out of a secrets manager) could be overwhelming to do by hand or require expensive tooling. It might seem cheap in capital expenditures, but paying for the skills and time to handle that is not cheap. SysAdmins know their value and will ensure you pay for it.

However, cloud providers abstract this away. Instead of managing patches, or storage, or even access to unnecessary filesystems, they offer ways to just run a script - AWS Lambda for example. The great thing is, you pay for the number of requests (priced per million) and for the gigabyte-seconds of compute your code actually uses. You don't need to maintain an entire VM, no matter how small, just to execute small pieces of code when leveraging cloud. In this specific example, you'd simply move your code to the cloud, and wouldn't need to spin up a whole VM just to run it.
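To make that concrete, here is a minimal sketch of what the old cron script might look like as a Lambda function in Python. The `lambda_handler(event, context)` shape is the usual Python handler convention; everything else is an assumption for illustration: `run_job` is a hypothetical stand-in for the real script, the function is assumed to be triggered by a schedule that passes a `time` field in the event, and `estimate_monthly_cost` takes its prices as arguments because actual rates vary by region and change over time.

```python
import json
from datetime import datetime, timezone


def run_job(now: datetime) -> dict:
    """Hypothetical stand-in for whatever the cron script used to do."""
    return {"ran_at": now.isoformat(), "status": "ok"}


def lambda_handler(event, context):
    """Entry point Lambda invokes; a schedule takes the place of the crontab line."""
    # Assumption: the trigger is a scheduled event carrying an ISO 8601 "time"
    # field. If it is missing, fall back to the current UTC time.
    raw_time = event.get("time")
    if raw_time:
        now = datetime.fromisoformat(raw_time.replace("Z", "+00:00"))
    else:
        now = datetime.now(timezone.utc)

    result = run_job(now)
    # Printed output is the simplest way to leave a trace of each run.
    print(json.dumps(result))
    return result


def estimate_monthly_cost(invocations, avg_seconds, memory_gb,
                          price_per_million_requests, price_per_gb_second):
    """Rough bill: a per-request charge plus a gigabyte-second charge.

    Both prices are caller-supplied placeholders, not real AWS rates.
    """
    request_cost = invocations / 1_000_000 * price_per_million_requests
    compute_cost = invocations * avg_seconds * memory_gb * price_per_gb_second
    return request_cost + compute_cost
```

There is no VM, patch window, or login to manage in any of this. The cost function is only there to show the shape of the bill: plug in the current rates from the pricing page and your own invocation counts before comparing it against that tiny VM.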

Photo by Annie Spratt / Unsplash

Another great way to think about why AWS isn't a data center is that it's not intended to be one. It's meant to be a series of tools that keep you away from thinking about the difficulties of running a data center. If you want to store 50 PB of data, they have already handled procurement of the hardware, audits on the systems, and even ways to transfer that much data without going over any wires (AWS Snowball). All you have to do is ready the data and pay for the storage. That's it.

So what?

Here's the main crux of everything - don't use AWS as a data center. Don't treat it as a data center. What it can do for you is allow you to move quickly and adapt without the overhead required by traditional infrastructure. Things can be built very quickly, cost very little, and are already secure from a compute side by default. Imagine never having to patch a database or scale its storage manually. Or building a Kafka topic with a quick bit of Terraform code. Imagine just building one Kafka topic and not needing a whole cluster.

If you treat AWS as a data center, you will get the high costs plus the same traditional overhead. Look into what overhead you can get rid of when you explore and use cloud. Make it a fast place to develop and a cheap place to execute tasks. Throw away your notions of hardware management or physical security. AWS is not a data center for you. It is an innovation center that enables you and your teams to solve technology problems without worrying about what that sound is over there or whether the internet connection to your data center gets cut.

Marty Henderson

Marty is an Independent Consultant and an AWS Community Builder. Outside of work, he fixes the various 3D printers in his house, drinks copious amounts of iced tea, and tries to learn new things.
Madison, WI