What Version is Your Infrastructure?
Author: Elijah Zupancic
I’ve often assumed that everyone knows the importance of application versioning, and I’m surprised when someone thinks differently. On the rare occasion that I’m confronted with someone who insists that they shouldn’t version anything, I stare at them and die a little inside. Luckily for my mental health, most technologists agree that versioning is important and yet disagree on the method. Application versioning schemes are one of those debates like tabs vs. spaces or which way to turn the toilet paper roll. I’ve spent hours debating the advantages of one type of versioning over another. I’m here to say that I don’t care about the versioning style. I care about what is not getting versioned.
Despite years of collective pain trying to obtain knowability (also known as what the hell is going on) around what code is running in production, our industry is only in the infancy of ascribing the same type of knowability to the infrastructure. Think about it. If you know the version of your software artifact but don’t have certainty about the underlying platform on which it is running, do you really have any certainty about your application? It is like versioning your application as 3.something or 4.maybe, because you are missing a key portion (the subversion) of the information needed to know exactly what is running. With the first version number you have a foggy idea about what is going on, but that’s it. This leads us to the question:
Can you tell me what version is your infrastructure?
Being able to answer this question is at least as important as being able to answer about your application’s version.
If you still don’t agree with me about why you may want to know what the exact footprint of your application’s infrastructure is, think about the following questions:
- What do you consider infrastructure? What is meaningful to you about the platform in which your application is running?
- What is the first question you’re asked after you report a bug?
- How do you know what shared libraries an application is using at runtime?
- What is the version of the application framework or server running your application?
- What version of the OS is running your application? When was it last patched?
- What deviations to your OS’s configuration were made since it was installed? Better yet, what install options were used when it was set up?
- What is the version of the database that your application is connecting to?
- What are the firewall rules needed to secure your application?
- What scheduled jobs need to be run to make your application function? Are they invoked from another system?
- Are you selling a (metaphorically or literally) boxed software product that you have zero control over where it is run? In that case, this article may not be very helpful because you are out of luck unless you can run it in a container. If you can do that, then keep reading.
If your answer involved sshing into the box or checking a wiki, you should keep reading.
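To make the question concrete, here is a minimal sketch of what answering it without ssh or a wiki could look like: the running process collects a handful of facts about its host and reduces them to one short identifier. The chosen facts and the names `collect_facts` and `fingerprint` are my own assumptions for illustration. A real inventory would also need to cover the shared libraries, database versions, firewall rules, and scheduled jobs from the list above.

```python
import hashlib
import json
import platform


def collect_facts() -> dict:
    """Gather a few facts about the host that the running process can see."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "runtime": "python " + platform.python_version(),
    }


def fingerprint(facts: dict) -> str:
    """Hash the facts in a stable key order so equal facts give equal IDs."""
    canonical = json.dumps(facts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    facts = collect_facts()
    print(json.dumps(facts, indent=2, sort_keys=True))
    print("infrastructure fingerprint:", fingerprint(facts))
```

Two hosts that report the same facts get the same fingerprint, and any difference in a recorded fact changes it. The identifier is only as trustworthy as the list of facts behind it, which is the real work.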
Converging on Immutable Infrastructure
Two decades before the term Immutable Infrastructure was ever used, Software Configuration Management (SCM) was an area of study for academics and professionals alike. This discipline is somewhat generalized because it focuses on all aspects of the state in which a software system could be configured. Yet, infrastructure state changes were very much within its domain, so in fairness, we can’t begin to talk about Immutable Infrastructure without acknowledging the efforts of the SCM community. Moreover, just the creation of this discipline alone was a jump forward in the knowability of production applications because it brought some modicum of self-awareness to the act of software deployment and hence infrastructure state changes.
I couldn’t find out definitively who coined the term Immutable Infrastructure, but I did find references from Chad Fowler in 2013, and it wasn’t until January of 2014 that searches for the term started to take off. I find the term a good approximation but ultimately an inaccurate description of reality when interpreted literally. I don’t want to say that completely unchanging infrastructure abstractions aren’t possible, but there are going to be state changes on the underlying hardware, so I don’t see it coming anytime in my lifetime. You can think of immutability as one end of a spectrum. On one end you have an unknown, possibly changing infrastructure, and on the other end you have a completely known, unchanging infrastructure. In this sense, immutability is tied to knowability. If you don’t know the state, how can you claim that it is unchanged compared to a previous state?
All infrastructure is mutable to some extent; otherwise it wouldn’t be useful. What would a computer be without state changes? What would it compute if there were no inputs? With the current state of software system complexity, practically speaking, no one can predict with absolute certainty what instructions will be running on a system at any given time when something like a web service call is made. Sure, you can make a million technical arguments about profilers or instructions executed by your application, but at the end of the day, with a typical multi-threaded operating system full of junk, I don’t believe that any non-academic operator has any form of knowability about the actual instructions run as a whole on a system. There are so many things going on at once and interacting with each other, before we even start to go deeper and look at the influence of cosmic rays on the variability of system state. This is why the halting problem is so confounding. With the typical systems of our era, you don’t even have certainty about what software is actually running, let alone when that unknown application will complete! That said, I’m sure that I’m technically wrong in some theoretical sense, but I’m convinced that I’m pragmatically correct.
Now, if we move away from absolute definitions of immutability towards more practical forms of immutability, what does this mean? For one, this becomes less of a discussion about the science of computing and more a discussion about the craft of creating knowable systems for human agents. Thus, we arrive at a practical definition of Immutable Infrastructure as a type of infrastructure that minimizes unknowable state changes.
Before the era of virtual machines, technically mature organizations took great care to standardize the hardware and operating systems for application server clusters. This was a manual process that became automated with the advent of disk imaging utilities or network booting technologies. These were both huge strides in getting us toward a more immutable definition of an application’s host system because they provided predictability about the initial state of the systems running an application.
At this time, setup scripts for building machines were created on an ad-hoc basis with no standardization. Each company had their own way of building systems and as time went on the different systems that started in the same state would start to slowly diverge and become less predictable. There may have been common patterns, but not a clear versionable artifact defining infrastructure with the arguable exception of disk images. However, we did start to see the precursors of a more immutable infrastructure with tools that would reboot systems between user logins and restore the OS to a predefined state.
Although VMware was founded in 1998, it wasn’t until after 2001 that we started to see the wide-scale adoption of virtual machines in data centers. Once they came on the scene, the operational efficiencies started a quiet revolution. While improving the cost-to-performance ratio by increasing the utilization of the underlying hardware (in terms of power consumption and density), virtual machines also improved the knowability of infrastructure. They did this by allowing the free exchange of disk images as a common artifact of an operating system’s definition. Not only did VM images provide a known starting place, like physical disk copies and network boot configurations, but they also provided snapshots that would allow you to revert to known states. Suddenly, you could efficiently develop standardized OS images that you could share within an organization. However, the sheer size of virtual machine disk images limited the utility of the solution in terms of reusability between developers. It was the operators that benefitted primarily.
Later, tools for building and configuring virtual machines started to mature, and we started to see tools that would configure a virtual machine base disk image using a script or recipe. Tools like Puppet, Chef and Vagrant work on this principle. By storing an identifier for the base machine and the steps needed to set up the machine for an application in a versionable artifact, we were able to get one step closer to the notion of immutable infrastructure. However, there was a problem with this model. There was no way to guarantee that the base machine disk image that the VM was running was consistent across different VMs unless you had defined it with the same toolset. This led to setup scripts failing when switching between an in-house integration server running VMware and a production server running on the public cloud. There were often minor differences in the configuration of the operating system that would lead to major headaches due to the lack of portability, yet the net benefit of these solutions was such that there was widespread adoption.
An Old Solution Re-emerges
At the same time that script-driven configurations were taking off, platform as a service (PaaS) models on the cloud started to gain market share. Compared to the script-driven configurations and virtual disk images, PaaS promised to simplify the process of getting knowable infrastructure. You would just write an application and it would run on someone else’s infrastructure. As long as your application conformed to the limitations of the platform, you didn’t need to worry about the underlying infrastructure. You could just move a slider to create more running copies of your program that got auto-added to a load balancer. This was an amazing promise: now we could just outsource our infrastructure. For simple applications, this pattern worked well and still works well. That’s why many startups deploy their applications to Heroku. It saves you a lot of time in the simple use cases. However, once your application starts to demand more from the underlying infrastructure, this model can become untenable. Most of the PaaS providers provided a means to customize their base deployment images, but it was often poorly documented, cumbersome and specific to a single vendor.
Recently, containerization has gained traction as a middle-of-the-road solution between IaaS and PaaS. There is an important lesson emerging from the Docker implementation of containers: it was a PaaS provider that kickstarted the container revolution by creating Docker. Containerization as a technology has been around for a long time in technology years. From BSD jails (March 2000) to Solaris zones (February 2004) and now to LXC (version 1.0 in February 2014), we see an odd degradation of feature sets. With jails and zones being more mature technologies, why did LXC (and thus Docker) make such an impact? One could say it was because Docker is native to Linux, or one could say that it is because Docker cared about the developer experience. I would say that the success lies in the developer experience but also in the knowability that it brought to the creation of OS platforms.
With Docker, you have absolute certainty that the underlying image is constant. This is also true for virtual machines, zones and jails. However, the key difference is that every single change from an immutable base abstraction is easily versioned and discoverable. You get a community standardized artifact in the form of a Dockerfile that you can use to trace the build steps to make the platform image. Each build step can be its own image and shared with other developers. This enables another stream to enter into your software development lifecycle (SDLC). Now the infrastructure for your application’s OS can be versioned independently of your application. It can have its own SDLC. It can have its own inheritance model. It can have platform image specialists modify it outside of the scope of the application and all of those modifications are known and recorded.
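The idea that every change from an immutable base is versioned and discoverable can be sketched with a toy model. The code below is not Docker’s actual image ID algorithm; it is a simplified illustration, and the function name `layer_ids`, the base image name, and the step strings are all hypothetical. Each build step gets an identifier derived from its parent’s identifier plus the step itself, so the final identifier pins down the whole history.

```python
import hashlib
from typing import List


def layer_ids(base_image: str, steps: List[str]) -> List[str]:
    """Return one ID per build step; each ID depends on all steps before it."""
    ids = []
    # Toy model: the chain starts from a hash of the base image's name.
    parent = hashlib.sha256(base_image.encode("utf-8")).hexdigest()
    for step in steps:
        # Chain the parent ID into the hash so history cannot change silently.
        parent = hashlib.sha256((parent + "\n" + step).encode("utf-8")).hexdigest()
        ids.append(parent[:12])
    return ids


if __name__ == "__main__":
    # Hypothetical build steps, standing in for the lines of a Dockerfile.
    steps = ["install runtime", "copy application config", "set start command"]
    for step, layer in zip(steps, layer_ids("example-base:1.0", steps)):
        print(layer, step)
```

Changing the second step leaves the first identifier alone and changes every identifier after it. That is the property that lets an intermediate step be shared as its own image while later changes stay traceable.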
What Good is a Container if You Don’t Know the Ship it is on?
With containers you can shorten the time of unknowable mutability in infrastructure, but they don’t address the problem of the mutability of the underlying host platform. We just have faith that it works and do whatever incantations we can to ward off the evil spirits from the host running the container. Even though containers are a great step forward towards Immutable Infrastructure, they only address a single computational unit (i.e., a machine), and they don’t address the complex interaction of all the dependent systems that are interacting with the application over the network or other IO channels. In other words, versioning containers doesn’t address the versions of everything else that you connect them to, but it is a step in the right direction.
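One way to picture versioning everything else is a manifest that records a version for each thing the container touches, compared between deployments. This is a rough sketch under my own assumptions: the component names, the version strings, and the function `changed_components` are hypothetical, and gathering trustworthy values for such a manifest is the hard part that the code skips.

```python
from typing import Dict, Optional, Tuple

Change = Tuple[Optional[str], Optional[str]]


def changed_components(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Change]:
    """Map each component that differs to its (old, new) version pair.

    A component missing on one side is reported with None on that side.
    """
    names = sorted(set(old) | set(new))
    return {
        name: (old.get(name), new.get(name))
        for name in names
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    # Hypothetical manifests for two deployments of the same application.
    last_week = {"app-container": "1.4.2", "host-os": "build-17", "database": "9.3"}
    today = {"app-container": "1.4.2", "host-os": "build-18", "database": "9.3"}
    for name, (before, after) in changed_components(last_week, today).items():
        print(name, before, "->", after)
```

In this made-up case the container version is identical in both manifests, yet the system as a whole has changed because the host did.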
Read full article: medium.com/@elijahz/what-version-is-your-infrastructure