Our take on devops, part 1: The history lesson
Devops is nothing if not a buzzword, but what exactly does it entail, and where does it come from? With this post, we kick off a series on our take on devops!
In the past couple of years, devops has become something of a buzzword: it seems like everyone is talking about it and has an opinion on it. Now of course, the devops philosophy or culture promises that building, testing, and releasing software can happen rapidly, frequently, and more reliably – so what’s not to love? It certainly sounds like music to our ears!
At November Five, we’ve developed our own take on what devops looks like, and we’re kicking off a series of blog posts this year to show you how we embrace this cultural change.
And as always, understanding what we do is only possible by looking at where we came from. That’s why we kick off this series with a short history lesson.
Back in time: Software Development
Let’s take our time machine back to the late nineties. Google barely existed, Netscape Navigator was still the most popular browser and less than 5% of the world population had access to the internet (45% in 2016). The main way to release software was through the distribution of CD-ROMs.
Software development mainly still followed the Waterfall model. A sequential series of steps took the project through the phases of requirements, design, implementation, testing, release and maintenance, leaving no room for iteration or change of requirements.
The main problem with that approach is the long lead time before gathering feedback from the client, who – clients, please close your ears – often doesn’t know what they want before they see it. This often results in changing requirements, useless development and a lot of wasted time and money.
Also, development is always hard to estimate, especially for large projects. It’s like having to estimate how many days it will take to solve a thousand different Rubik’s Cubes or escape a thousand mazes, all of different sizes and shapes. In order to make an accurate estimate, you’ll have to analyze each of the cubes or mazes in depth, and even then you will probably overlook hidden complexities or shortcuts you can take. The larger the project, the more likely it becomes that you bump into hidden difficulties that make you miss the deadline. There’s quite a lot of math and research backing this problem (two excellent articles here and here).
In 2001, a countermovement emerged, the now widespread Agile philosophy.
A group of people working in IT popularized the concept of ‘Agile Software Development’ by writing a manifesto. In it, they tried to overcome the shortcomings of the Waterfall model and its relatives by valuing individuals and interactions over processes and tools, working software over comprehensive documentation, customer collaboration over contract negotiation, and responding to change over following a plan.
The Agile software movement was reflected in a set of project management/process methodologies such as Scrum, Kanban, Lean or Extreme Programming (XP), which all attempt to iterate quickly in short development cycles (called “Sprints” in Scrum-lingo), create visibility on the status and allow for changing requirements after each cycle.
A little side note: we wouldn’t be November Five if we always played entirely by the rules, but we do apply a management process that looks a lot like Scrum with sprints, backlogs, user stories, standups, demos and retrospectives.
This means every sprint might consist of each of the phases of the original waterfall model, and can (or should) result in a releasable product. At the root, agile software development is about getting working software in the hands of the user, as fast as possible. This means that all phases have to be executed a lot of times, including the delivery-related ones (testing - release - maintenance), and guess what: that’s where devops finds its origin.
More history: delivery management
Again, search for your baggy pants, put your cap on backwards, and rewind to the nineties!
A traditional (did anyone say old-fashioned?) IT environment, operating in a waterfall process, was made up of different ‘silos’ of developers, testers, PMs, release managers and system administrators. When releasing a new piece of software, the development team had to finish up their features and throw them over the big ‘QA’ wall; the QA team in turn passed the hot potato on to the operations team for release and maintenance.
In this methodology, the final product was a shared responsibility of many people, each reporting to different managers and optimizing for their own goals as applications move through departments.
And when shit hits the fan, it’s always someone else’s fault.
A few examples of things that might go wrong:
- Software that worked flawlessly in a development environment doesn’t function properly once in production (the infamous ‘works-on-my-computer’ phenomenon).
- Release cycles are very long and it takes a long time before software actually ends up in the hands of the users.
- Software only provides business value when it gets shipped; unreleased software has no value whatsoever.
- Releases pile up out of fear of change: changing something in a large project with a long release cycle might break things that are very hard to fix.
- In the end, way too many man-hours are spent inefficiently on knowledge transfer between teams.
When you try to follow an agile methodology without adapting your delivery process, you end up in a so-called water-scrum-fall methodology.
In order to increase the number of releases, reduce the fear and risk of releasing software and, in the meantime, fix the ‘works on my computer’ problem, you’ll have to remove some walls (sorry, Mr. Trump).
In 2008, agile infrastructure and operations were discussed at an Agile conference; a year later, in 2009, the term devops was introduced with the first devopsdays event in Ghent.
The idea behind the devops culture is “you build it, you run it.” By getting rid of the silos and walls, a single project team becomes responsible for the final product. This includes software development, but also infrastructure, installation, testing, packaging, release, configuration, monitoring and bug fixing.
A couple of things happen if you make the product development team responsible for production releases:
- “Run what you build” forces development teams to think about how their software is going to run in production as they design it (think about scalability, caching etc. during development).
- It encourages ownership and accountability; if you make a mistake, you know you’ll be responsible for fixing it later on.
- It creates transparency: by coding differently (logging, monitoring, …), teams can find problems before they happen.
- Developers hate repetitive tasks, so devops creates a great environment for automation!
The overall effect: an increase in quality. Awesome! But it’s important to note that devops – as you can see – is a serious cultural shift. And in many cases, an entire organisation needs to be restructured to reach devops’ goals.
Can’t live without: tooling
If a team is to adopt the end-to-end process of delivering their own software, they need a set of tools to accommodate the need for automation on all levels. While there were plenty of tools available for building software, there used to be a gap in the tooling that eased setting up infrastructure, configuring servers and deploying code. Around 2005, a couple of technologies emerged in this space that accelerated the adoption of the “devops” culture. With a combination of some of the many tools mentioned in this section, you have all the building blocks available to set up a completely automated delivery pipeline.
We’re not attempting to be complete here (that’s what Wikipedia is for); we just want to give you an overview of the most popular tools.
The first event to shake the tech industry was initiated by the ecommerce giant Amazon, which launched the Elastic Compute Cloud (EC2) in 2006. EC2 is an online service that allows you to launch and manage virtual servers fully automatically. Launching a server takes only a click of a button or an API call and a few minutes of patience. Billing is based on usage, with a price per hour, which eliminates capital expenses. EC2 (and a lot of other Amazon Web Services) satisfied the need to shorten lead times and eased capacity planning – both of which were a major headache for anyone trying to speed up software delivery.
In 2016, the revenue for AWS increased to over 11 billion dollars, and it is rumoured that over 50% of all internet traffic originates from or passes through one of the many AWS data centers.
Other big players in the so-called ‘infrastructure-as-a-service’ market are Google Cloud Platform and Microsoft Azure.
Configuration management and orchestration tools
Installing or upgrading a package on a server is a task that’s almost as old as development. When you’re dealing with a single package, you can just SSH into the server and type the following, as quickly as you’d grab another coffee:
sudo <apt-get|yum|brew> install nginx
But if you want to manage dozens of packages and config files on a multitude of servers with different roles, you need an army of system administrators (and a lot of coffee) to keep that environment complete, secure and up to date.
Or, more likely, you need a configuration management tool.
The pioneer in this genre is CFEngine (first released in 1993), but configuration management only gained popularity with the initial release of the open-source tool Puppet in 2005, later followed by Chef (2009), Salt (2011) and Ansible (2012). These tools differ in whether or not they use ‘agents’ (daemons running on the servers to be provisioned), the supported operating systems (although all support Linux) and the configuration file language (Ruby, YAML, JSON, …).
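What these tools have in common is that you describe the desired state of a server and let the tool work out the difference. Here’s a minimal sketch of that idea in Python. It is not how any of the tools above is implemented: the Server class is an in-memory stand-in for a real host, and the role names and package lists are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Server:
    """In-memory stand-in for a real host (hypothetical, for illustration)."""
    name: str
    packages: set[str] = field(default_factory=set)


# Hypothetical roles: each role maps to the packages it should have installed
ROLES = {
    "web": {"nginx", "openssl"},
    "db": {"mysql-server"},
}


def converge(server: Server, desired: set[str]) -> list[str]:
    """Bring the server to the desired state and report the changes made."""
    changes = []
    # Install whatever is desired but missing
    for package in sorted(desired - server.packages):
        server.packages.add(package)
        changes.append(f"install {package}")
    # Remove whatever is present but not desired
    for package in sorted(server.packages - desired):
        server.packages.remove(package)
        changes.append(f"remove {package}")
    return changes


web = Server("web-1", {"nginx", "telnet"})
print(converge(web, ROLES["web"]))  # ['install openssl', 'remove telnet']
print(converge(web, ROLES["web"]))  # [] (already in the desired state)
```

Running the same description a second time changes nothing, and that is the property that lets you apply one configuration to dozens of servers without checking each of them by hand.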
If three people are writing a hypothetical blog post about the history of devops, and – after an initial vague agreement about who does what – wait until the end to merge their pieces of text together as a whole, it will take a lot of extra work to make that blog post look and feel like a coherent whole. The same goes for code.
The initial idea behind continuous integration (CI) is for one or more developers to merge all changes into a shared trunk multiple times a day, thereby avoiding ‘integration hell’.
On every merge, a build server takes the role of building the software continuously and verifying that the quality of the software is not degraded by running the automated tests as well as static code analysis tools. Integrating code changes multiple times a day is incorporated in some of the agile methodologies described above.
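The core of what a build server does on every merge can be sketched in a few lines of Python. This is a simplified illustration, not the way any particular build server works: the two commands are placeholders that stand in for a project’s real test runner and static analysis tools.

```python
from __future__ import annotations

import subprocess
import sys


def run_checks(checks: dict[str, list[str]]) -> dict[str, bool]:
    """Run every check command and record whether it exited with status 0."""
    results = {}
    for name, command in checks.items():
        completed = subprocess.run(command, capture_output=True)
        results[name] = completed.returncode == 0
    return results


def build_passes(results: dict[str, bool]) -> bool:
    """A build is green only if every single check succeeded."""
    return all(results.values())


# Placeholder commands (assumptions): the first always succeeds,
# the second always fails, to show a red build
checks = {
    "unit tests": [sys.executable, "-c", "assert 1 + 1 == 2"],
    "static analysis": [sys.executable, "-c", "import sys; sys.exit(1)"],
}

results = run_checks(checks)
print(results)
print("build passes:", build_passes(results))
```

Because a single failing check turns the whole build red, a merge that degrades quality is noticed within minutes instead of at the end of the project.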
Even though the concept of continuous integration has existed since 1991, it took until the early 2000s for tools such as CruiseControl and Hudson to become truly popular. Right now, an enormous variety of build servers – self-hosted and in the cloud, open source and commercial – exists, with Jenkins (which was forked from Hudson in 2011) as the most popular one.
When the responsibilities regarding delivery are pushed increasingly onto product teams, more issues pop up. Often, different components (or ‘microservices’, if you want to play buzzword bingo) are developed and maintained by different product teams. Even with something like IaaS available, it’s not efficient for each product team to reinvent the wheel and set up and maintain infrastructure individually.
On the other hand, running software from different product owners on the same infrastructure causes problems with conflicting dependencies, runtimes and high load issues. Oh, and we still haven’t found a solution for that ‘runs on my machine’ problem we mentioned earlier.
In 2013, a French company open-sourced Docker. Docker provides a system to create and run ‘containers’, which wrap your product together with its dependencies (its runtime, system tools and libraries), all in the exact version your piece of software needs. This ensures that the product will always run the same, regardless of its environment.
Containers running on a machine all share the same kernel, making them much faster to boot, cheaper (in terms of memory) to run, and smaller (in size) to distribute. Although Docker is still relatively young, the ecosystem around it has exploded with tools for orchestration and scheduling and service discovery (Kubernetes, Mesos, Marathon, Docker Swarm) as well as support for running containers in the cloud (AWS ECS, Google Container Engine).
Back to the future: devops at November Five
For a startup, adopting a devops culture is relatively easy: the speed of deployment enables a startup to iterate and learn fast in a fully-controlled environment with barely any legacy sitting in the way, and there’s often no money for a dedicated ops team anyway.
For a large enterprise, with extensively defined quality control and risk management processes, it’s a different story. A large company can’t afford a newbie introducing a security vulnerability into their core software on day one of the job, and there’s still a lot of distrust in public cloud providers. As a result, they often end up in the water-scrum-fall methodology, in which development teams attempt to follow an agile-like methodology for developing software, but end up throwing software over walls once it’s ready for production.
At November Five, we’re in between. With around 60 employees, we’re no longer a startup. We’re working for both small companies and large enterprises, and when we’re responsible for shaping, building and sustaining digital products and services for the latter, we’re often confronted with the difficulties delivery entails in larger organizations.
That’s the raison d’être of our team, Product Operations: we create the conditions in which the product teams can achieve their full potential, by streamlining and automating processes and workflows.
A product is never done; improving in iterations is in the DNA of November Five. The speed at which you can iterate is only as fast as the slowest link in the chain, and unfortunately, that’s often delivery. In order to speed up delivery while maintaining the highest quality for our products, we invest heavily in defining the delivery process and automating wherever possible.
So what does our delivery process look like?
One of the main objectives of team product operations is to consistently deliver high-quality products. The way we deliver products has transitioned over the years from a classic delivery model to a model where we deliver continuously.
Let’s see what a classic delivery model looks like. When we talk to our clients they all have the same requirements, and they definitely make sense:
- they expect us to build some awesome features,
- they expect the features to be built with the highest quality,
- and they expect us to deliver the features on time.
Let’s say we split the delivery process in four major phases, so we can better deal with these requirements. In a classic delivery model these phases would look like this:
- Optimistically define user stories and acceptance criteria
- Furiously design and implement all user stories
- Anxiously test and fix all bugs
- Release the product when all lights are green
You can see that the main focus is the features, which gives us some complications.
- All features have priority number one. We can only release the product when all features are marked as done, so we can’t prioritise the backlog. All features are equally important.
- The QA feedback loop takes longer than expected. The test team can only start when the developers are ready with their work. If the developers go overtime, testers need to work double shifts to finish their work. Testers might need to adjust their test cases during this phase and retest all features. Developers might need to be reallocated from other projects to fix major bugs.
- The release gets postponed. With unfinished features or too many major bugs, we have no other option than to postpone the release. The project team members now have to work extra hours, which has an impact on quality and overall motivation.
- Stakeholders are not aligned. By postponing a release the stakeholders are not aligned anymore. Marketing has to hold back their campaign. Sales has to inform their prospects and clients of the delay. This results in more stress and anxiety to release.
Features, quality and timeliness are three requirements that are too difficult to focus on at the same time. I think we can all agree that quality is too important to drop. Nobody wants to ship buggy products.
With feature-based releases, the first thing that gets dropped when the features are not ready is the release date. With time-based releases, on the other hand, we drop the features that are not ready in order to release on time.
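The difference between the two strategies can be sketched in Python. This is purely illustrative: the feature names and dates are invented, and “ready” here means a feature has passed all quality checks.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Feature:
    name: str
    ready_on: date  # the day the feature meets the quality bar


def feature_based_release(features, planned):
    """Ship everything: the date slips until the last feature is ready."""
    release_date = max([planned] + [f.ready_on for f in features])
    return release_date, [f.name for f in features]


def time_based_release(features, planned):
    """Ship on time: features that aren't ready move to the next release."""
    shipped = [f.name for f in features if f.ready_on <= planned]
    deferred = [f.name for f in features if f.ready_on > planned]
    return planned, shipped, deferred


# Hypothetical backlog for a release planned on 1 March 2017
planned = date(2017, 3, 1)
backlog = [
    Feature("login", date(2017, 2, 20)),
    Feature("search", date(2017, 2, 27)),
    Feature("payments", date(2017, 3, 15)),
]

print(feature_based_release(backlog, planned))
print(time_based_release(backlog, planned))
```

In this example, the feature-based release slips to 15 March because of a single late feature, while the time-based release ships login and search on 1 March and moves payments to the next release.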
By introducing recurring time-based releases, we enter the magical world of continuous delivery.
Continuous delivery wipes out all the complications of a classic delivery model. It’s an approach where software is produced in short cycles, by using recurring time-based releases and investing in automation. By using continuous delivery, we can release more quickly and efficiently to the market. By knowing the release dates upfront we can better align stakeholders. The marketing department can plan their campaigns around the release dates and the sales team can adjust their sales pitch without making any false promises.
The goal of continuous delivery is to switch the focus from features to quality. By incrementally delivering new features we keep the business value high and reduce the risk of releasing. Features are only useful if they are of high quality, so we drop features that do not meet the quality standards and prepare them for a next release.
With continuous delivery we want to decrease the cycle time and increase quality. The longer you wait with the release of a new version, the more your business value decreases over time and the higher the risk is of releasing a new version. By releasing more frequently we can gather feedback on a regular basis and improve our upcoming releases.
Recurring releases do have a lot of recurring tasks, which means it’s important to invest in automation. By automating delivery tasks, we save a lot of repetitive work and reduce human error at the same time.
With continuous delivery, the definition of “done” shifts. Done no longer means the developer has finished their part of the job. Done now means that the software is thoroughly validated as a release candidate. As a result, releasing software is now considered to be a business decision.
Let’s be honest, we’re not there yet. We’re working for big enterprises with sometimes rigid procedures. We’re continuously striving to optimize our process, and by applying some of the continuous delivery principles we’re getting closer to our goal: a fully automated continuous delivery pipeline.
Today, we still maintain (external) verification and release procedures at the end of our software development cycle for the customers who want it. But in the meantime, our testing efforts run in parallel with our design and implementation tracks. The advantage to this is that we now have intermediate builds that are of a higher quality. At the end of each sprint we have an internal release that is thoroughly validated as a potential release candidate.
Together with the automation efforts discussed in the next section, this means November Five delivers products in a continuous way, at least to our clients. It’s up to them to decide when and how to release and what QA procedures to add on top, and in the meantime they can become convinced of the added value of this improved way of working.
Leave the monkey jobs to the monkeys
To be honest, we’re not that fond of rules. They curtail creativity, and following a process often sucks precious time into non-creative and repetitive tasks. However, rules, in the form of a well-defined methodology, are essential if you want to deliver the highest quality to your customers in a consistent manner.
In our efforts to find a balance between creativity and methodology, we decided not to overload our team members with a pile of procedures. Instead, we invest heavily in automating literally anything that can be automated – in such a way that the November Five procedures that guarantee the quality of our products and deliverables are captured in code.
This has resulted in a set of tools – some only a few lines of code in a bash script, others with multiple components and web interfaces – that have all become a part of the day-to-day operations of November Five. They’re all named after monkey species, because, well, you should leave the monkey jobs to the monkeys. They can’t wait to get to know you!
When you’re talking about Infrastructure as Code, you’re talking about Saki! Saki is a wrapper around Ansible, AWS CloudFormation and Packer that allows us to automate the creation and provisioning of our web stacks on AWS. Based on a single config file, we’re able to set up a resilient and highly available set of web servers and a deploy server running Jenkins, with a whole bunch of other tools and services available (think Elasticsearch, MySQL, SQS, Memcached, New Relic etc.). Deploying a web project without downtime or risk has become as easy as pushing to the right git branch, with all other difficulties (backups, log management, monitoring, resistance to failures) handled without a single human involved.
Pygmy, named after the smallest monkey in the world, is our tool for managing the build process for anything that has to do with native code. It generates Jenkins job configurations based on a config file (in YAML format) that resides in the root of each project’s repository and is maintained by the project’s developers. This way, job configs can be easily managed, updated, versioned and maintained, and changes to the build or test process can be introduced uniformly across all projects without touching each code base.
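The idea of generating job configurations from a small per-project config file can be sketched as follows. This is a hypothetical illustration in Python, not our actual tool: the config keys, step names and repository URL are all assumptions, a plain dict stands in for the parsed YAML file, and the output is a simple dict rather than a real Jenkins job definition.

```python
from __future__ import annotations

# Steps shared by every project; changing this list changes all generated jobs
# (hypothetical step names)
COMMON_STEPS = ["checkout", "build", "test", "report"]


def generate_jobs(project_config: dict) -> list[dict]:
    """Create one job description per branch listed in the project config."""
    jobs = []
    for branch in project_config["branches"]:
        jobs.append({
            "name": f"{project_config['name']}-{project_config['platform']}-{branch}",
            "repository": project_config["repository"],
            "branch": branch,
            # Shared steps first, then any project-specific additions
            "steps": COMMON_STEPS + project_config.get("extra_steps", []),
        })
    return jobs


# Stand-in for the parsed config file from a project's repository (made up)
project_config = {
    "name": "shop",
    "platform": "ios",
    "repository": "git@example.com:acme/shop-ios.git",
    "branches": ["develop", "master"],
    "extra_steps": ["upload-beta"],
}

for job in generate_jobs(project_config):
    print(job["name"], job["steps"])
```

Because the shared steps live in the generator and not in each repository, adding a step there changes every generated job at once, while each project only maintains its own small config file.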
Another monkey, working closely together with both Saki and Pygmy, is Capuchin, the checklist monkey. We’re using a set of existing code inspection tools that allow us to check for bad code smells (think Faux Pas for iOS or Lint for Android), but these tools only check for general and well-known issues. Capuchin complements these tools and checks for November Five-specific rules (such as ‘all debugging output should be disabled in production builds’ or ‘November Five licensing file should be present in every repository’). It is integrated in Jenkins, and will run and generate a report on every build.
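A checklist of house rules like this boils down to a set of small checks that each inspect a repository and return pass or fail. Here’s a minimal sketch in Python of what such a checker could look like. It is not Capuchin’s actual implementation: the file names (LICENSE, production.cfg) and the debug=true setting are assumptions chosen for illustration.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def license_present(repo: Path) -> bool:
    """Rule: a licensing file must be present in the repository root."""
    return (repo / "LICENSE").is_file()


def debug_disabled(repo: Path) -> bool:
    """Rule: debugging output must be off in the production configuration."""
    config = repo / "production.cfg"
    return not config.is_file() or "debug=true" not in config.read_text()


# Each rule pairs a human-readable description with its check function
RULES = {
    "licensing file is present": license_present,
    "debug output is disabled in production": debug_disabled,
}


def check_repository(repo: Path) -> dict[str, bool]:
    """Run every rule against the repository and collect the results."""
    return {description: rule(repo) for description, rule in RULES.items()}


# Demo on a throwaway repository that breaks both rules
with tempfile.TemporaryDirectory() as directory:
    repo = Path(directory)
    (repo / "production.cfg").write_text("debug=true\n")
    report = check_repository(repo)

for description, passed in report.items():
    print("PASS" if passed else "FAIL", description)
```

Adding a new house rule then means adding one function and one entry to the rule table, and every build picks it up automatically.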
November Five is built upon a set of (mostly) cloud-based tools, including Toggl for time tracking, Bitbucket for source code version control, Jira for bug tracking, Confluence for documentation, Slack for team communication, G Suite for mail, calendar and document sharing, Teamleader for CRM and invoicing, Dropbox for file storage, … The list goes on.
In these services, the same concepts tend to recur: a product we’re developing for a client, for example, needs a channel in Slack, a git repository per platform, folders in Google Drive and Dropbox and a Jira/Confluence project for issue tracking and documentation. Same goes for clients, product versions, team members, statements of work, invoices etc.
Not all of these tools support single sign-on with fine-grained access management, which results in a proliferation of accounts. It also creates a little piece of hell when onboarding or offboarding an employee. That mess gets even worse if you want to grant someone access to a specific product or want to rename a product.
Baboon uses the APIs of all of these tools to glue together our specific rule sets. Creating clients, products, SOWs, versions as well as access management and reporting, it’s all done in a slick web interface. When a new employee starts, Baboon takes care of setting up accounts for each of these services, making sure that naming conventions as well as internal security rules are followed.
So, now that you’ve met our most productive employees (they work 24/7!), you’re keen on getting to know them better, no? I thought so! In the next months, we’ll publish a series of follow-up blog posts that dive deeper into the functioning of each of these monkeys.
And if you want to watch them up close in their natural habitat, check out our jobs page! As it happens, team Product Ops is hiring right now…