Notes - Grokking Continuous Delivery

This post is my summary of the book Grokking Continuous Delivery.

Continuous integration is the process of combining code changes frequently, with each change verified on check-in.

Continuous delivery is the collection of processes that we need to have in place to ensure that multiple software engineers, writing professional-quality software, can create software that does what they want.

It aims to:

  • Release changes at any time.
  • Make delivering the software as simple as pushing a button.

Phases

  1. Lint
  2. Test
  3. Build (could be creating an image)
  4. Publish (could be uploading the image to an image registry)
  5. Release (update the running service to use the new image)

To really be sure you can safely deliver changes to your software, you need to be accumulating and verifying changes to all the plain-text data that makes up your software, including the configuration (aka config as code).

Lint

Linters catch bugs and errors in your code: things like unused variables, formatting errors, or mistakes that prevent the code from compiling.

If it’s a legacy project with thousands of linting errors, you can’t tackle them all at once. In that case:

  1. Prevent adding more errors by measuring them: if a new PR increases the error count, do not accept it (see the sketch after this list).
  2. If some parts of the project haven’t been updated recently and just work fine, you can exclude them from the linter. On the other hand, if the project is small or new, you can tackle every linting error in a short period of time and then reject any PR with a linting error.
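Here is a minimal sketch of that gate, assuming the linter reports one finding per line (flake8 does) and that the accepted error count lives in a plain baseline file; the file name and the choice of flake8 are just assumptions for illustration:

```python
import subprocess
import sys

BASELINE_FILE = "lint_baseline.txt"  # hypothetical location of the accepted error count


def count_lint_errors() -> int:
    # flake8 prints one finding per line; empty output means zero errors.
    result = subprocess.run(["flake8", "."], capture_output=True, text=True)
    output = result.stdout.strip()
    return len(output.splitlines()) if output else 0


def main() -> None:
    with open(BASELINE_FILE) as f:
        baseline = int(f.read().strip())
    current = count_lint_errors()
    print(f"lint errors: baseline={baseline}, current={current}")
    if current > baseline:
        sys.exit("New lint errors introduced; rejecting this change.")


if __name__ == "__main__":
    main()
```

A PR that fixes errors can also update the baseline downward, so over time the count only shrinks.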

Testing

Signal is a test result that gives you information; noise is a distraction that hides information. For example, if the tests pass but there is still an error, that’s noise; if they pass and there really is no error, that’s signal. If a test fails, it gives you information, so it’s signal, but if you ignore the failure, it becomes noise. Our goal is to avoid noise and turn it into signal.

Do not move forward in the project until the tests give you a green light. Ignoring failures turns them into noise you no longer pay attention to, and that may bite you later.

Treat every test failure as a bug and investigate it fully.

Whenever a test fails, it means there is a mismatch between the test and the actual code. Do not change the test just to make it pass without understanding the problem. And if a test is flaky (sometimes passes, sometimes fails), take it seriously as well.

If you care about test coverage, you can also measure it in the pipeline. For instance, you can reject new merges if they decrease test coverage.
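A minimal sketch of such a gate, assuming the current and baseline coverage percentages have already been extracted from your coverage tool and are handed to the script (how you obtain them depends on the tool):

```python
import sys


def enforce_coverage(current: float, baseline: float, tolerance: float = 0.0) -> None:
    """Fail the pipeline step if coverage dropped below the recorded baseline."""
    if current + tolerance < baseline:
        sys.exit(f"Coverage dropped from {baseline:.1f}% to {current:.1f}%; rejecting merge.")
    print(f"Coverage OK: {current:.1f}% (baseline {baseline:.1f}%)")


if __name__ == "__main__":
    # Passing the two percentages as CLI arguments is just an assumption;
    # they would normally come from the coverage report and a stored baseline.
    enforce_coverage(current=float(sys.argv[1]), baseline=float(sys.argv[2]))
```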

If your pipeline is slow, especially due to testing time:

  1. Decrease testing time, for instance by using more unit tests and fewer integration tests.
  2. Run tests in parallel, or even shard them (run tests on multiple machines; see the sketch after this list).
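As a sketch of sharding, each machine can pick a deterministic slice of the test names by hashing them; the SHARD_INDEX and SHARD_COUNT environment variables are made-up names for whatever your CI system actually provides:

```python
import hashlib
import os


def tests_for_this_shard(all_tests: list[str]) -> list[str]:
    """Pick the subset of tests this machine should run."""
    index = int(os.environ.get("SHARD_INDEX", "0"))
    count = int(os.environ.get("SHARD_COUNT", "1"))
    return [
        name for name in all_tests
        # Hashing the name spreads tests evenly and deterministically across shards.
        if int(hashlib.sha1(name.encode()).hexdigest(), 16) % count == index
    ]


if __name__ == "__main__":
    tests = ["test_login", "test_checkout", "test_search", "test_profile"]
    print(tests_for_this_shard(tests))
```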

You can also consider having multiple pipelines (+ means the tasks run in parallel; a toy sketch of how the parallel groups run follows this list):

  1. After a PR is made: lint -> unit tests + test coverage -> integration tests + end-to-end tests
  2. During publish/release: lint -> unit tests + test coverage -> integration tests + end-to-end tests -> build and upload image -> release
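In a real system these pipelines would be defined in your CI tool’s configuration; the toy sketch below only illustrates the ordering, with each inner group running in parallel and the groups running in sequence:

```python
from concurrent.futures import ThreadPoolExecutor


def run_stage(name: str) -> None:
    print(f"running {name}")  # a real task would shell out to the actual tool


# Each inner list runs in parallel; the outer list runs in order.
PR_PIPELINE = [
    ["lint"],
    ["unit tests", "test coverage"],
    ["integration tests", "end-to-end tests"],
]


def run_pipeline(stages: list[list[str]]) -> None:
    for group in stages:
        with ThreadPoolExecutor() as pool:
            # Wait for every task in the group before moving to the next group.
            list(pool.map(run_stage, group))


if __name__ == "__main__":
    run_pipeline(PR_PIPELINE)
```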

Rare Problems

Divergence from and integration with the main branch

Let’s say two people each create a PR, both of their tests pass, and their PRs are merged at about the same time. However, there’s a conflict that breaks our pipeline. We could have caught the conflict if they had changed the same lines, but they changed different lines, so we only catch the problem after both PRs are merged into main. We can solve this problem in three ways.

  1. Scheduled pipelines: With a pipeline that runs periodically against main, we only find out about the problem after both PRs are merged. Plus, who is going to fix the problem once it’s noticed? We need to allocate a person for this, and if this kind of problem appears a lot, it can hurt that person’s morale.
  2. Require PRs to be in sync with main: After the first PR is merged into main, you block the other PRs until they’re in sync with main again. This can be discouraging when there are many developers, since one PR blocks the other twenty and everyone hurries to merge first, but it might be fine with only a few developers.
  3. Merge queue: The CI/CD system queues the PRs and processes them one by one. It creates a new branch from main, merges the PR into it, runs the pipeline on the result, and only then merges it into main; then it does the same with the next PR. It works better for big teams but can be slower (see the sketch below).
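A sketch of the merge-queue idea; the helper functions are stand-ins for real git operations and a real pipeline run, and only the ordering is the point:

```python
from collections import deque


# Stand-ins for what the CI/CD system would really do.
def create_branch_from_main() -> str:
    return "merge-candidate"


def merge(source: str, into: str) -> None:
    print(f"merging {source} into {into}")


def run_pipeline(branch: str) -> bool:
    print(f"running the full pipeline on {branch}")
    return True


def process_merge_queue(pending_prs: deque) -> None:
    while pending_prs:
        pr = pending_prs.popleft()
        candidate = create_branch_from_main()  # fresh branch from the latest main
        merge(pr, into=candidate)              # apply this PR on top of it
        if run_pipeline(candidate):
            merge(candidate, into="main")      # only land it if the combined result is green
        else:
            print(f"{pr} breaks against the latest main; sent back to its author")


if __name__ == "__main__":
    process_merge_queue(deque(["pr-101", "pr-102"]))
```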

Flaky tests (aka failures that only appear from time to time)

Some tests fail only sometimes and pass at other times, so it’s easy to ignore them. You can find them with periodic test runs, and then focus on them instead of ignoring them.
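One way to hunt for flakes is simply to run a suite many times and count which tests fail only occasionally; here is a small sketch using Python’s unittest (the deliberately flaky example test is made up):

```python
import random
import unittest


def find_flaky(test_case: type, runs: int = 50) -> dict:
    """Run a test case repeatedly and count failures per test method."""
    failures: dict = {}
    for _ in range(runs):
        result = unittest.TestResult()
        unittest.defaultTestLoader.loadTestsFromTestCase(test_case).run(result)
        for test, _traceback in result.failures + result.errors:
            failures[test.id()] = failures.get(test.id(), 0) + 1
    return failures  # anything with 0 < failures < runs is a flake worth investigating


class ExampleTests(unittest.TestCase):
    def test_sometimes_fails(self):
        # Deliberately flaky, just to demonstrate the detector.
        self.assertTrue(random.random() > 0.2)


if __name__ == "__main__":
    print(find_flaky(ExampleTests))
```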

System and library differences

Differences between environments (local dev, test, prod) can create problems, and different library versions can introduce new bugs as well. The book suggests making the CI pipeline and the release pipeline similar so unexpected errors won’t sneak in. However, this part was a little abstract; it describes the pipeline as: run tests -> build and deploy image + set up the environment -> then run system tests.

DORA Metrics

Velocity

  • Deployment frequency: how often the organization deploys to production.
  • Lead time for changes: how long it takes a commit to reach production (a small calculation sketch for these two velocity metrics follows the table).

Stability

  • Time to restore service: how long it takes to recover from a failure in production.
  • Change failure rate: the percentage of deployments that cause failures.

| Metric | Elite | High | Medium | Low |
| --- | --- | --- | --- | --- |
| Deployment frequency | Multiple times a day | Once per week to once per month | Once per month to once every six months | Fewer than once every six months |
| Lead time for changes | Less than an hour | One day to one week | One month to six months | More than six months |
| Time to restore service | Less than one hour | Less than one day | One day to one week | More than six months |
| Change failure rate | 0–15% | 16–30% | 16–30% | 16–30% |
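A small sketch of computing the two velocity metrics from deployment records; the data and field names are invented for illustration, and real numbers would come from your deployment history:

```python
from datetime import datetime, timedelta

# Each deployment records when its oldest commit was made and when it reached production.
deployments = [
    {"commit_time": datetime(2024, 5, 1, 9, 0), "deploy_time": datetime(2024, 5, 1, 11, 30)},
    {"commit_time": datetime(2024, 5, 2, 14, 0), "deploy_time": datetime(2024, 5, 2, 14, 45)},
    {"commit_time": datetime(2024, 5, 3, 10, 0), "deploy_time": datetime(2024, 5, 3, 10, 20)},
]


def deployment_frequency(deploys: list, period_days: int = 7) -> float:
    """Deployments per day over the observed period."""
    return len(deploys) / period_days


def lead_time_for_changes(deploys: list) -> timedelta:
    """Average time from commit to running in production."""
    total = sum((d["deploy_time"] - d["commit_time"] for d in deploys), timedelta())
    return total / len(deploys)


print(f"deployment frequency: {deployment_frequency(deployments):.2f} per day")
print(f"lead time for changes: {lead_time_for_changes(deployments)}")
```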

The book increases velocity by merging commits quickly, even when the work isn’t finished. You can hide unfinished work behind feature flags so customers aren’t affected, and temporarily skip its tests until development is done so the change still passes CI/CD. Small, contained PRs make the reviewer’s job easier, and other developers see the changed code sooner, so they can collaborate on different parts of the same feature more easily.
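A minimal sketch of a feature flag guarding unfinished code; real projects usually read flags from configuration or a flag service, so the in-memory dictionary and the names here are placeholders:

```python
# Toy in-memory flag store.
FEATURE_FLAGS = {
    "new_checkout_flow": False,  # merged but unfinished, hidden from users
}


def is_enabled(flag: str) -> bool:
    return FEATURE_FLAGS.get(flag, False)


def new_checkout() -> str:
    return "new checkout (work in progress)"


def legacy_checkout() -> str:
    return "legacy checkout"


def checkout() -> str:
    if is_enabled("new_checkout_flow"):
        return new_checkout()  # unfinished code path, safe because the flag is off
    return legacy_checkout()


if __name__ == "__main__":
    print(checkout())
```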

Use versioning so you won’t overwrite previous versions. You can roll back to a previous version if there’s a problem, or, if the backend and frontend are separate, the frontend can adjust itself based on the backend version. Also include release notes for every new version. The versioning convention is Major.Minor.Patch: Major is for big, backward-incompatible changes; Minor is for new backward-compatible features; Patch is for backward-compatible fixes.
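A small sketch of bumping a Major.Minor.Patch version according to the kind of change:

```python
def bump(version: str, change: str) -> str:
    """Return the next Major.Minor.Patch version for a given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":   # backward-incompatible change
        return f"{major + 1}.0.0"
    if change == "minor":   # new backward-compatible feature
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # backward-compatible fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


assert bump("1.4.2", "major") == "2.0.0"
assert bump("1.4.2", "minor") == "1.5.0"
assert bump("1.4.2", "patch") == "1.4.3"
```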

Let’s say the new deployment fails. We can quickly roll back to the previous version so customers are no longer affected, and switch back to the new version once it’s fixed. This way, time to restore service is just the rollback time, which puts it at the elite level.

Have a rollback strategy ready; it should be automated, documented, and tested, so there will be no surprises.
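A sketch of what an automated rollback step could look like; deploy() and the health check are stand-ins for whatever your platform actually uses (kubectl, a cloud API, etc.):

```python
import sys


def deploy(image_tag: str) -> None:
    # Stand-in for the real deployment command.
    print(f"deploying {image_tag}")


def deploy_with_rollback(current_tag: str, previous_tag: str, healthy) -> None:
    """If the new release is unhealthy, redeploy the previous one automatically."""
    deploy(current_tag)
    if not healthy():
        print(f"{current_tag} failed health checks, rolling back to {previous_tag}")
        deploy(previous_tag)
        sys.exit(1)  # fail the pipeline so someone investigates


if __name__ == "__main__":
    # The lambda simulates a failing health check for demonstration.
    deploy_with_rollback("myapp:1.5.0", "myapp:1.4.3", healthy=lambda: False)
```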

You can use blue-green deployment for fast rollbacks. You create new instances of your app with the new release without removing the previous ones. Once the new instances are ready, you move the traffic to them. If there’s a problem, you turn the traffic back to the previous instances, which keeps rollback time short.
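A toy sketch of the blue-green idea: both environments stay running, and a release or rollback is just a traffic switch. The router here is an in-memory stand-in for a real load balancer:

```python
environments = {
    "blue": "myapp:1.4.3",   # previous release, kept running
    "green": "myapp:1.5.0",  # new release
}
live = "blue"


def switch_traffic(target: str) -> None:
    global live
    print(f"routing all traffic to {target} ({environments[target]})")
    live = target


switch_traffic("green")  # release: the new version takes the traffic
switch_traffic("blue")   # rollback: instant, because blue never went away
```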

Canary deployment can be used to minimize the number of users affected by an outage. You create one new instance and forward only a small share of traffic to it. If you see no problems, you create the remaining instances and move every user to the new ones. If there are problems, you forward everyone back to the previous instances.

During a canary deployment, you watch metrics for issues. But since the canary is a brand-new instance, its metrics might differ from the existing instances for that reason alone. To solve this, create one instance from the previous version and one from the new version, forward a small percentage of users to each, and compare the metrics between the two to see if there’s a real issue.
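A sketch of that comparison; the metrics, values, and tolerance thresholds are all made up, and in practice they would come from your monitoring system:

```python
# Made-up metric snapshots for the baseline instance and the canary instance.
baseline = {"error_rate": 0.011, "p95_latency_ms": 180}
canary = {"error_rate": 0.013, "p95_latency_ms": 185}

# How much worse the canary may be before we abort; these thresholds are assumptions.
TOLERANCE = {"error_rate": 1.25, "p95_latency_ms": 1.10}


def canary_is_healthy(baseline: dict, canary: dict) -> bool:
    for metric, limit in TOLERANCE.items():
        if canary[metric] > baseline[metric] * limit:
            print(f"canary {metric} too high: {canary[metric]} vs {baseline[metric]}")
            return False
    return True


if canary_is_healthy(baseline, canary):
    print("promote: roll the new version out to everyone")
else:
    print("abort: send all traffic back to the previous instances")
```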

Do you need CD?

CD simply means working software is released to users automatically on every commit. It moves deployment frequency and lead time for changes to the elite level. If your code stays in a releasable state and releasing is as easy as pushing a button (through automation), you’re ready for CD.

However, it’s obviously not for everyone. It increases the possibility of errors reaching production, and you might need exploratory testing (QA) or someone’s approval before releasing. If the cost of an error is too high, such as financial loss or risk to human life, you may want to be more careful.

Mandatory QA exists because people are afraid of failure. However, if you can make the impact of errors small with safe rollback strategies and can accept some errors, QA is not necessary.

With CD, you can automatically revert changes after an outage, so your codebase stays in a releasable state.

Building CI/CD for a greenfield project

You start by adding a build task. Then add linting and unit tests, then publishing to the image registry after building the image. Eventually you include integration and end-to-end tests; the end-to-end tests are the final task, since they run against a running instance.

CI pipeline (triggered when a PR is created or updated, through the merge queue, or periodically):

Lint + unit tests -> integration tests -> build image + (optionally set up environment) -> publish image -> run image as container -> end-to-end tests

Release pipeline:

Build image -> publish to image registry -> deploy

Building CI/CD for a legacy project

In this case, you can ignore the parts of the project that don’t change often, so you have less to take care of. You can focus less on unit tests and more on integration and end-to-end tests. You can measure test coverage and make sure it doesn’t go down. Linting can be an afterthought. Your first pipelines might be imperfect, but there’s always room for improvement.

Your pipeline tasks need to be cohesive (do one thing well), do just enough, and be composable (reusable, able to be combined with other tasks).

You may see scripts in your pipeline written in languages such as bash. Once they gain some complexity, they become hard to maintain. In that case, you are better off writing them in a general-purpose programming language such as Python and calling them from bash (or whatever the pipeline uses). That way your tasks stay tested, versioned, easy to read and debug, and shareable between different pipelines.
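For example, a task like “fail the build if the image is too big” reads better as a small, testable Python script than as bash logic; the script name, flags, and the get_image_size_mb helper in the comment are hypothetical:

```python
# check_image_size.py - a hypothetical pipeline task kept in Python so it can
# have its own unit tests, instead of living as logic inside a bash step.
import argparse
import sys


def main() -> None:
    parser = argparse.ArgumentParser(description="Fail if the built image is too big.")
    parser.add_argument("--size-mb", type=float, required=True)
    parser.add_argument("--max-mb", type=float, default=500)
    args = parser.parse_args()
    if args.size_mb > args.max_mb:
        sys.exit(f"image is {args.size_mb} MB, limit is {args.max_mb} MB")
    print("image size OK")


if __name__ == "__main__":
    main()

# The bash step in the pipeline then stays trivial, e.g.:
#   python check_image_size.py --size-mb "$(get_image_size_mb)" --max-mb 500
```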

In GitHub Actions, a pipeline is a workflow, and a task is a job or an action.