The Value of Fast CI

[Figure: xkcd 303, "Compiling"]

Automation via CI/CD primarily minimizes mistakes and increases efficiency. Instead of remembering to run various test suites and build steps, and having to deploy to a review environment, engineers can simply push changes and wait for CI to complete. The larger the coverage area of CI, the higher the potential value.

However, the more that is added to CI, the more compute time is required, and the longer an engineer traditionally has to wait for the result. In some cases, an engineer can avoid waiting for CI, but this generally means context switching, which has its own impact on overall productivity. For this reason, wait time translates directly into lost engineer time, and repeatedly paused development cycles cause fatigue.

As the number of CI operations increases, so too does the likelihood of transient failures. Depending on the configuration, failures may be automatically retried or may require manual intervention. Each failure increases the overall duration. A transient network failure that is retried locally may only add a few seconds, but a flaky test run as part of a suite may require a rerun of an expensive test job. It isn't unrealistic for several failures to result in a doubling of the overall duration.
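As a rough illustration of how retries stretch a job, the sketch below assumes each run of a job fails independently with a fixed probability and is rerun until it passes. Under that assumption the expected number of runs is 1 / (1 - failure rate). The function name, the 6-minute job, and the failure rates are hypothetical values chosen for illustration, not measurements.

```python
def expected_job_minutes(job_minutes: float, failure_rate: float) -> float:
    """Expected total minutes for a job that is rerun until it passes.

    Assumes every run fails independently with probability `failure_rate`
    (a simplification; real flaky failures are often correlated).
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    # Geometric distribution: expected number of runs is 1 / (1 - p).
    return job_minutes / (1 - failure_rate)


if __name__ == "__main__":
    # Hypothetical 6-minute test job at several failure rates.
    for rate in (0.0, 0.1, 0.25, 0.5):
        print(f"failure rate {rate:.0%}: {expected_job_minutes(6, rate):.1f} min")
```

With a 50% failure rate the expected duration of the job doubles, which matches the doubling scenario described above.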

Quantifying Cost

Understanding the costs associated with CI can help highlight the importance of changes aimed at reducing cost. The most prominent costs associated with CI are as follows.

Category        Description
Delay           Time not spent actively developing
Pause Fatigue   Holding context while waiting for results
Flaky Burden    Duration extension and fatigue from manual retry
Compute         Hardware consumed

The first three costs directly impact an individual's time, which is typically 1-2 orders of magnitude more expensive than the fourth cost: compute. Depending on the scenario, upwards of 3 people could be waiting on CI to complete before being able to exchange incremental feedback on a new feature or production fix. Today's CI is at the core of most engineering workflows.

To get a feel for the relative value, a GitLab.com CI minute (about $0.01) is two orders of magnitude less expensive than a minute of a single engineer's time at $60/hour ($1.00 per minute), not even factoring in additional payroll/HR-related costs per person. This means the cost of one minute of a single active job, with a single engineer waiting on the result, is as follows.

1 min * $0.01 / CI min  = $0.01
1 min * $1.00 / Eng min = $1.00
-------------------------------
Total                   = $1.01

And that says nothing of fatigue or organizational efficiency losses.
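The per-minute arithmetic above can be sketched as a small helper. The rates are the ones used in the text ($0.01 per CI minute, $1.00 per engineer minute); the function name and the `engineers_waiting` parameter are illustrative additions, not part of any Cedar CI or GitLab tooling.

```python
CI_RATE = 0.01   # dollars per CI minute (rate used in the text)
ENG_RATE = 1.00  # dollars per engineer minute ($60/hour)


def pipeline_cost(ci_minutes, eng_minutes, engineers_waiting=1,
                  ci_rate=CI_RATE, eng_rate=ENG_RATE):
    """Return (ci_cost, eng_cost, total) in dollars, rounded to cents."""
    ci_cost = round(ci_minutes * ci_rate, 2)
    # Every engineer blocked on the pipeline pays the same wait.
    eng_cost = round(eng_minutes * eng_rate * engineers_waiting, 2)
    return ci_cost, eng_cost, round(ci_cost + eng_cost, 2)


if __name__ == "__main__":
    print(pipeline_cost(1, 1))                       # one job, one engineer
    print(pipeline_cost(1, 1, engineers_waiting=3))  # three people waiting
```

Passing `engineers_waiting=3` models the case where several people are blocked on the same result: the CI cost stays fixed while the engineer cost scales linearly.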

Example

Consider the following basic CI pipeline for a single project repository with chunked test jobs that run concurrently.

[Figure: Basic CI Pipeline]

The number of active jobs plotted by minute would look something like the following.

[Figure: Basic CI Pipeline Plot]

Instead of running the pipeline on GitLab.com, let's use Cedar CI, which is twice as fast. If the test concurrency were increased and the deploy job ran alongside the tests, we can conservatively expect a 65% time savings.

Comparing the increased CI load to our single-engineer expense from before illustrates just how minuscule the CI cost is relative to that of an engineer.

[Figure: Basic CI pipeline]

Because Cedar CI's machines are twice as fast as GitLab.com's at the same per-minute rate, the base CI minute cost remains unchanged.

Original Cost

Running the original pipeline on GitLab.com runners.

33 CI min  * $0.01/min = $ 0.33
20 Eng min * $1.00/min = $20.00
-------------------------------
Total                  = $20.33

Cedar CI Cost

Running the original pipeline on Cedar CI runners.

18 CI min  * $0.01/min = $ 0.18
13 Eng min * $1.00/min = $13.00
-------------------------------
Total                  = $13.18

Compressed Cedar CI Cost

Running the compressed pipeline on Cedar CI runners.

30 CI min  * $0.01/min = $ 0.30
10 Eng min * $1.00/min = $10.00
-------------------------------
Total                  = $10.30

Savings

Even at the same CI minute rate, the net cost with Cedar CI is less, since fewer minutes are spent and engineer time is saved.

Taking it a step further, the compressed pipeline on Cedar CI consumes nearly as many CI minutes as the original (30 versus 33) because of the overhead from concurrency, yet the net savings are even greater. If a more efficient conversion is achieved, the number of CI minutes spent could even be close to half. However, the biggest impact on cost is still the time saved by the engineer.

These examples consider a single engineer waiting on CI. Obviously, the more individuals waiting, the greater the cost savings.
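The three scenarios can be compared side by side with a short script. The CI and engineer minutes are taken from the cost breakdowns above; the `engineers` parameter is an illustrative addition that assumes every waiting engineer is blocked for the full pipeline duration.

```python
CI_RATE = 0.01   # dollars per CI minute
ENG_RATE = 1.00  # dollars per engineer minute

# (CI minutes, engineer minutes) from the cost breakdowns above.
SCENARIOS = {
    "GitLab.com original": (33, 20),
    "Cedar CI original": (18, 13),
    "Cedar CI compressed": (30, 10),
}


def total_cost(ci_min, eng_min, engineers=1):
    """Total dollars for one pipeline run with `engineers` people waiting."""
    return round(ci_min * CI_RATE + eng_min * ENG_RATE * engineers, 2)


def savings_table(engineers=1):
    """Return (name, cost, dollars saved vs. the GitLab.com baseline)."""
    baseline = total_cost(*SCENARIOS["GitLab.com original"], engineers)
    return [
        (name, total_cost(ci, eng, engineers),
         round(baseline - total_cost(ci, eng, engineers), 2))
        for name, (ci, eng) in SCENARIOS.items()
    ]


if __name__ == "__main__":
    for n in (1, 3):
        print(f"{n} engineer(s) waiting")
        for name, cost, saved in savings_table(n):
            print(f"  {name:<22} ${cost:>6.2f}  saves ${saved:.2f}")
```

Because the CI cost is fixed per run while the engineer cost scales with the number of people waiting, the dollars saved per run grow almost linearly with `engineers`.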

Optimization

To realize even greater cost savings, additional compute resources can be used to save even more of the engineer's time. Reducing flakiness through caching and internal proxies is often more than worth the cost; a single failed job in our example could increase the overall duration by 30%, and in real-world scenarios, it's not unheard of to see a job fail multiple times in a row.

Imagine if the build job could saturate more than 4 cores. Upgrading to a 16-core or even a 32-core machine could reduce the build time by a similar factor (though with diminishing returns). If a 32-core machine were used in place of a 4-core machine, and the build job completed 6 times as fast (75% scaling efficiency, given 8 times the number of cores), the following calculations express the change.

compressed duration = 3 min
30 - 3 + (3 * 8)    = 51  CI min
10 - 3 + (3 / 6)    = 7.5 Eng min

The cost for the optimized pipeline would follow.

51  CI min  * $0.01/min = $0.51
7.5 Eng min * $1.00/min = $7.50
-------------------------------
Total                   = $8.01

Comparing the optimized cost against the compressed pipeline, another ~20% savings is achieved by exchanging 70% more compute resources for engineer time.
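The optimization arithmetic can be reproduced with a short sketch. It mirrors the calculation above, including its choice to bill the 3 build minutes at 8 times the base rate; the function and parameter names are illustrative.

```python
CI_RATE = 0.01   # dollars per CI minute
ENG_RATE = 1.00  # dollars per engineer minute


def optimize_build(ci_min, eng_min, build_min, core_multiplier, speedup):
    """Swap the build job onto a larger machine.

    Follows the calculation in the text: the build's CI minutes are
    multiplied by `core_multiplier`, and the engineer's wait for the
    build shrinks by `speedup`.
    """
    new_ci = ci_min - build_min + build_min * core_multiplier
    new_eng = eng_min - build_min + build_min / speedup
    return new_ci, new_eng


def cost(ci_min, eng_min):
    return round(ci_min * CI_RATE + eng_min * ENG_RATE, 2)


if __name__ == "__main__":
    base_ci, base_eng = 30, 10  # compressed pipeline on Cedar CI
    opt_ci, opt_eng = optimize_build(base_ci, base_eng, build_min=3,
                                     core_multiplier=8, speedup=6)
    base_cost, opt_cost = cost(base_ci, base_eng), cost(opt_ci, opt_eng)
    print(f"CI minutes:  {base_ci} -> {opt_ci} "
          f"({100 * (opt_ci - base_ci) / base_ci:.0f}% more compute)")
    print(f"Eng minutes: {base_eng} -> {opt_eng}")
    print(f"Cost:        ${base_cost:.2f} -> ${opt_cost:.2f} "
          f"({100 * (base_cost - opt_cost) / base_cost:.0f}% savings)")
```

Changing `speedup` or `core_multiplier` makes it easy to see where larger machines stop paying for themselves at a given engineer rate.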

Strategy

It's simple: no one likes waiting longer than they have to, let alone engineers when they're in the middle of development. Cedar CI's goal is to achieve peak efficiency through novel solutions that improve performance and reliability. Those solutions bring about a reduction in absolute CI cost and further optimizations in net value.