
Journey to Intelligent Cache

(3 min)

Flow of files between jobs in pipeline

As anyone familiar with CI knows, CI is repetitive, and so are its problems. That repetitive nature makes it a tempting target for caching. Unfortunately, the common CI caching mechanisms are relatively brute-force and do not accommodate the complexity of real-world use cases.

Overhead

The compress, upload, download, and extract steps, used by almost all caching mechanisms, result in overhead that scales proportionally with cache size. The overhead works against the cache, leading to diminishing returns; in many cases, full caching is slower than no cache at all. Optimizing the performance of a CI job therefore becomes a balance between theoretical efficiency gains and cache size.

A cache, such as a dependency cache, is often used by several jobs in a pipeline, and the overhead of the download and extract steps is paid by each one. It is not uncommon for the whole process to take on the order of minutes. The job that updates the cache pays the cost on both ends: extract on entry and compress on exit.

Any increase in parallelism multiplies the overhead cost in compute time, since it must be paid by each job. The higher the concurrency, the larger the share of each job that is pure overhead.
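To make the overhead concrete, a typical dependency cache in a .gitlab-ci.yml might look like the sketch below (the Node.js tooling, paths, and cache key are illustrative assumptions). Every job that declares the cache pays the download and extract cost before its script runs, the updating job also pays compress and upload, and parallel jobs each pay the cost again.

    # Hypothetical Node.js pipeline: each job that declares the cache
    # downloads and extracts the archive before its script runs.
    install:
      stage: build
      script:
        - npm ci
      cache:
        key:
          files:
            - package-lock.json   # cache key follows the lockfile
        paths:
          - node_modules/
        policy: pull-push         # updates the cache: pays extract and compress

    test:
      stage: test
      parallel: 4                 # each of the 4 jobs repeats download + extract
      script:
        - npm test
      cache:
        key:
          files:
            - package-lock.json
        paths:
          - node_modules/
        policy: pull              # read-only, but the overhead is still paid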

[read more]

Cores Aren't Everything

(2 min)

CI Job demonstrating the tendency towards single-core usage

Contrary to the prevailing wisdom, cores aren't everything. For many workloads, peak single-core frequency has the biggest impact on overall performance.

CI jobs tend to be composed of a combination of tooling with glue code holding everything together. Individual tools might take advantage of all available cores, but they tend to taper towards a single core as they complete their task. Combining multiple tools back-to-back with serial scripts, plus waiting on large artifact transfers, can result in something like the example above: despite 8 cores being available, most of them are barely utilized.

Increasing the CPU frequency would have the biggest impact, even when coupled with a reduction in core count. Gitlab only offers different core counts, all running at mediocre frequencies. By contrast, Cedar CI uses purpose-built hardware to deliver the fastest core speeds: our CPUs offer 2-3 times the performance of leading CI providers. Combined with the fastest cache around, the results are hard to match.

Our cores will end up reducing your CI cost while saving engineer time, which is far more valuable. Realize just how fast your CI can be today by leveraging Cedar CI.

Simple As π

(2 min)

Since Cedar CI provides twice as many cores as Gitlab.com for a given rate, we decided to demonstrate the performance difference to emphasize the value. For a simple and direct comparison, we created a CI job that calculates π to the first 4,000 digits, repeated 32 times. Each calculation is enough to saturate a CPU core.
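Purely as an illustrative sketch (the actual job definition is given in the full post), such a job might look like the following, assuming bc provides the arbitrary-precision arithmetic and a Debian base image is used.

    # Hypothetical sketch of a pi-calculation job, not the benchmark itself.
    calculate-pi:
      image: debian:stable-slim
      script:
        # bc provides arbitrary-precision math; 4*a(1) is pi
        - apt-get update && apt-get install -y --no-install-recommends bc
        # 32 independent calculations of pi to 4,000 digits, run with as many
        # at a time as there are cores; each calculation saturates one core
        - seq 32 | xargs -P "$(nproc)" -I{} sh -c 'echo "scale=4000; 4*a(1)" | bc -l > /dev/null'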

We ran the job on Gitlab.com's saas-linux-small-amd64 and Cedar CI's cedarci-4-core since they're the same price. The job definition is as follows.

[read more]

Bespoke CI Container

(3 min)

final pipeline with bespoke CI image build

While applications may start with a language- or application-specific base container, it does not take long before additional distro packages are required. The packages may be needed by the application itself or by its support tooling. Often this begins with installing something like curl and grows over time. The package installation is a source of intermittent network failures and adds overhead to each job.
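A hypothetical example of how this usually looks, with the image, packages, and script made up for illustration: every job spends its first moments installing the same packages.

    # Each run re-downloads and installs the same packages, adding time
    # and an extra network failure point to every job.
    test:
      image: node:20
      before_script:
        - apt-get update && apt-get install -y curl jq
      script:
        - ./run-tests.sh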

A general trick for improving performance is to avoid doing work in the first place. Creating a bespoke container for CI is a common pattern, but it can create a lot of friction and is tricky to get right. The following is an approach to a conditional CI container build that avoids unnecessary execution while automatically incorporating updates.
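For orientation, one common way to gate the build is a rules:changes condition on the image definition, sketched below with hypothetical names and GitLab's documented docker-in-docker setup. The approach described in the post goes further, in particular to automatically incorporate updates, which this sketch does not handle.

    # Sketch only: rebuild the CI image when its definition changes.
    build-ci-image:
      stage: .pre
      image: docker:latest
      services:
        - docker:dind
      variables:
        DOCKER_TLS_CERTDIR: "/certs"
      rules:
        - changes:
            - Dockerfile.ci          # hypothetical image definition
      script:
        - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
        - docker build -f Dockerfile.ci -t "$CI_REGISTRY_IMAGE/ci:latest" .
        - docker push "$CI_REGISTRY_IMAGE/ci:latest"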

[read more]

The Value of Fast CI

(5 min)

xkcd: compiling (303)

Automation via CI/CD primarily minimizes mistakes and increases efficiency. Instead of remembering to run various test suites and build steps, and having to deploy to a review environment, engineers can simply push changes and wait for CI to complete. The larger the coverage area of CI, the higher the potential value.

However, the more that is added to CI, the more compute time is required, and the longer an engineer traditionally has to wait for the result. In some cases, an engineer can avoid waiting for CI, but this generally means context switching, which has its own impact on overall productivity. For this reason, wait time translates directly into engineer time and causes fatigue from repeatedly paused development cycles.

As the number of CI operations increases, so too does the likelihood of transient failures. Depending on the configuration, failures may be automatically retried or may require manual intervention. Each failure increases the overall duration. A transient network failure that is retried locally may only add a few seconds, but a flaky test run as part of a suite may require a rerun of an expensive test job. It isn't unrealistic for several failures to result in a doubling of the overall duration.
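As an example of scoping automatic retries to transient problems, GitLab CI's retry keyword can limit retries to specific failure types; the job name and script below are hypothetical.

    # Retry up to twice, but only for failures that look transient.
    integration-tests:
      script:
        - ./run-integration-tests.sh
      retry:
        max: 2
        when:
          - runner_system_failure
          - stuck_or_timeout_failure
          - api_failure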

[read more]