
Journey to Intelligent Cache


[Figure: Flow of files between jobs in a pipeline]

As anyone familiar with CI knows, CI is repetitive, and so are its problems. That repetitive nature makes CI a tempting target for caching. Unfortunately, the common CI caching mechanisms are relatively brute force and do not accommodate the complexity of real-world use cases.

Overhead

The compress, upload, download, and extract steps used by almost all caching mechanisms result in overhead that scales proportionally with cache size. That overhead works against the cache and yields diminishing returns; in many cases, full caching is slower than no cache at all. Optimizing the performance of a CI job therefore tends to be a balance between theoretical efficiency gains and cache size.

Caches are often used by multiple jobs in a pipeline; a dependency cache is a typical example. The overhead cost of the download and extract steps is paid by each job, and it is not uncommon for the whole process to take on the order of minutes. The job that updates the cache pays the cost on both ends: download and extract on restore, then compress and upload on save.

Any increase in parallelism multiplies the overhead cost in compute time, since each job pays it separately. The higher the concurrency, the larger the share of each job that is pure overhead.

[Figures: Pie charts illustrating cache and build overhead at 3X and 6X concurrency]
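As a rough illustration of this effect, the overhead share can be modeled with a few lines of Python. The numbers below (60 s of restore overhead per job, 300 s of total build work) are hypothetical, not measurements:

```python
def overhead_fraction(jobs: int, total_work_s: float = 300.0,
                      restore_s: float = 60.0) -> float:
    """Fraction of total compute time spent on cache restore when a fixed
    amount of build work is split across `jobs` parallel jobs, each of
    which must download and extract the same cache.

    Illustrative model only; the timing constants are assumptions.
    """
    overhead = jobs * restore_s  # every job pays the restore cost
    return overhead / (overhead + total_work_s)

print(overhead_fraction(1))  # 1 job:  ~17% of compute is overhead
print(overhead_fraction(6))  # 6 jobs: ~55% of compute is overhead
```

Splitting the same work across more jobs leaves the wall-clock restore time per job unchanged, but the total compute billed to overhead grows linearly with the fan-out.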

Keys

Besides the size of the cache, the lookup mechanisms are generally limited to fixed keys (e.g. the branch name) or a hash of important files (e.g. a dependency lock file). For many cases it can be difficult to assemble an optimal key.

Various fallback mechanisms are provided to handle cache misses, but they ultimately add complexity and still fail to provide an optimal solution.

Security

There are two security concerns pertaining to caches:

  • a non-protected branch polluting a cache used by a protected branch
  • protected branch secrets being exposed to non-protected branch pipelines

Imagine publishing corrupted artifacts from a release branch because a feature branch polluted the cache. Such a shared cache usurps maintainer review by effectively modifying CI without a merge. Removing write access from non-protected branches solves the problem, but most implementations instead bifurcate the caches entirely, with negative performance implications.

The second issue is less common, and arguably an improper use of a cache. Mistakes do happen, though, so avoiding the possibility by default makes sense.

Intelligent Cache

Considering the shortcomings of existing caches, we built a CI cache from the ground up. The following is a summary of what we managed to achieve:

  • no overhead
  • no keys
  • no paths
  • pragmatic security

Our Intelligent Cache is coupled with purpose-built hardware to provide results that are nearly impossible to match: 2-10X faster* and more reliable CI by caching everything.

See the results for yourself.

*Not guaranteed, but a common result.