Journey to Intelligent Cache
As anyone familiar with CI knows, CI is repetitive, and so are its problems. That repetition makes CI a tempting target for caching. Unfortunately, the common CI caching mechanisms are relatively brute force and do not accommodate the complexity of real-world use cases.
Overhead
The compress, upload, download, and extract steps used by almost all caching mechanisms produce overhead that scales proportionally with cache size. This overhead works against the cache, yielding diminishing returns; in many cases, full caching is slower than no cache at all. Optimizing the performance of a CI job therefore becomes a balancing act between theoretical efficiency gains and cache size.
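The trade-off can be sketched with a back-of-envelope model. The throughput numbers below are illustrative assumptions, not measurements from any particular CI provider:

```python
# Rough model of cache restore overhead vs. rebuilding from scratch.
# network_mbps and codec_mbps are assumed, illustrative throughputs.

def cache_overhead_s(size_mb, network_mbps=50, codec_mbps=100):
    """Seconds spent transferring (download/upload) plus
    (de)compressing a cache of the given size."""
    return size_mb / network_mbps + size_mb / codec_mbps

def caching_pays_off(size_mb, rebuild_time_s):
    """True only if restoring the cache beats rebuilding from scratch."""
    return cache_overhead_s(size_mb) < rebuild_time_s

# At these rates a 2 GB cache costs 60 s just to restore, so if the
# cached work only takes 45 s to redo, the "optimization" loses time.
print(cache_overhead_s(2000))        # -> 60.0
print(caching_pays_off(2000, 45))    # -> False
print(caching_pays_off(500, 45))     # -> True
```

The exact numbers vary by provider and compression codec, but the shape of the curve is the point: past some size, restore cost overtakes the time the cache saves.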
Caches are often used multiple times in a pipeline; a dependency cache, for example, may be restored by many jobs. The overhead cost of the download and extract steps is paid on each of those jobs, and it is not uncommon for the whole process to take on the order of minutes. The job that updates the cache pays the cost on both ends: download and extract at the start, compress and upload at the end.
Any increase in parallelism multiplies the overhead cost in compute time, since each job must restore the cache independently. And because parallelism shrinks the useful work per job without shrinking the restore cost, the higher the concurrency, the larger the share of each job that is pure overhead.
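A small extension of the same back-of-envelope model shows how fan-out inflates the overhead fraction. The timings are again assumed for illustration:

```python
# How cache-restore overhead grows as work is split across parallel jobs.
# Total compute spent on restores is restore_s * jobs, while the useful
# work per job shrinks, so the overhead fraction of each job climbs.

def overhead_fraction(work_s, restore_s, jobs):
    """Fraction of each job's wall time spent restoring the cache,
    assuming the work divides evenly across `jobs` parallel jobs."""
    per_job_total = work_s / jobs + restore_s
    return restore_s / per_job_total

# 10 minutes of total work, 60 s restore per job (illustrative numbers):
for jobs in (1, 4, 16):
    print(jobs, round(overhead_fraction(600, 60, jobs), 2))
# -> 1 0.09
# -> 4 0.29
# -> 16 0.62
```

At 16-way concurrency in this model, well over half of every job is cache handling rather than actual work, which is exactly the diminishing-returns problem described above.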