Faster Continuous Integration with Bundler and Harrow.io

A large part of what makes Ruby on Rails attractive to developers is the sprawling ecosystem of libraries that has grown around it over the years. There are libraries (called “gems” in the Ruby world) for just about anything: authentication systems, message queues, administrative interfaces to the data model, frontend asset processing, and many more. When tasked with implementing a new feature or starting a new project, searching for existing solutions to the problem and evaluating them is part of being a developer.

Usually a dependency manager keeps track of all the libraries a project accumulates over its lifetime. For Ruby, Bundler plays this role. Bundler keeps track of the gems used in a project in a file named Gemfile and pins the exact versions used in a file named Gemfile.lock. Later on, when the project is used on another machine, running bundle install looks at Gemfile.lock and makes sure that all the gems the project needs are installed.

During local development on the developer’s machine, running bundle install is usually very fast. Even during the initial run, when all of a project’s dependencies need to be installed, Bundler can benefit from other projects on the same machine with the same dependencies to avoid unnecessary downloads. When a new gem is added to the project, bundle install only needs to fetch that new gem and finishes quickly.

However, developers nowadays rarely work alone. They work in teams and use Continuous Integration (CI) tools to ensure a smooth development process across the whole team. Running the project’s test suite or deploying the project through a CI system means that the same code rarely runs twice in the same environment. CI tools strive to make the environments in which code runs uniform, to rule out hidden errors caused by subtle differences between environments. As a result, bundle install needs to do a lot of work every time: all gems need to be downloaded, built and installed on every run of a task in the CI tool.

Mature Ruby on Rails applications can easily end up depending on 100 gems or more, so the overhead introduced by running bundle install in a fresh environment, every time, can be on the order of several minutes.

Luckily, Bundler comes with a feature for supporting CI tools: running bundle install --deployment stores all dependencies in a directory vendor/bundle at the project’s root, next to the Gemfile. (Newer Bundler releases deprecate the flag in favor of the equivalent setting bundle config set deployment true.) If that directory already exists when running bundle install, Bundler only has to do a minimal amount of work. This is our avenue for improving the performance of tasks in a CI environment.
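
To make the "already exists" condition concrete, here is a minimal sketch that reports whether a project directory already holds a populated vendor/bundle. The function name bundle_cache_state is made up for this illustration and is not part of Bundler or Harrow.

```shell
#!/bin/bash
# Hypothetical helper (not part of Bundler): report whether a project
# directory already contains a populated vendor/bundle.
bundle_cache_state() {
    local project_dir="$1"
    # "warm" means vendor/bundle exists and holds at least one entry.
    if [ -d "$project_dir/vendor/bundle" ] && [ -n "$(ls -A "$project_dir/vendor/bundle")" ]; then
        echo "warm"
    else
        echo "cold"
    fi
}
```

Calling bundle_cache_state . at the start of a CI task would print "warm" or "cold" for the current directory, which makes it easy to see in the task log whether the following bundle install starts from scratch.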

There are three ways to make use of this feature:

  1. commit vendor/bundle to the project’s source code repository
  2. store vendor/bundle on the machine used by the CI tool
  3. store vendor/bundle somewhere else and download it as part of the task configured in the CI tool

Option 2 can be ruled out immediately, because CI tools usually run tasks in a clean environment, which means that any files we created in the CI environment are lost the next time the CI tool runs a task for us.

Committing vendor/bundle to source control requires the least setup and maintenance effort at first, but it clutters the version history and bloats the code repository.

This leaves us with storing vendor/bundle somewhere else. Any kind of storage will do, as long as it is reachable from the CI environment.

Let’s look at some example scripts for Harrow, our CI tool, to see how this kind of caching can be done. First, we need one script on Harrow to populate and update the cache. Then we’ll look at how to change the script that runs the tests so that it fetches the cache into the CI environment before running bundle install.


Speeding up tests with a separate Harrow task

Updating the cache

Without further ado, here’s the final script followed by a breakdown of the most significant parts:

#!/bin/bash -e

hfold "Installing packages"
sudo apt-get update -y
sudo apt-get install -y libmagickwand-dev imagemagick rsync
hfold --end

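# Load the private SSH keys needed to clone private repositories.
for key in ~/.ssh/repository-*; do
    # Skip public keys; only private keys can be added to the agent.
    case "$key" in *.pub) continue ;; esac
    if [ -e "$key" ]; then
        ssh-add "$key"
    fi
done

# Record the host key of the cache server (IPv4 only).
ssh-keyscan -4 "$BUNDLE_CACHE_SERVER" > ~/.ssh/known_hosts

(
    cd ~/repositories/"$PROJECT_NAME"
    bundle install --deployment
    # Upload the vendor directory, including vendor/bundle, to the cache server.
    rsync -e "ssh -l $BUNDLE_CACHE_USER" -avz vendor "$BUNDLE_CACHE_SERVER:$PROJECT_NAME"
)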

The lines up to cd are necessary setup for preparing the environment on Harrow. The Ruby on Rails application for which this script is used depends on imagemagick, so we install it here. Then we load any SSH keys necessary for cloning private repositories through Bundler. After that, bundle install --deployment downloads, builds and installs all gems into vendor/bundle. We then use rsync to upload the vendor directory to a server (identified by the BUNDLE_CACHE_SERVER environment variable, which has been configured in the Harrow settings for this project).
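
Because the upload target is assembled from environment variables, an unset or empty variable would silently change where rsync writes. The following sketch, under the assumption that failing early is preferable, builds the destination and aborts when a setting is missing. The function name bundle_cache_destination is illustrative only; the variables are the ones the script already relies on.

```shell
#!/bin/bash
# Sketch: build the rsync destination used by the cache script and fail
# early when a required setting is missing. The function name is
# illustrative and not provided by Harrow.
bundle_cache_destination() {
    # ${VAR:?message} aborts with the message if VAR is unset or empty.
    : "${BUNDLE_CACHE_SERVER:?BUNDLE_CACHE_SERVER is not set}"
    : "${PROJECT_NAME:?PROJECT_NAME is not set}"
    echo "$BUNDLE_CACHE_SERVER:$PROJECT_NAME"
}
```

In a script running with -e, assigning the result on its own line, as in destination=$(bundle_cache_destination), stops the task before rsync runs if either variable is missing.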

This script can be run on Harrow periodically or on every Git commit, to keep the cache up to date. It is very satisfying to see how one task run occasionally can drastically speed up other tasks. Harrow makes this possible by not tying task execution to source code changes, unlike other popular CI systems.

Using the cache

The script for running the tests in this Ruby on Rails application features a similar setup ceremony, so we’ll just look at the part that needed to change:

hfold "Install Bundler" gem install bundler

hfold "Load bundler cache"
rsync -e "ssh -l $BUNDLE_CACHE_USER" -avz $BUNDLE_CACHE_SERVER:premium-tours/vendor ./
hfold --end

hfold "Install Gem Bundle" bundle install --deployment

The Load bundler cache block is new in this script and uses rsync to download the cache that we’ve previously created with the other script outlined in this article.
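
One thing to keep in mind: if the test script runs with -e like the cache script does, a failing download (for example because the cache has never been populated) would abort the whole task. A possible way around this, sketched here with the made-up helper name load_bundle_cache, is to treat the cache as optional and let bundle install do the full work when it is missing.

```shell
#!/bin/bash
# Sketch: run a cache-fetch command, but do not abort the whole task
# when it fails (for example because the cache was never populated).
# load_bundle_cache is a made-up name; pass the real fetch command as
# arguments.
load_bundle_cache() {
    if "$@"; then
        echo "bundle cache loaded"
    else
        echo "bundle cache unavailable, continuing with a full install"
    fi
}
```

It would be used by prefixing the existing command, for example load_bundle_cache rsync -e "ssh -l $BUNDLE_CACHE_USER" -avz $BUNDLE_CACHE_SERVER:premium-tours/vendor ./, so the task continues either way.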

Looking at the results

This little change was followed by a dramatic performance improvement. Including transitive dependencies, Bundler needs to fetch 179 gems for this app to run. Before caching, running the tests took around 8 minutes on average, including the necessary setup of the CI environment. After adding the cache, the runtime dropped to 4 minutes and 15 seconds on average. That’s a reduction of about 47%!
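
For reference, the size of the improvement can be worked out from the two averages quoted above. This is plain arithmetic on those numbers, not an additional measurement:

```shell
#!/bin/bash
# Average runtimes quoted above, converted to seconds.
before=$((8 * 60))        # 8 minutes
after=$((4 * 60 + 15))    # 4 minutes 15 seconds

# Relative reduction in percent, rounded to one decimal place.
reduction=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f", (b - a) * 100 / b }')
echo "saved $((before - after)) seconds (${reduction}%)"
# prints: saved 225 seconds (46.9%)
```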

Knowing your tools well makes it easy to find solutions to problems that would otherwise take a lot of effort to implement. Fortunately, Harrow is as flexible as our existing tools, so integrating this solution into Harrow was straightforward.

Let us know if you have discovered similar problems and solutions and how they could work with Harrow! Just send an email to team@harrow.io with a link to an article you have written about how to solve such a problem with Harrow, and we can grant your team some extras when using Harrow!


Want to try Harrow.io? Start immediately with a 14 day free trial or learn more

Harrow.io is a platform for Continuous Integration, testing and deployment, built by the team behind Capistrano.
Add powerful automation and collaboration capabilities to your existing tools.

  • Automate any software, in any language.
  • Create self-documenting, repeatable web-based tasks
  • Make them available for your team-mates
  • Trigger tasks automatically based on Git changes and webhooks.
  • Get notified by email, slack, etc.
  • Free for small projects

Test, deploy and collaborate online easily, using tools you already know and love.
Works seamlessly for PHP, Node.js, Ansible, Python, Go, Capistrano and more!

No credit card required – completely free for small or public projects.

Engineer with a passion for all things unusual, functional and fast. Dario wields tools from TCL to Prolog to get things done in ways that wouldn't occur to most people.
