A large part of what makes Ruby on Rails attractive to developers is the sprawling ecosystem of libraries that has grown around it over the years. There are libraries (called “gems” in the Ruby world) for just about anything: authentication systems, message queues, administrative interfaces to the data model, frontend asset processing, and many more. When tasked with implementing a new feature or starting a new project, searching for existing solutions to the problem and evaluating them is part of being a developer.
Usually a dependency manager keeps track of all the libraries a project accumulates over its lifetime. For Ruby, Bundler plays this role. Bundler keeps track of the gems used in a project in a file named Gemfile and pins the exact versions used in a file named Gemfile.lock. Later on, when the project is used on another machine, running bundle install looks at Gemfile.lock and makes sure that all the gems the project needs are installed.
During local development on the developer’s machine, running bundle install is usually very fast. Even during the initial run, when all of a project’s dependencies need to be installed, Bundler can benefit from other projects on the same machine with the same dependencies to avoid unnecessary downloads. When a new gem is added to the project, bundle install only needs to fetch that new gem and finishes quickly.
Nowadays, however, developers rarely work alone. They work in teams and use Continuous Integration (CI) tools to ensure a smooth development process across the whole team. Running the project’s test suite or deploying the project through a CI system means that the same code rarely runs twice in the same environment. Since CI tools strive to make the environments in which code runs uniform, to rule out hidden errors caused by subtle differences in the environment, bundle install needs to do a lot of work every time: all gems need to be downloaded, built and installed on every run of a task in the CI tool.
Mature Ruby on Rails applications can easily end up depending on 100 gems or more, so the overhead introduced by running bundle install in a fresh environment, every time, can be on the order of several minutes.
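To get a feeling for how many gems a fresh environment has to install, one can count the entries pinned in Gemfile.lock. The following is only a minimal sketch: the function name count_locked_gems is made up for this example, and it assumes the usual Gemfile.lock layout, in which every resolved gem sits on a line indented by exactly four spaces and is followed by its version in parentheses.

```shell
#!/bin/bash
# Hypothetical helper: count the gems pinned in a Gemfile.lock.
# Assumption: resolved gems are indented by exactly four spaces and are
# followed by "(version)"; deeper-indented lines are their dependencies.
count_locked_gems() {
  local lockfile="${1:-Gemfile.lock}"
  grep -cE '^    [^ ]+ \(' "$lockfile"
}

# Usage (run from the project root):
#   count_locked_gems Gemfile.lock
```

The count includes transitive dependencies, since Gemfile.lock lists every gem that was resolved, not only those named in the Gemfile.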
Luckily, Bundler comes with a feature for supporting CI tools: running bundle install --deployment will store all dependencies in a directory vendor/bundle at the project’s root, next to the Gemfile. If that directory already exists when running bundle install, Bundler only has to do a minimal amount of work. This is our avenue for improving the performance of tasks in a CI environment.
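Whether a task starts with a populated vendor/bundle can be checked before bundle install runs. This is a small sketch rather than anything Bundler or Harrow provides: the function name bundle_cache_state and the labels "warm" and "cold" are invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper: report whether a vendored bundle is already present.
# "warm": vendor/bundle exists and is not empty, so bundle install should
#         have little left to do.
# "cold": the directory is missing or empty, so every gem must be installed.
bundle_cache_state() {
  local project_root="${1:-.}"
  if [ -d "$project_root/vendor/bundle" ] && [ -n "$(ls -A "$project_root/vendor/bundle")" ]; then
    echo "warm"
  else
    echo "cold"
  fi
}

# Usage (from anywhere):
#   echo "bundle cache is $(bundle_cache_state ~/repositories/my-project)"
```

Printing this at the start of a task makes it easier to tell from the logs whether a slow run was caused by a missing cache.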
There are three ways to make use of this feature:
1. Commit vendor/bundle to the project’s source code repository
2. Keep vendor/bundle on the machine used by the CI tool
3. Store vendor/bundle somewhere else and download it as part of the task configured in the CI tool
Option 2 can be ruled out immediately, because CI tools usually run tasks in a clean environment, which means that any files we created in the CI environment are lost the next time the CI tool runs a task for us.
Committing vendor/bundle to source control requires the least setup and maintenance effort in the beginning, but clutters the version history and blows up the size of the code repository.
This leaves us with storing vendor/bundle somewhere else. Any kind of storage will do; it just needs to be reachable from the CI environment.
Let’s look at some example scripts for Harrow, our CI tool, to see how this kind of caching would be done. First, we need one script on Harrow to populate and update the cache. Then we’ll look at how to change the script used for running the tests so that it fetches the cache into the CI environment before running bundle install.
Speeding up tests with a separate Harrow task
Updating the cache
Without further ado, here’s the final script followed by a breakdown of the most significant parts:
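```shell
#!/bin/bash -e

hfold "Installing packages"
sudo apt-get update -y
sudo apt-get install -y libmagickwand-dev imagemagick rsync
hfold --end

# Load the private SSH keys needed for cloning private repositories; skip public keys.
for key in ~/.ssh/repository-*; do
  if expr match "$key" ".*\\.pub"; then continue; fi
  if [ -e "$key" ]; then
    ssh-add "$key"
  fi
done

# Record the cache server's host key so that rsync over SSH does not prompt.
ssh-keyscan -4 $BUNDLE_CACHE_SERVER > ~/.ssh/known_hosts

(
  cd ~/repositories/$PROJECT_NAME
  # Install all gems into vendor/bundle, then upload the vendor directory.
  bundle install --deployment
  rsync -e "ssh -l $BUNDLE_CACHE_USER" -avz vendor $BUNDLE_CACHE_SERVER:$PROJECT_NAME
)
```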
The lines up to cd are necessary setup for preparing the environment on Harrow. The Ruby on Rails application for which this script is used depends on imagemagick, so we install it here. Then we load any SSH keys necessary for cloning private repositories through Bundler. After that, bundle install --deployment downloads, builds and installs all gems into vendor/bundle. We then use rsync to upload the enclosing vendor directory to a server (identified by the BUNDLE_CACHE_SERVER environment variable, which has been configured in the Harrow settings for this project).
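The script relies on three environment variables: BUNDLE_CACHE_SERVER, BUNDLE_CACHE_USER and PROJECT_NAME. If one of them is missing, the failure only shows up late, at the cd or rsync step. One possible way to fail early is a small guard like the following sketch. The function name require_vars is made up for this example, and the sketch assumes the variables are exported into the environment, which is why it uses printenv.

```shell
#!/bin/bash
# Hypothetical guard: fail early if a required environment variable is
# unset or empty. Only exported variables are visible to printenv.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage near the top of the cache script:
#   require_vars BUNDLE_CACHE_SERVER BUNDLE_CACHE_USER PROJECT_NAME || exit 1
```

Reporting every missing variable before returning saves a round trip through the CI tool when more than one setting is absent.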
This script can be run on Harrow periodically or on every Git commit, to keep the cache up to date. It is very satisfying to see how one task run occasionally can drastically speed up other tasks. Harrow makes this possible by not tying task execution to source code changes, unlike other popular CI systems.
Using the cache
The script for running the tests in this Ruby on Rails application features a similar setup ceremony, so we’ll just look at the part that needed to change:
```shell
hfold "Install Bundler"
gem install bundler

hfold "Load bundler cache"
rsync -e "ssh -l $BUNDLE_CACHE_USER" -avz $BUNDLE_CACHE_SERVER:premium-tours/vendor ./
hfold --end

hfold "Install Gem Bundle"
bundle install --deployment
```
The Load bundler cache block is new in this script and uses rsync to download the cache that we previously created with the other script outlined in this article.
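If the cache server is unreachable or the cache has not been populated yet, the rsync call fails, and a script running with bash -e (as the cache script above does) would abort before the tests start. Assuming that a missing cache should only slow the task down rather than fail it, the download could be wrapped as in this sketch. The function name fetch_cache_or_continue is hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper: run a cache download command, but treat a failure
# as a cache miss instead of aborting the whole task.
fetch_cache_or_continue() {
  if "$@"; then
    echo "bundle cache loaded"
  else
    echo "bundle cache unavailable, bundle install will start from scratch" >&2
  fi
  return 0
}

# Usage in the test script:
#   fetch_cache_or_continue rsync -e "ssh -l $BUNDLE_CACHE_USER" -avz \
#     "$BUNDLE_CACHE_SERVER:premium-tours/vendor" ./
```

The trade-off is that a permanently broken cache then goes unnoticed except for longer run times, so the warning on stderr is worth keeping visible in the task log.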
Looking at the results
This little change was followed by a dramatic performance improvement. Before caching, running the tests took around 8 minutes on average, including the necessary setup of the CI environment. After adding the cache, the runtime dropped to 4 minutes and 15 seconds on average. That’s an improvement of almost 50%! Including transitive dependencies, Bundler needed to fetch 179 gems for this app to run.
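The size of the improvement can be checked with a bit of shell arithmetic, using only the two averages reported above (8 minutes before, 4 minutes and 15 seconds after). The variable names are chosen for this example.

```shell
#!/bin/bash
# Average runtimes reported above, converted to seconds.
before=$((8 * 60))        # 8 minutes
after=$((4 * 60 + 15))    # 4 minutes 15 seconds

saved=$((before - after))
# Integer arithmetic only: tenths of a percent, rounded to the nearest tenth.
tenths=$(( (saved * 1000 + before / 2) / before ))

echo "saved ${saved}s per run, $((tenths / 10)).$((tenths % 10))% of the original runtime"
# prints: saved 225s per run, 46.9% of the original runtime
```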
Knowing the tools one is using very well makes it easy to find solutions to problems that would otherwise require a lot of effort to implement. Fortunately Harrow is as flexible as our existing tools themselves, so integrating this new solution into Harrow was straightforward.
Let us know if you have discovered similar problems and solutions and how they could work with Harrow! Just send an email to firstname.lastname@example.org with a link to an article you have written about how to solve such a problem with Harrow, and we can grant your team some extras when using Harrow!
Harrow.io is a platform for Continuous Integration, testing and deployment, built by the team behind Capistrano.