Using Android’s Repo tool to manage complex software projects.

Harrow is a relatively complex software project, comprising non-trivial software in many distinct languages spread across many repositories. We’ve always coped with this complexity fairly well: most of our team have a $HOME/code/harrow directory they work in, and most of them only need to check out two or perhaps three repositories at any one time. After a while on the project everyone seems to have everything they need, and we’ve never really had to formalize a setup process for our team.

Some of our Git repositories have inter-dependencies which have managed themselves. For example, our frontend.git repository has a strict dependency on our style-guide.git repository as a Bower module. Our api.git uses git-subtree to pull in all of the code which is shared with our Puppet manifests in puppet.git. With those dependencies in place, if you only need a working, up-to-date front end, Bower will grab the style guide on your behalf, and Vagrant (via r10k) will have the correct branch of the Puppet repository checked out for running Harrow locally.

All in all, it has worked very well, but as our team grows we wanted to address the ever-growing complexity. The choice fell between making a monorepo and finding, or building, a tool to help with using multiple repositories.

Two Choices

Our two choices seemed to be to use a so-called monorepo, and bundle all our software into one giant Git repository, or to use a tool such as Repo. A monorepo always felt like a bad solution given the nature of our software, but we gave it a fair shot. Here’s why we decided against it.

Why not a Monorepo?

Stefan Saasen of Atlassian wrote about Git monorepos in 2015 and laid out most of the reasoning and problems with a large-scale deployment of a Git monorepo. He calls out the example of Facebook:

With thousands of commits a week across hundreds of thousands of files, Facebook’s main source repository is enormous—many times larger than even the Linux kernel, which checked in at 17 million lines of code and 44,000 files in 2013.

Facebook faces engineering challenges that most companies don’t; having to modify source control tooling to handle such an extreme use-case isn’t a problem we had to contend with when making the choice. Had we gone in this direction, many tools exist to help merge multiple Git repositories into one whilst maintaining the history (unfortunately, commit references all change). One such example is git-merge-repos, which does a lot of the legwork to make sure that tags and branches are maintained.

One of our main reasons for wanting to keep multiple repositories was that we make extensive use of very specific triggers and inter-repository notifications as part of our build tooling. Many of our components also have wildly different release cycles and schedules: our front end ships multiple times per day, whilst our backend APIs ship a couple of times per week, and our infrastructure changes relatively infrequently. Not to mention it simply feels cleaner to have components separated, with the freedom to choose the right tooling for each.

Note, regarding the choice of tooling: we use two different systems for two very different use-cases of maintaining encrypted repositories. For our licensing tools we use [git-crypt](https://www.agwa.name/projects/git-crypt/), which keeps individual files encrypted in a repository, and for our Terraform repository we use [git-remote-gcrypt](https://github.com/bluss/git-remote-gcrypt), as the entire repository contains potentially sensitive information.

Switching to a monorepo would have made all of this much more challenging, and would have meant compromising or reconsidering decisions and tools that have worked really well for us.

Getting Started With Repo

The documentation for Repo assumes that you’ll be cloning Android (AOSP) and building it. We’ll start from scratch for the sake of completeness.

Requirements

Repo requires Python 2.x, Git and a Linux or Mac type operating system. There appear to be forks of the Repo tool which work on Windows, but we don’t need them, so we haven’t tried them.

Installing Repo

On OS X Repo is available from Homebrew with a simple $ brew install repo. On Linux, or if not using Homebrew on OS X it can easily be installed via curl:

$ mkdir -p $HOME/.bin/
$ echo 'export PATH=$PATH:$HOME/.bin' >> $HOME/.profile
$ curl https://storage.googleapis.com/git-repo-downloads/repo > $HOME/.bin/repo
$ chmod a+x $HOME/.bin/repo

Make sure to verify the checksum against those listed here, or clone the repository containing the Repo tool from here to make sure you have a legitimate version.
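As a rough sketch of that verification step, the helper below compares a file’s digest with a published value. It assumes the published checksum is a SHA-256 digest (swap in the matching tool if a different algorithm is listed); verify_sha256 is a name we made up for illustration and is not part of Repo.

```shell
# Hypothetical helper: succeed only if FILE's SHA-256 digest equals EXPECTED.
# Assumes the published checksum is SHA-256.
verify_sha256() {
  file=$1
  expected=$2
  # sha256sum ships with GNU coreutils; OS X provides `shasum -a 256` instead.
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  else
    actual=$(shasum -a 256 "$file" | cut -d ' ' -f 1)
  fi
  [ "$actual" = "$expected" ]
}
```

You would call it as verify_sha256 $HOME/.bin/repo followed by the digest copied from the download page, and only mark the script executable if it succeeds.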

Writing Your Manifest

The Repo tool uses an XML manifest to describe the project, and the relationships between the various components. The manifest must be called default.xml, and must live in the root directory of a repository. Our repository contains only the default.xml manifest and a short README.md which includes a primer on Repo and some notes on how to bootstrap a Harrow development environment. Android’s own manifest is more than 500 lines long. Our manifest looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<manifest>
  <remote  name="origin" fetch=".." />
  <default revision="master" remote="origin" sync-s="true" sync-j="4" sync-c="true" />
  <project path="baseimage" name="harrow-baseimage.git" />
  <project path="baseimage-tests" name="harrow-baseimage-tests.git" />
  <project path="frontend" name="frontend.git" />
  <project path="knowledge-base" name="kb.git" />
  <project path="licenses" name="licenses.git" />
  <project path="notes" name="notes.git" />
  <project path="puppet" name="puppet.git" revision="development" />
  <project path="src/bitbucket.org/api" name="api.git">
    <linkfile src="." dest="api" />
  </project>
  <project path="style-guide" name="harrow-style-guide.git" />
</manifest>

The full definition of the manifest XML format is described here and is definitely worth skimming. It can copy or link files to specific paths, projects can have submodules, and there are tweaks which can be made to check out certain branches by default (e.g. master by default in our example, with puppet.git having a special override). The slightly peculiar fetch=".." is resolved relative to the URL of the manifest repository, and the name of each project is appended to the result. That means that if you host the manifest repository at git@bitbucket.org:harrowio/manifest.git, set the attribute fetch="..", and then give a <project /> an attribute of name="puppet.git", the URL will be git@bitbucket.org:harrowio/manifest.git/../puppet.git, or canonically git@bitbucket.org:harrowio/puppet.git.
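To make the URL arithmetic concrete, here is a minimal sketch of the fetch=".." case only. resolve_project_url is a hypothetical helper written for this post, not a Repo command, and it does not attempt to handle other fetch values.

```shell
# Hypothetical helper illustrating how fetch=".." combines with a project name.
resolve_project_url() {
  manifest_url=$1   # e.g. git@bitbucket.org:harrowio/manifest.git
  project_name=$2   # e.g. puppet.git
  # ".." means "one level up": drop the last path component of the manifest URL.
  base=${manifest_url%/*}
  printf '%s/%s\n' "$base" "$project_name"
}

resolve_project_url git@bitbucket.org:harrowio/manifest.git puppet.git
# prints git@bitbucket.org:harrowio/puppet.git
```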

Initializing Your Repo

With a manifest committed, and pushed to a Git host, we can now make an empty directory and initialize Repo in there. This is a one-time task, and may ask you some questions about rendering diffs with color, or which Git identity to use when committing and submitting code for review.

$ mkdir -p $HOME/code/myproject
$ cd $HOME/code/myproject
$ repo init -u git@bitbucket.org:harrowio/manifest.git

Repo will set up a couple of hidden directories, then check out your manifest repository and symlink the contained default.xml into place. This process should be really fast, because at this point none of the projects in the list have been cloned. It’s now time to sync.

Syncing Your Repo

With Repo initialized, run the sync from inside the directory created in the previous step.

$ cd $HOME/code/myproject
$ repo sync

This will clone all the projects named in the manifest into the paths specified; if the path attribute is not given it defaults to the project name sans the .git suffix. If you have many larger repositories this can take a while. The option sync-j="4" specified in the manifest tells Repo to run four sub-processes in parallel, so don’t be surprised if the Git output from the clone operations looks mangled: four processes are all racing against each other to write to the terminal. It’s worth mentioning that $ repo sync also fetches changes to the manifest repository if there are any. In case you need something in your own manifest which isn’t useful for the rest of your team, the Repo tool supports local manifests.
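As an illustration of a local manifest, the sketch below drops an extra manifest file into .repo/local_manifests/, which Repo reads in addition to default.xml. The helper name write_local_manifest, the file name personal.xml and the scratch.git repository are all made up for this example; substitute your own.

```shell
# Hypothetical helper: add a personal project to an existing Repo checkout.
# scratch.git is a made-up repository name used purely for illustration.
write_local_manifest() {
  root=${1:-.}   # top of the Repo checkout (the directory containing .repo)
  mkdir -p "$root/.repo/local_manifests"
  cat > "$root/.repo/local_manifests/personal.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
  <project path="scratch" name="scratch.git" />
</manifest>
EOF
}
```

After running write_local_manifest $HOME/code/myproject, the next $ repo sync should pick up the extra project alongside the ones from the shared manifest, without anything being pushed to the team’s manifest repository.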

Starting A Feature

To work on a new feature with Repo you simply run the following command:

$ repo start my-feature-branch frontend knowledge-base

repo start will start a Git branch by the name of my-feature-branch in the frontend.git and kb.git repositories.

From here on in you just work as normal: commit in the individual repositories as you always would. You can stash, push and work with Git without changing anything about your workflow at all!

Repo includes the status command which can be run from any directory under the root directory (i.e. inside one of your projects).

The output of repo status looks something like this, after running the commands above and making one or two little commits:

$ repo status 
project frontend/            branch my-feature-branch
project knowledge-base/      branch my-feature-branch

From here you can simply work normally: commit, tweak and work in those branches in the individual repositories as you always would. If you use GUI tools for Git they’ll work just fine too.

If you have changes spanning multiple repos that you don’t want to stage for commit individually you can use the following command, which is analogous to (but not quite the same as) git add:

$ repo stage -i frontend/

However, it’s seldom worth using this path in our experience, as there’s no repo commit to match it, so you would still need to commit in the repositories individually. You can probably just ignore repo stage.
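If you do want a single commit message applied across several projects, a small loop over plain Git does the job. This is only a sketch: commit_in_projects is a hypothetical helper of our own, not a Repo subcommand, and it assumes you have already staged your changes in each project with git add.

```shell
# Hypothetical helper: commit whatever is staged in each listed project
# directory, using the same message everywhere.
# Usage: commit_in_projects "message" frontend knowledge-base
commit_in_projects() {
  message=$1
  shift
  for project in "$@"; do
    # `git diff --cached --quiet` exits 0 when nothing is staged.
    if git -C "$project" diff --cached --quiet; then
      echo "nothing staged in $project, skipping"
    else
      git -C "$project" commit -m "$message"
    fi
  done
}
```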

Checking-Out A Feature Branch

It’s simple enough to grab feature branches from your teammates too. $ repo branches will list all branches that Repo knows about:

$ repo branches
   master              | in knowledge-base
*  my-feature-branch   | in knowledge-base, frontend

You can check a single branch out with:

$ repo checkout master

This will check out the master branch. Pass the name of one of your team’s feature branches instead, if you have any, to check that out.

Uncommitted Changes?

If you have uncommitted changes in one repo and you try to switch the branch, you’ll see something like this:

$ repo status
project frontend/                       branch my-feature-branch
 -m     app/index.html
project knowledge-base                  branch my-feature-branch

Here Repo can’t do anything clever. Git wouldn’t have allowed you to change the branch with uncommitted changes, and Repo won’t allow it either, for the same reasons. You might want to integrate Repo into your prompt, or set up an alias for $ repo status, to have an overview and make sure you’re not accidentally working in, or relying on, the wrong branch.
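One possible starting point for that kind of integration is sketched below. Both function names are hypothetical; the only thing they rely on is that the top of a Repo checkout contains a .repo directory, as created by repo init.

```shell
# Hypothetical helper: print the top of the enclosing Repo checkout (the
# nearest ancestor directory containing .repo), or fail if there is none.
repo_root() {
  dir=$(pwd -P)
  while :; do
    if [ -d "$dir/.repo" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    [ "$dir" = "/" ] && return 1
    dir=$(dirname "$dir")
  done
}

# Alias-style wrapper: only call `repo status` when inside a checkout.
rstatus() {
  repo_root >/dev/null || { echo "not inside a Repo checkout" >&2; return 1; }
  repo status "$@"
}
```

Put these in your shell profile and rstatus becomes a short, safe way to check which branch each project is on; repo_root can equally be called from a prompt function to show a marker whenever you are inside a checkout.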

Summary

The Repo tool is definitely optimized for users: people who are using the Android source code, and who just need to be able to switch branches and make small changes. The repo stage command would benefit from some better documentation and a smoother workflow, but it’s also apparently relatively seldom used.

Going Further & Getting Help

The Repo tool shares a maintainer team, and its home repository, with Gerrit. There’s a Google Group for both with lots of activity, and a proactive and helpful community.

If you have a project with multiple repositories it’s probable that you’ve struggled with continuous integration and deployment. Most of the tools out there naïvely assume that one repository is one project, and that all your configuration, code and deploy manifests live there. Of course, with Git submodules and subtrees there are certainly workarounds to make one pseudo-monorepo from multiple projects.

If you have a complex project with more than one repository which you’d like to be able to unit and integration test, deploy, and keep running well in production, give Harrow a try.



Lee is founding CTO of Harrow.io and long-time maintainer of Capistrano, the de-facto standard tool for deployment and automation of Rails projects, amongst others.
