Harrow is a relatively complex software project, comprising non-trivial software in many distinct languages spread across many repositories. We’ve always coped with this complexity fairly well: most of our team have a $HOME/code/harrow directory they work in, and most of them only need to check out two or perhaps three repositories at any one time. After a while on the project everyone seems to have everything they need, and we’ve never really had to formalize a setup process for our team.
Some of our Git repositories have inter-dependencies which have managed themselves. For example, our frontend.git repository has a strict dependency on our style-guide.git repository as a Bower module. Our api.git uses git-subtree to pull in all of the code which is shared with our Puppet manifests in puppet.git. With those dependencies in place, if you only need a working, up-to-date front end, Bower will grab the style guide on your behalf, and Vagrant (via r10k) will have the correct branch of the Puppet repository checked out for running Harrow locally.
All in all, this has worked very well, but as our team grows we wanted to address the ever-growing complexity. The choice fell between making a monorepo and finding, or building, a tool to help with using multiple repositories.
Our two choices seemed to be to use a so-called monorepo, bundling all our software into one giant Git repository, or to use a tool such as Repo. A monorepo always felt like a bad solution given the nature of our software, but we gave it a fair shot. Here’s why we decided against it.
Why not a Monorepo?
Stefan Saasen of Atlassian wrote about Git monorepos in 2015 and laid out most of the reasoning behind, and problems with, a large-scale deployment of a Git monorepo. He calls out the example of Facebook:
With thousands of commits a week across hundreds of thousands of files, Facebook’s main source repository is enormous—many times larger than even the Linux kernel, which checked in at 17 million lines of code and 44,000 files in 2013.
Facebook faces engineering challenges that most companies don’t; having to modify our source control tooling to handle an extreme use case like theirs isn’t a problem we had to contend with when making the choice. Had we gone in this direction, many tools exist to help merge multiple Git repositories into one whilst maintaining the history (unfortunately, commit references all change). One such example is git-merge-repos, which does a lot of the legwork to make sure that tags and branches are maintained.
One of our main reasons for wanting to keep multiple repositories was that we make extensive use of very specific triggers and inter-repository notifications as part of our build tooling, and many of our components have wildly different release cycles and schedules: our front end ships multiple times per day, whilst our backend APIs ship a couple of times per week, and our infrastructure changes relatively infrequently. Not to mention that it simply feels cleaner to have components separated, with the freedom to choose the right tooling.
Note: regarding the choice of tooling. We use two different systems, for two very different use-cases of maintaining encrypted repositories. For our licensing tools we use
[git-crypt](https://www.agwa.name/projects/git-crypt/) which maintains individual files encrypted in a repository, and for our Terraform repository we use
[git-remote-gcrypt](https://github.com/bluss/git-remote-gcrypt) as the entire repository contains potentially sensitive information.
Switching to a monorepo would have made all of this much more challenging, and would have meant compromising or reconsidering decisions and tools that have worked really well for us.
Getting Started With Repo
The documentation for Repo assumes that you’ll be cloning Android (AOSP) and building it. We’ll start from scratch for the sake of completeness.
Repo requires Python 2.x, Git and a Linux / Mac type operating system. There appear to be forks of the Repo tool which work on Windows, but we don’t need them, so we haven’t tried them.
    $ mkdir -p $HOME/.bin/
    $ echo "export PATH=\$PATH:$HOME/.bin" >> $HOME/.profile
    $ curl https://storage.googleapis.com/git-repo-downloads/repo > $HOME/.bin/repo
    $ chmod a+x $HOME/.bin/repo
Writing Your Manifest
The Repo tool uses an XML manifest to describe the project, and the relationships between the various components. The manifest must be called default.xml, and must live in the root directory of a repository. Our repository contains only the
default.xml manifest and a short
README.md which includes a primer on Repo and some notes on how to bootstrap a Harrow development environment. Android’s own manifest is more than 500 lines long. Our manifest looks like this:
    <?xml version="1.0" encoding="UTF-8"?>
    <manifest>
      <remote name="origin" fetch=".." />
      <default revision="master" remote="origin" sync-s="true" sync-j="4" sync-c="true" />
      <project path="baseimage" name="harrow-baseimage.git" />
      <project path="baseimage-tests" name="harrow-baseimage-tests.git" />
      <project path="frontend" name="frontend.git" />
      <project path="knowledge-base" name="kb.git" />
      <project path="licenses" name="licenses.git" />
      <project path="notes" name="notes.git" />
      <project path="puppet" name="puppet.git" revision="development" />
      <project path="src/bitbucket.org/api" name="api.git">
        <linkfile src="." dest="api" />
      </project>
      <project path="style-guide" name="harrow-style-guide.git" />
    </manifest>
The full definition of the manifest XML format is described in Repo’s manifest format documentation and is definitely worth skimming. It can copy or link files to specific paths, projects can have submodules, and there are tweaks which can be made to check out certain branches by default (e.g. master by default in our example, with puppet.git having a special override). The slightly peculiar fetch=".." is appended to the URL of the manifest repository, and the name of each project is then appended to that. That means that if you host the manifest repository at email@example.com:harrowio/manifest.git, set the attribute fetch=".." and then give a <project /> an attribute of name="puppet.git", the URL will be email@example.com:harrowio/manifest.git/../puppet.git, or canonically puppet.git as a sibling of manifest.git under harrowio.
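As a rough sketch of the path arithmetic described above (not Repo’s own resolution code), the following shell snippet derives a project URL for the simple case of fetch=".." with the manifest repository hosted directly inside its namespace. The variable names are made up for illustration.

```shell
# Illustrative only: mirrors the "manifest URL + fetch + project name" description.
manifest_url="email@example.com:harrowio/manifest.git"
project_name="puppet.git"

# fetch=".." points one level above the manifest repository,
# so drop the last path component of the manifest URL.
base_url="${manifest_url%/*}"

# Appending the project name gives the canonical clone URL.
project_url="${base_url}/${project_name}"
echo "$project_url"
```

Changing project_name to any other name from the manifest gives the URL that project would be cloned from under the same assumption.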
Initializing Your Repo
With a manifest committed and pushed to a Git host, we can now make an empty directory and initialize Repo in there. This is a one-time task, and it may ask you some questions about rendering diffs with color, or about the Git identity to use when committing and submitting code for review.
    $ mkdir -p $HOME/code/myproject
    $ cd $HOME/code/myproject
    $ repo init -u email@example.com:harrowio/manifest.git
Repo will set up a couple of hidden directories, then check out your manifest repository and symlink the contained default.xml into place. This process should be really fast, because at this point nothing from the project list has been cloned. It’s now time to sync.
Syncing Your Repo
With the directory initialized, we can now sync the projects listed in the manifest.
    $ cd $HOME/code/myproject
    $ repo sync
This will clone all the projects named in the manifest into the paths specified; if the path attribute is not given, it defaults to the project name sans the .git suffix. If you have many, larger repositories this can take a while. The option sync-j="4" specified in the manifest means that four sub-processes run in parallel, so don’t be surprised if the Git output from the clone operations looks mangled: four processes are all racing against each other to write to the terminal. It’s worth mentioning that repo sync also fetches changes to the manifest repository if there are any, and that, in case you need something just for your own checkout which maybe isn’t useful for your team, the Repo tool supports local manifests.
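A local manifest is an extra XML file placed under .repo/local_manifests/ in your checkout, which Repo reads in addition to the shared manifest. A minimal sketch, assuming a hypothetical scratch.git repository hosted next to the others and a file name (scratch.xml) chosen purely for illustration:

```shell
# Run from the root of the Repo checkout (the directory containing .repo/).
mkdir -p .repo/local_manifests

# "scratch.git" is a hypothetical repository used only for illustration;
# it inherits the default remote and revision from the shared manifest.
cat > .repo/local_manifests/scratch.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
  <project path="scratch" name="scratch.git" />
</manifest>
EOF

# The next "repo sync" would then clone the extra project alongside the others.
```

Because the file lives inside .repo/, it stays on your machine and never touches the manifest repository your team shares.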
Starting A Feature
To work on a new feature with Repo you simply run the following command:
$ repo start my-feature-branch frontend knowledge-base
repo start will start a Git branch by the name of my-feature-branch in each of the projects named on the command line.
From here on in you just work as normal: commit in the individual repositories as you always would. You can stash, push and work with Git without changing anything about your workflow at all!
Repo includes the
status command which can be run from any directory under the root directory (i.e. inside one of your projects).
The output of repo status looks something like this, after running the commands above and making one or two little commits:
    $ repo status
    project frontend/                branch my-feature-branch
    project knowledge-base/          branch my-feature-branch
From here you can simply work normally: commit, tweak and work in those branches in the individual repositories as you always would. If you use GUI tools for Git, they’ll work just fine too.
If you have changes spanning multiple repos that you don’t want to stage for commit individually, you can use the following command, which is analogous to (but not quite the same as) interactive staging in Git:
    $ repo stage -i frontend/
However, in our experience it’s seldom worth using this path, as there is no repo commit command, so you would still need to commit in the repositories individually. You can probably just ignore this command.
Checking-Out A Feature Branch
It’s simple enough to grab feature branches from your teammates too. The repo branches command will list all branches that Repo knows about:
    $ repo branches
       master                    | in knowledge-base
    *  my-feature-branch         | in knowledge-base, frontend
You can check a single branch out with:
    $ repo checkout master
This will check out the master branch; you can equally name one of the feature branches from your team, if you have any.
If you have uncommitted changes in one repo and you try to switch the branch, you’ll see something like this:
    $ repo status
    project frontend/                branch my-feature-branch
     -m     app/index.html
    project knowledge-base           branch my-feature-branch
Here Repo can’t do anything clever: Git wouldn’t have allowed you to change branches with uncommitted changes, and Repo won’t allow it either, for the same reasons. You might want to integrate Repo into your prompt, or set up an alias for repo status, to have an overview and make sure you’re not accidentally working in, or relying on, the wrong branch.
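One possible starting point for such a prompt or alias is sketched below. The function names repo_root and rs are hypothetical, and the only assumption is that a Repo workspace can be recognised by the .repo directory at its root:

```shell
# Walk up from the current directory until a .repo directory is found.
# Prints the workspace root and returns 0, or returns 1 if there is none.
repo_root() {
  local dir="$PWD"
  while [ "$dir" != "/" ]; do
    if [ -d "$dir/.repo" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    dir="$(dirname "$dir")"
  done
  return 1
}

# Short helper: show the Repo overview only when inside a workspace.
rs() {
  if repo_root >/dev/null; then
    repo status
  else
    echo "not inside a Repo workspace" >&2
    return 1
  fi
}
```

Calling rs from anywhere inside the workspace then behaves like repo status, and repo_root could be reused in a prompt to show which workspace you are in.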
The Repo tool is definitely optimized for users: people who are using the Android source code, and who just need to be able to switch branches and make small changes. The repo stage tool would benefit from some better documentation and a smoother workflow, but it’s also apparently relatively seldom used.
Going Further & Getting Help
The Repo tool shares a maintainer team, and its home repository, with Gerrit, and there’s a Google Group for both with lots of activity and a proactive and helpful community.
If you have a project with multiple repositories, it’s probable that you’ve struggled with continuous integration and deployment. Most of the tools out there naïvely assume that one repository is one project, and that all your configuration, code and deploy manifests live there. Of course, with Git submodules and subtrees there are certainly workarounds to make one pseudo-monorepo from multiple projects.
If you have a complex project with more than one repository which you’d like to be able to unit and integration test, deploy, and keep running well in production, give Harrow a try.
Harrow.io is a platform for Continuous Integration, testing and deployment, built by the team behind Capistrano.