“organizations which design systems … are constrained to produce designs
which are copies of the communication structures of these organizations.”
A mono-repository is a popular way of organizing source code and the collaboration around it. It has some advantages, such as easy refactoring and dependency management, but also some disadvantages, such as a high level of coupling between components (these statements are debatable, of course, but that is not the point of this post). Some IT giants, like Google, Twitter and Facebook, still use a mono-repository, but it costs them quite a lot: just look at build systems such as Bazel and Buck, which were created to minimize the effort of managing such a huge pile of code.
At the same time, there is an alternative approach: building large products from projects distributed across multiple repositories. One of its benefits is loose coupling, which makes scaling much easier.
In practice, few projects start out already split into separately stored modules. In most cases, a single repository grows until, at some point, the decision to split it is made. By then, a lot of work has already been done. If the previous history is irrelevant and can be discarded, the split is simple: move the modules to their new locations and adjust CI accordingly. But if the history must be preserved, with each new module's history reflecting only its own content, the task becomes non-trivial, though (spoiler!) it can still be done quite quickly.
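Here is an abstract project with two logical modules, user-related and guest-related, spread across four directories in the repository: user-backend, user-frontend, guest-backend and guest-frontend. Ideally, the two user directories would end up in one new repository and the two guest directories in another.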
So, how can this be done?
Brief Summary of the Split Process
The guide below performs the following steps for each module to be extracted:
- Module content and history isolation.
- Module structure adjustments.
- Module merge into a new repository.
So, to move two modules into a new repository, the guide must be executed twice.
Assume a project has the following git history (simplified; the most recent commits are at the top):
commit ea1d85c47630301dd8263ba5f2ba87c58d3dbb5f Changed User and Guest Frontend code
commit 810c743f10647f9bf74ee2875988245ebb63c54e Changed User and Guest Backend code
commit f9a17f59b85a43a8c78ba08c6d90ad75ab29a959 Changed Guest Frontend code
commit 6b8e17600c423b11fbc7ff914d488778ea8d2162 Changed User Frontend code
commit 43e10f5b9523f63b3bc01ea93203316531eea97d Changed Guest Backend code
commit 5148d5bf05fab1ebc4c5fbc500466c2334b8540c Changed User Backend code
commit 26a11307abb27f9d5b1cd0dc60033f9c1dfd611a Added Guest Frontend code
commit 2feeeb274a927d642bc60229c065c055d7c3cd67 Added Guest Backend code
commit 50dba54a4ef00046b6647378b911bfaabead1972 Added User Frontend code
commit 0624739de7496c16911b2c5eaa6fa69fcbb2930a Added User Backend code
This example includes commits that touched modules belonging to different target repositories, for example the most recent one:
$ git show ea1d85c47630301dd8263ba5f2ba87c58d3dbb5f
    Changed User and Guest Frontend code

--- a/guest-frontend/guest-frontend-code
+++ b/guest-frontend/guest-frontend-code
 guest-frontend-code
+change
--- a/user-frontend/user-frontend-code
+++ b/user-frontend/user-frontend-code
 user-frontend-code
+change
It touches both the “user” and the “guest” frontend modules. Ideally, the content of this commit would describe only one module or the other, depending on the target repository.
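To try the steps below without touching a real project, a throwaway repository with the same shape as the history above can be generated. The script is a minimal sketch: the file names (<module>-code), their contents, the branch name main and the demo author identity are assumptions made for illustration, and the commit hashes will differ from the ones listed above.

```shell
#!/bin/sh
# Builds a throwaway copy of the example history (hypothetical file names and
# contents; the commit hashes will differ from the ones in the article).
set -e

# Demo identity so that commits work even without a global git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)/monorepo
git init -q "$DEMO"
cd "$DEMO"
git symbolic-ref HEAD refs/heads/main   # name the first branch "main"

# add_module <dir>: create the module directory with a single file.
add_module() { mkdir -p "$1" && echo "$1-code" > "$1/$1-code"; }
# change_module <dir>: append a line to the module's file.
change_module() { echo change >> "$1/$1-code"; }
# commit_all <message>: stage everything and commit it.
commit_all() { git add -A && git commit -q -m "$1"; }

add_module user-backend;    commit_all "Added User Backend code"
add_module user-frontend;   commit_all "Added User Frontend code"
add_module guest-backend;   commit_all "Added Guest Backend code"
add_module guest-frontend;  commit_all "Added Guest Frontend code"
change_module user-backend;   commit_all "Changed User Backend code"
change_module guest-backend;  commit_all "Changed Guest Backend code"
change_module user-frontend;  commit_all "Changed User Frontend code"
change_module guest-frontend; commit_all "Changed Guest Frontend code"
change_module user-backend;  change_module guest-backend
commit_all "Changed User and Guest Backend code"
change_module user-frontend; change_module guest-frontend
commit_all "Changed User and Guest Frontend code"

git log --format='%h %s'   # newest first, as in the listing above
```

As in the original repository, the most recent commit touches files of both target repositories.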
Step #1. Isolate Module
git offers quite a lot of ways to manipulate history. The most convenient one for this particular task is filter-branch[1]. It rewrites the commit log with the help of different filters; it can also change the file structure or even run shell commands on each commit. The official documentation lists all the filters and describes their behavior, but this guide needs only one: subdirectory-filter. In short, it removes everything except the specified directory from both the repository content and its history. It also unwraps that directory, making it the root of the repository.
The --prune-empty flag removes commits that become empty after the history has been cleaned up.
$ git filter-branch --prune-empty --subdirectory-filter user-backend
After the command has run, the contents of user-backend sit at the root of the repository, and nothing else remains.
The commit history is adjusted accordingly: only commits that touched files inside the user-backend folder are preserved.
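The effect of the filter can be checked on a small scale. The sketch below builds a hypothetical three-commit repository (not the full example history), isolates user-backend and prints what remains. The branch name main is passed explicitly as an assumption of this demo, and FILTER_BRANCH_SQUELCH_WARNING=1 is set because recent git versions print a warning and pause before running filter-branch.

```shell
#!/bin/sh
# Step #1 on a small, hypothetical monorepo (three commits instead of ten).
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent git versions print a warning and pause before filter-branch runs;
# this variable skips that pause.
export FILTER_BRANCH_SQUELCH_WARNING=1

REPO=$(mktemp -d)/monorepo
git init -q "$REPO"
cd "$REPO"
git symbolic-ref HEAD refs/heads/main

mkdir user-backend guest-backend
echo user-backend-code > user-backend/user-backend-code
git add -A && git commit -q -m "Added User Backend code"
echo guest-backend-code > guest-backend/guest-backend-code
git add -A && git commit -q -m "Added Guest Backend code"
echo change >> user-backend/user-backend-code
echo change >> guest-backend/guest-backend-code
git add -A && git commit -q -m "Changed User and Guest Backend code"

# Keep only user-backend, make it the root, and drop commits left empty.
git filter-branch --prune-empty --subdirectory-filter user-backend main

git log --format=%s   # the guest-only commit is gone
git ls-files          # user-backend-code now sits at the repository root
```

The commit that touched both backends survives, but now contains only the user-backend change. filter-branch also keeps a backup of the original branch under refs/original/.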
Step #2. Adjust Module Structure
Once the module is isolated, the repository structure has to be adjusted. Logically, the content belongs in its own directory, and the module may also need some descriptors, etc. When this is done, the changes must be committed:
$ git add .
$ git commit -m "user-backend moved to a separate module"
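One possible way to do the restructuring is to move everything that now sits at the root back under a directory named after the module. The sketch below assumes that directory is called user-backend and fakes the filtered repository with a single hypothetical commit; a real module may need a different layout or extra descriptor files.

```shell
#!/bin/sh
# Step #2 sketch: move the unwrapped module files back under one directory.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical stand-in for a repository that went through Step #1:
# the module's files are at the root.
REPO=$(mktemp -d)/filtered
git init -q "$REPO"
cd "$REPO"
echo user-backend-code > user-backend-code
git add -A && git commit -q -m "Added User Backend code"

MODULE=user-backend   # assumed target directory in the new repository
mkdir "$MODULE"
# Move every tracked top-level entry into the module directory. ls-tree reads
# HEAD, so the new, still-empty directory is never listed.
git ls-tree --name-only HEAD | while IFS= read -r entry; do
  git mv "$entry" "$MODULE/"
done

git add .
git commit -q -m "user-backend moved to a separate module"
git ls-files   # user-backend/user-backend-code
```

Because the files are moved with git mv, their earlier history can still be followed across the move, for example with git log --follow on a single file.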
Step #3. Merge Histories
Now it is time to push the changes to a new repository. First, link the new remote repository:
$ git remote set-url origin ssh://git@server/new-repo.git
And here comes a very important step: merging the histories. It is not needed when the new repository is empty, because there is nothing to merge; the new module can simply be pushed. But if the new repository is not empty, because another module has already been pushed there, pushing on top of it is not so easy: the remote branch contains commits that the local history does not include, so git rejects the push as a non-fast-forward update. To resolve this, the histories must be merged first: after linking the new remote repository, pull its content. In this case, however, git refuses to merge locally, because the local and remote histories share no common commit:
fatal: refusing to merge unrelated histories
To force the pull and avoid this error, use the following command:
$ git pull --allow-unrelated-histories
With this command, git asks for a message for the merge commit. It is a regular merge commit that simply joins the two histories.
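The merge step can be rehearsed locally. In this sketch a bare repository on disk stands in for ssh://git@server/new-repo.git, and each module is faked by a hypothetical one-commit repository, so the remote is attached with git remote add rather than set-url. The first pull is expected to fail with the "refusing to merge unrelated histories" error; the second one adds --allow-unrelated-histories, plus --no-rebase (newer git versions refuse to pull divergent branches until a reconciliation strategy is chosen) and --no-edit (to accept the default merge message instead of opening an editor).

```shell
#!/bin/sh
# Step #3 sketch with a local bare repository as the "new" remote
# (all paths and module names are hypothetical).
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

ROOT=$(mktemp -d)
git init -q --bare "$ROOT/new-repo.git"
git -C "$ROOT/new-repo.git" symbolic-ref HEAD refs/heads/main

# new_module <name>: a one-commit repository with <name>/<name>-code and an
# "origin" remote; stands in for a filtered and restructured module.
new_module() {
  git init -q "$ROOT/$1"
  cd "$ROOT/$1"
  git symbolic-ref HEAD refs/heads/main
  mkdir "$1" && echo "$1-code" > "$1/$1-code"
  git add . && git commit -q -m "$1 moved to a separate module"
  git remote add origin "$ROOT/new-repo.git"
}

# First module: the remote is empty, so a plain push is enough.
new_module user-backend
git push -q origin main

# Second module: its history is unrelated to what is already on the remote.
new_module user-frontend
# Expected to fail with "fatal: refusing to merge unrelated histories".
if git pull --no-rebase origin main; then
  echo "unexpected: pull of unrelated history succeeded" >&2; exit 1
fi
git pull --no-rebase --no-edit --allow-unrelated-histories origin main
git push -q origin main

git -C "$ROOT/new-repo.git" log --format=%s main
```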
Step #4. Push New Module
Done: the changes can now be pushed to the new repository:
$ git push
Now repeat these steps for every module that must be extracted. Note that a fresh checkout of the source repository is most likely needed each time, to discard the changes made by the filter operation.
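Putting the four steps together, the loop below is an end-to-end sketch for two modules, user-backend and user-frontend, going into the same new repository. Everything in it is local and hypothetical: a small generated monorepo replaces the real source, a bare repository replaces the real remote, and main is assumed as the branch name. Each iteration starts from a fresh clone, which takes care of discarding the filter's changes, and the pull is only attempted once the remote already has a main branch.

```shell
#!/bin/sh
# End-to-end sketch: split user-backend and user-frontend out of a small,
# hypothetical monorepo into one new repository, preserving their history.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause

ROOT=$(mktemp -d)

# Source monorepo: four modules plus one commit touching two of them.
git init -q "$ROOT/monorepo"
cd "$ROOT/monorepo"
git symbolic-ref HEAD refs/heads/main
for m in user-backend user-frontend guest-backend guest-frontend; do
  mkdir "$m" && echo "$m-code" > "$m/$m-code"
  git add "$m" && git commit -q -m "Added $m code"
done
echo change >> user-frontend/user-frontend-code
echo change >> guest-frontend/guest-frontend-code
git commit -q -am "Changed User and Guest Frontend code"

# Target repository (stands in for ssh://git@server/new-repo.git).
git init -q --bare "$ROOT/new-repo.git"
git -C "$ROOT/new-repo.git" symbolic-ref HEAD refs/heads/main

for m in user-backend user-frontend; do
  # A fresh clone per module, so each filter starts from the original history.
  git clone -q "$ROOT/monorepo" "$ROOT/work-$m"
  cd "$ROOT/work-$m"

  # Step #1: isolate the module's content and history.
  git filter-branch --prune-empty --subdirectory-filter "$m" main

  # Step #2: put the files back under a module directory.
  mkdir "$m"
  git ls-tree --name-only HEAD | while IFS= read -r entry; do
    git mv "$entry" "$m/"
  done
  git commit -q -m "$m moved to a separate module"

  # Step #3: point origin at the new repository and merge its history, if any.
  git remote set-url origin "$ROOT/new-repo.git"
  if git ls-remote --exit-code --heads origin main >/dev/null; then
    git pull -q --no-rebase --no-edit --allow-unrelated-histories origin main
  fi

  # Step #4: publish.
  git push -q origin main
done

git -C "$ROOT/new-repo.git" log --format=%s main
```

The resulting history contains only user commits, and the commit that originally touched both frontends now carries only the user-frontend change.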
After the steps above have been done for both modules, the history looks like this:
commit 2515c09e602cf0dba7a61dd8ebc46284f5993117 Merge histories
commit b91c90cd8d86a9dad460ec120d9196a7d39138ad Changed User and Guest Frontend code
commit 224f762bad519dd77558c24583f27a8435111c3d Changed User and Guest Backend code
commit 713336709dee830087d0ab8f94446601e7c1c8b1 Changed User Frontend code
commit e9607b0427bbcfb2ce48771163f8d8e3b67119f3 Changed User Backend code
commit ca972356fecb9e2de1154dc472c30fecdbbf0a46 Added User Frontend code
commit cd9ad7d21b9175b3b6dd1e9161c332a9bf7e45b7 Added User Backend code
It contains only commits related to the user module. Interestingly, the second commit from the top appears to touch two different modules. Its content was shown earlier, and in the source repository it did include changes to two modules. After filtering, however, the new history contains only the relevant information, which can be confirmed by looking at the content again:
$ git show b91c90cd8d86a9dad460ec120d9196a7d39138ad
    Changed User and Guest Frontend code

--- a/user-frontend/user-frontend-code
+++ b/user-frontend/user-frontend-code
 user-frontend-code
+change
Done. The repository has been split.