Split Git Repository

“organizations which design systems … are constrained to produce designs
which are copies of the communication structures of these organizations.”
M. Conway

A mono-repository is a very popular way of organizing source code and the collaboration around it. It has pros, such as easy refactoring and dependency management, and cons, such as a very high level of coupling between components (these statements are debatable, but that is not the point of this post). Some IT giants, like Google, Twitter, or Facebook, still use a mono-repository, but it costs them quite a lot: build systems like Bazel and Buck were invented to minimize the effort required to manage a huge pile of code.

At the same time, there is an alternative approach: building big products while keeping the projects distributed across multiple repositories. One of the benefits here is loose coupling, which makes scaling much easier.

In practice, few projects start out already split into modules that are stored separately. In most cases a single repository keeps growing until, at some point, the decision to split is made. By then a lot of work has already been done. If the previous history is not relevant and can be discarded, the split is quite simple: move the modules to the new location and tune CI accordingly. But if the change history must be preserved and stay relevant to the content of each new module, the task becomes non-trivial, although (spoiler!) it can still be done quite fast.

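Here is an abstract project with two logical modules: user-related and guest-related. Together they are represented by four directories inside the repository: user-backend, user-frontend, guest-backend, and guest-frontend. The ideal plan is to move the user-related directories into one new repository and the guest-related directories into another, each with its own history.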

So, how can this be done?

Brief Summary of the Split Process

The guide below goes through the following steps for each module to be extracted:

  • Module content and history isolation.
  • Module structure adjustments.
  • Module merge into a new repository.

So, to move two modules into a separate repository, the steps of this guide must be executed twice.

Initial State

Assume there is a git history of some project. For simplicity, it may look like this (most recent commits at the top):

commit ea1d85c47630301dd8263ba5f2ba87c58d3dbb5f
    Changed User and Guest Frontend code

commit 810c743f10647f9bf74ee2875988245ebb63c54e
    Changed User and Guest Backend code

commit f9a17f59b85a43a8c78ba08c6d90ad75ab29a959
    Changed Guest Frontend code

commit 6b8e17600c423b11fbc7ff914d488778ea8d2162
    Changed User Frontend code

commit 43e10f5b9523f63b3bc01ea93203316531eea97d
    Changed Guest Backend code

commit 5148d5bf05fab1ebc4c5fbc500466c2334b8540c
    Changed User Backend code

commit 26a11307abb27f9d5b1cd0dc60033f9c1dfd611a
    Added Guest Frontend code

commit 2feeeb274a927d642bc60229c065c055d7c3cd67
    Added Guest Backend code

commit 50dba54a4ef00046b6647378b911bfaabead1972
    Added User Frontend code

commit 0624739de7496c16911b2c5eaa6fa69fcbb2930a
    Added User Backend code

This example includes commits that touch modules destined for different target repositories, for example the most recent one:

$ git show ea1d85c47630301dd8263ba5f2ba87c58d3dbb5f

    Changed User and Guest Frontend code

--- a/guest-frontend/guest-frontend-code
+++ b/guest-frontend/guest-frontend-code

 guest-frontend-code
+change

--- a/user-frontend/user-frontend-code
+++ b/user-frontend/user-frontend-code

 user-frontend-code
+change

It includes both the “user” and the “guest” frontend modules. Ideally, the content of this commit would describe only one module or the other, depending on the target repository.
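Before rewriting anything, a path-limited log gives a rough preview of which commits would end up in which target repository. The sketch below is only an illustration: it builds a throwaway repository in a temporary directory (the demo identity, file contents, and location are assumptions, while the directory names follow the example above) and then restricts both the log and the diff of the mixed commit to one directory.

```shell
set -eu
# Identity for the throwaway commits (assumed values, needed only for the demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .

mkdir user-frontend guest-frontend
echo user-frontend-code > user-frontend/user-frontend-code
git add .
git commit -q -m "Added User Frontend code"
echo guest-frontend-code > guest-frontend/guest-frontend-code
git add .
git commit -q -m "Added Guest Frontend code"
echo change >> user-frontend/user-frontend-code
echo change >> guest-frontend/guest-frontend-code
git add .
git commit -q -m "Changed User and Guest Frontend code"

# Subjects of the commits that touch user-frontend.
user_commits=$(git log --format=%s -- user-frontend)
echo "$user_commits"

# The mixed commit again, with the diff limited to one module.
git show --stat --format=%s HEAD -- user-frontend
```

Nothing is rewritten here; the commands only read the history, so they are safe to run against a real repository as well.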

Step #1. Isolate Module

git offers many ways to manipulate history. The most convenient one for this particular task is filter-branch [1]. It rewrites the commit log with the help of different filters, and it can also change the file structure or even execute shell commands along the way. The official documentation lists all the filters together with a description of their behavior, but for this guide one is enough: subdirectory-filter. In short, it removes everything except the specified directory from the repository content and the related history. It also unwraps the directory and makes it the root of the repository. The --prune-empty flag removes empty commits, which may appear after the history has been cleaned.

$ git filter-branch --prune-empty --subdirectory-filter user-backend

After the command has been executed, the content of user-backend becomes the root of the repository.

The commit history is adjusted accordingly: only commits that touch files inside the user-backend folder are preserved.
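The effect is easy to observe on a small throwaway repository. The following is a minimal sketch under assumptions: the temporary location, the demo identity, and the file contents are invented for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the advisory that newer git versions print before filter-branch starts.

```shell
set -eu
# Identity for the throwaway commits (assumed values, needed only for the demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip the advisory that newer git versions print before filter-branch runs.
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .

mkdir user-backend guest-backend
echo user-backend-code > user-backend/user-backend-code
git add .
git commit -q -m "Added User Backend code"
echo guest-backend-code > guest-backend/guest-backend-code
git add .
git commit -q -m "Added Guest Backend code"
echo change >> user-backend/user-backend-code
echo change >> guest-backend/guest-backend-code
git add .
git commit -q -m "Changed User and Guest Backend code"

# Keep only user-backend and make its content the repository root.
git filter-branch --prune-empty --subdirectory-filter user-backend

# Tracked files and commit subjects after the rewrite.
root_files=$(git ls-files)
kept=$(git log --format=%s)
echo "$root_files"
echo "$kept"
```

In this sketch the commit that only added guest code disappears, while the mixed commit stays but now carries only the user-backend part of its change.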

The major benefit of using subdirectory-filter instead of index-filter or tree-filter is speed. The latter two filters run a command against every commit in the history, whereas subdirectory-filter only walks the history of the given directory and ignores the rest.

The major downside is that only one directory can be processed at a time.

Note: subdirectory-filter works with the history attached to the directory by name. This may be a problem if the directory was renamed: the old history is not attached to the new directory name, and in that case another filter must be used. The most suitable scenario for subdirectory-filter is therefore the separation of root-level directories, since this structure usually stays fixed for a long time.
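For cases where subdirectory-filter does not fit, for instance when several directories should stay together, one commonly used alternative is index-filter combined with git rm --cached. This is not the approach used in the rest of this guide, only a hedged sketch of the other filter: it removes the unwanted directories from every commit instead of selecting a single one, so the remaining directories keep their names and are not unwrapped to the root. The temporary repository, demo identity, and file contents are assumptions.

```shell
set -eu
# Identity for the throwaway commits (assumed values, needed only for the demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip the advisory that newer git versions print before filter-branch runs.
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .

# One commit per directory, mirroring the four directories of the example.
for module in user-backend user-frontend guest-backend guest-frontend; do
  mkdir "$module"
  echo "$module-code" > "$module/$module-code"
  git add .
  git commit -q -m "Added $module code"
done

# Drop both guest directories from every commit; --ignore-unmatch keeps
# git rm from failing on commits where the directories do not exist yet.
git filter-branch --prune-empty --index-filter \
  'git rm -r -q --cached --ignore-unmatch guest-backend guest-frontend' -- HEAD

remaining=$(git ls-files)
echo "$remaining"
git log --format=%s
```

The trade-off matches the speed remark above: the filter command runs once per commit, so on a long history this variant is noticeably slower than subdirectory-filter.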

Step #2. Adjust Module Structure

After the module is isolated, the repository structure must be adjusted. Logically, the content should be placed in a separate directory, some descriptors for the module may need to be written, and so on. After this is done, the changes must be saved:

$ git add .
$ git commit -m "user-backend moved to a separate module"
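What the adjustment looks like depends on the project. As one possible illustration, the sketch below assumes the isolated content sits at the repository root and should go back into a directory named user-backend; the throwaway repository, demo identity, and file contents are assumptions. It uses git mv, which stages the moves itself, so no separate git add is needed in this variant.

```shell
set -eu
# Identity for the throwaway commits (assumed values, needed only for the demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for a repository right after Step #1: module content at the root.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo user-backend-code > user-backend-code
git add .
git commit -q -m "Added User Backend code"

# Move every tracked top-level entry into the module directory.
mkdir user-backend
git ls-tree --name-only HEAD | while IFS= read -r entry; do
  git mv "$entry" "user-backend/$entry"
done

git commit -q -m "user-backend moved to a separate module"
git ls-files
```

The earlier commits keep their original paths; only the new commit records the move.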

Step #3. Merge Histories

Now it is time to push the changes to a new repository. First, the new remote repository must be linked:

$ git remote set-url origin ssh://git@server/new-repo.git

And here comes a very important step: merging the histories. It is not required when the new repository is empty, because there is nothing to merge, and the new module can simply be pushed. But if the new repository is not empty because one of the modules has already been pushed there, pushing on top is not so easy: git tracks histories carefully, and the push is rejected because the remote branch contains commits that the local branch does not have. To resolve this, the histories must be merged first: after the new remote repository is linked, its content must be pulled. At this point git complains locally that the local and remote histories are unrelated:

fatal: refusing to merge unrelated histories

To force the pull and avoid this error, use the following command:

$ git pull --allow-unrelated-histories

With this command git asks for a merge commit message. The commit is just a regular merge, so no content is changed.
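The merge can be rehearsed locally before touching a real server. The sketch below is an assumption-laden stand-in: a bare repository in a temporary directory plays the role of new-repo.git, one module is pushed to it first, and a second, unrelated history is then pulled on top. The extra flags are not part of the command above: --no-edit accepts the default merge message so that no editor opens, and --no-rebase states explicitly that a merge is wanted, which newer git versions may otherwise ask for.

```shell
set -eu
# Identity for the throwaway commits (assumed values, needed only for the demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
# Local bare repository standing in for the new remote.
git init -q --bare "$work/new-repo.git"

# First module: already pushed to the new repository.
git init -q "$work/first"
cd "$work/first"
branch=$(git symbolic-ref --short HEAD)
mkdir guest-backend
echo guest-backend-code > guest-backend/guest-backend-code
git add .
git commit -q -m "Added Guest Backend code"
git push -q "$work/new-repo.git" "$branch"

# Second module: a separate repository with an unrelated history.
git init -q "$work/second"
cd "$work/second"
mkdir user-backend
echo user-backend-code > user-backend/user-backend-code
git add .
git commit -q -m "Added User Backend code"
git remote add origin "$work/new-repo.git"

# Merge the remote history into the local one, then push the result.
git pull -q --no-rebase --no-edit --allow-unrelated-histories origin "$branch"
git push -q origin "$branch"

git log --format=%s
git ls-files
```

After the pull, the local branch contains both original commits plus one merge commit, and the working tree holds both directories side by side.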

Step #4. Push New Module

Done. The changes can now be pushed to the new repository:

$ git push

Now the steps must be repeated for every module that has to be extracted. Note that the source repository most probably needs to be checked out again to reset the changes made by the filter operation.
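One way to avoid resetting the source repository is to run each extraction in its own fresh clone, so the source itself is never rewritten. This is a sketch under assumptions, not a required part of the procedure: the temporary paths, the demo identity, and the "-split" suffix of the clone directories are invented for illustration.

```shell
set -eu
# Identity for the throwaway commits (assumed values, needed only for the demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip the advisory that newer git versions print before filter-branch runs.
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)

# Stand-in for the source repository with two directories.
git init -q "$work/source"
cd "$work/source"
for module in user-backend guest-backend; do
  mkdir "$module"
  echo "$module-code" > "$module/$module-code"
  git add .
  git commit -q -m "Added $module code"
done

# One fresh clone per module; the filter only ever touches the clone.
for module in user-backend guest-backend; do
  git clone -q "$work/source" "$work/$module-split"
  (
    cd "$work/$module-split"
    git filter-branch --prune-empty --subdirectory-filter "$module"
  )
done

# The source still has its full history.
git -C "$work/source" log --format=%s
```

Each clone still points at the source as its origin afterwards, so the remote has to be re-linked to the new repository before pushing, as described in Step #3.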

Final State

After the steps above are done for both modules the history looks like this:

commit 2515c09e602cf0dba7a61dd8ebc46284f5993117
    Merge histories

commit b91c90cd8d86a9dad460ec120d9196a7d39138ad
    Changed User and Guest Frontend code

commit 224f762bad519dd77558c24583f27a8435111c3d
    Changed User and Guest Backend code

commit 713336709dee830087d0ab8f94446601e7c1c8b1
    Changed User Frontend code

commit e9607b0427bbcfb2ce48771163f8d8e3b67119f3
    Changed User Backend code

commit ca972356fecb9e2de1154dc472c30fecdbbf0a46
    Added User Frontend code

commit cd9ad7d21b9175b3b6dd1e9161c332a9bf7e45b7
    Added User Backend code

It contains only commits that are related to the user module. Interestingly, the commit in the second position looks as if it touches two different modules. The content of this commit was described earlier: inside the source repository it really did include two modules. After the filtering, however, the new history should include only the relevant information. This can be verified by looking at the content again:

$ git show b91c90cd8d86a9dad460ec120d9196a7d39138ad

    Changed User and Guest Frontend code

--- a/user-frontend/user-frontend-code
+++ b/user-frontend/user-frontend-code

 user-frontend-code
+change

Done. The repository has been split.

Links

1. More details on git filter-branch
