
Category: Git

Split Git Repository

Reading Time: 10 minutes

“organizations which design systems … are constrained to produce designs
which are copies of the communication structures of these organizations.”
M. Conway

A mono-repository is a very popular way of organizing source code and the collaboration around it. It has some pros, like easy refactoring and dependency management, but it also has cons, like a very high level of coupling between components (these statements are debatable, but that is not the point of this post). Some IT giants, like Google, Twitter, and Facebook, still use a mono-repository, but this costs them quite a lot: build systems like Bazel and Buck were invented to minimize the effort required to manage a huge pile of code.

At the same time, there is an alternative approach: building big products while keeping the projects distributed across multiple repositories. One of the benefits here is loose coupling, which makes scaling much easier.

In practice, few projects start out already split into modules and stored separately. In most cases a single repository grows until, at some point, the decision to split is made. By that moment a lot of work has already been done. If the previous history is not relevant and can be neglected, the split is quite simple: move the modules to the new location and tune CI accordingly. But if the change history must be preserved and stay relevant to the content of each new module, the split becomes a non-trivial task that (spoiler!) can still be performed quite fast.

Here is an abstract project with two logical modules: user-related and guest-related. Both are represented by four directories inside the repository. The ideal plan to separate those modules would be the following:

So, how to do this?
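One possible approach, shown here only as a minimal sketch and not necessarily the method this post goes on to use, is `git filter-branch --subdirectory-filter`, which rewrites history so that only the commits touching one directory remain. The layout below is hypothetical: a throwaway repository with a single `user` directory and a single `guest` directory, and made-up file names and commit messages. A module spread over several directories would need a different filter, and the Git documentation itself recommends the external `git filter-repo` tool for larger rewrites.

```shell
#!/bin/sh
set -e

# Hypothetical mono-repository built in a temporary directory.
work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
git config user.email "dev@example.com"
git config user.name "Dev"

# Three commits: two touch user/, one touches guest/.
mkdir user guest
echo "user v1" > user/a.txt
git add . && git commit -q -m "user: initial"
echo "guest v1" > guest/b.txt
git add . && git commit -q -m "guest: initial"
echo "user v2" > user/a.txt
git commit -q -a -m "user: update"

# Work on a clone so the original repository stays untouched.
git clone -q "$work/mono" "$work/user-only"
cd "$work/user-only"

# Keep only the history of user/ and make that directory the new root.
FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --subdirectory-filter user HEAD >/dev/null 2>&1

user_commits=$(git rev-list --count HEAD)
echo "commits kept for the user module: $user_commits"   # expected: 2
git log --format=%s
```

Repeating the same steps on a second clone with `guest` as the directory would give the guest module its own repository and its own history.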


Debug with Git

Reading Time: 11 minutes

Testing shows the presence, not the absence of bugs.
Dijkstra

A software regression is a very nasty situation in the development process. It usually means that the last delivery contains a breaking change. To resolve the situation, the whole release must be analyzed. A developer has to write tests, roll back the changes, run the tests, and … the bug is still there; one more step back in the VCS history and the error is still reproducible. And now this bug has just earned an additional label: “legacy”.

It then turns out that this functionality has not been used for a while, so the bug could have been introduced not with the last commit or two but quite some time ago. If the codebase is big enough, finding the exact change that introduced the bug may take a significant amount of time.

In practice, there is a way to automate this search. Below is an example of this operation within a Git repository.
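A minimal sketch of such an automated search, assuming the tool in question is `git bisect`, which performs a binary search over the commit history. Everything in the script is hypothetical: the throwaway repository, the ten commits, the `status.txt` file, and the `grep` check that stands in for a real test suite.

```shell
#!/bin/sh
set -e

# Hypothetical repository built in a temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# Ten commits; the made-up regression appears in commit 6 and stays.
for i in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$i" -ge 6 ]; then
    echo "fail" > status.txt
  else
    echo "pass" > status.txt
  fi
  echo "$i" > counter.txt
  git add . && git commit -q -m "commit $i"
done

# The very first commit is known to be good, HEAD is known to be bad.
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first" >/dev/null

# The check must exit 0 on a good commit and non-zero on a bad one.
git bisect run sh -c 'grep -q pass status.txt' >/dev/null

# refs/bisect/bad now points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit: $culprit"   # expected: commit 6
```

In a real project the `sh -c '...'` check would be replaced by whatever command reproduces the bug, for example a single failing test.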
