Should You Keep Additional Files In Your Repository?

Every software developer should be familiar with some type of version control system, be it Git, Subversion, Mercurial or another. The advantages of such a system are so plentiful that the same idea is often applied to other types of documents, albeit in a different form. Services such as Google Docs, OneDrive and Dropbox offer comparable version history features because they have recognised that the value of keeping a history of your documents goes well beyond source files.

So we all agree that keeping your data in a version control system is a good idea, but this raises a different question. How should you organise your documents? I don’t mean the folder hierarchy, which is already a challenge in itself, but the actual location. It makes sense that you want to keep your files close together, but how do you categorise your documents? By topic, by type, by target audience?

While the problem is very general, I will focus on a subset which I believe is most relevant for us as software engineers, as we face it in our daily work. I will always start from the perspective of your repository as the place where you keep all of your source code. The topic debated here is the additional documents that you can store alongside your code and which are related to it. The question I try to answer each time is the same: what are the consequences of storing this data together with your code, and does it make sense?

The first set of documents is related to documentation. Here we need to take into account that there are different types of documentation, ranging from developer/API documentation to user documentation. Developer/API documentation should be maintained by the developers themselves, as it is closely tied to the actual code; user documentation is not. Moreover, I often find it best that user documentation is not written by a developer but by someone who looks at the application as a user, without too much knowledge of the internals. Another difference is the format: developer documentation is often written in ‘low-level’ formats such as Markdown, while user manuals are written in more visual tools.

Most version control systems used for software cannot handle binary formats very well and do not offer tools to visually inspect changes to them. This conflict in requirements makes it hard to keep that type of documentation together with the code. Moreover, if the documentation is really managed by non-developers, shielding the code from those who should not touch it becomes nearly impossible.
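To make the point about binary formats a bit more concrete, here is a minimal Python sketch of the kind of check a version control system might use to decide whether it can show a readable diff. It assumes the heuristic Git is commonly described as using, namely looking for a NUL byte near the start of the content. The function names `looks_binary` and `diff_strategy` and the 8000-byte window are illustrative assumptions, not the API of any tool.

```python
def looks_binary(data: bytes, window: int = 8000) -> bool:
    """Return True if the content looks binary.

    Assumed heuristic: a NUL byte within the first `window` bytes.
    """
    return b"\x00" in data[:window]


def diff_strategy(data: bytes) -> str:
    """Describe how a change to this content could be reviewed."""
    if looks_binary(data):
        return "opaque blob, no readable diff"
    return "line-by-line text diff"
```

A Markdown file would end up in the second category, while the output of most visual authoring tools would end up in the first, which is exactly where reviewing changes becomes difficult.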

Something you probably would not have thought about are the files related to building the application. For most developers (including myself) it is obvious that these should be part of the repository. But think about it a bit longer, and you will see that you are forcing your way of building upon other people. It is good that you offer a default way, but if everybody started including their preferred build system, the repository would get cluttered with files for many different build systems. Keeping the build files separately isn’t really an option though, as they are tightly coupled to the source files. The configuration for continuous integration (CI) systems falls into the same category, and again it simply makes sense to include these files, as they will evolve together with the repository. It will, however, couple your repository to a certain CI system. To me it is less likely that others will want to use a different CI tool, since they all do the same thing in the end.

Another type of file that is often included in a repository is the kind related to your IDE. Some of these files are even added without you knowing it, as they contain preferences and settings for the project. You may also want to add other files, such as formatter settings, so that new people can easily apply them to achieve consistency. But here again you bind your code to an IDE, which in the end doesn’t make much sense, as there should be no reason why your code can only be worked on with a single IDE. The same is true here as for the build files: adding files for different IDEs will clutter the repository. Not including these files should not cause too many problems, as it should be easy enough to import the project into your IDE and configure all of the settings.
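To see how much IDE-specific material has already ended up in a repository, a small script can help. The following Python sketch lists top-level entries that belong to a few well-known IDEs. The mapping `IDE_MARKERS` is a hypothetical, non-exhaustive selection made for illustration, and `find_ide_files` is a name introduced here, not part of any existing tool.

```python
from pathlib import Path

# Hypothetical, non-exhaustive mapping of well-known IDE artifacts to their tool.
IDE_MARKERS = {
    ".idea": "JetBrains IDEs",
    ".vscode": "Visual Studio Code",
    ".project": "Eclipse",
    ".classpath": "Eclipse",
    ".settings": "Eclipse",
}


def find_ide_files(repo_root: str) -> dict[str, list[str]]:
    """Group top-level IDE-specific entries in repo_root by the IDE they belong to."""
    found: dict[str, list[str]] = {}
    for entry in sorted(Path(repo_root).iterdir()):
        ide = IDE_MARKERS.get(entry.name)
        if ide is not None:
            found.setdefault(ide, []).append(entry.name)
    return found
```

Calling `find_ide_files(".")` from the root of a checkout returns a dictionary keyed by IDE. If more than one key shows up, the repository is already carrying settings for several IDEs at once, which is the clutter described above.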

A final type I want to discuss is the files related to how you run the application. Although most applications don’t require anything special to run, you could also add files to run the application inside a Docker container, or files that package your application so that it can be installed and run that way. Here it depends on what the real purpose of the application is. If the application is only meant to be run like that, it makes sense to include the files, but most often applications are general enough to be run in different ways. Keeping these files in a different repository would, however, cause a tight coupling between the repositories. Moreover, it would complicate your CI setup, as changes to the code repository should also trigger the build pipeline that is contained in a different repository. A more advanced CI system should be able to handle this, and hopefully your build tooling allows a loose coupling, by using URI references for instance, so that you don’t have to fall back to physically coupling your repositories through sub-modules or other approaches.

