Git mirrors: hardlink deduplication #228

Open
opened 2025-05-08 22:55:39 +02:00 by oysteikt · 1 comment
Owner

Most of the files in each copy of the same repo are exactly the same. We could save some space by deduplicating them. This could run on a schedule (or be triggered when a new repo copy arrives).

oysteikt added the services and nixos labels 2025-05-08 22:55:39 +02:00
oysteikt added this to the Kanban project 2025-05-08 22:55:39 +02:00
oysteikt added the good first issue label 2025-05-08 23:03:33 +02:00
oysteikt moved this to Low priority in Kanban on 2025-05-12 10:50:23 +02:00
oysteikt moved this to Medium priority in Kanban on 2025-05-12 10:50:49 +02:00
Owner

Report from 24th of May

I'm going to work on this issue. The main folder to work in will be
[`modules/gickup/`](https://git.pvv.ntnu.no/Drift/pvv-nixos-config/src/branch/main/modules/gickup) inside the `pvv-nixos-config` repository.

Tools to use

We're planning to use [`jdupes`](https://codeberg.org/jbruchon/jdupes) to find equivalent files and create hard links.

For scheduling, use a systemd timer.

Core implementation idea:

Write a shell script (or equivalent) which uses jdupes to search through all git mirrors on the filesystem.
The script should compare files across repositories, not only within each one, and look for files that are equivalent.

If equivalent files are found (even across repositories), replace them with hard links to a single inode.
One example mentioned is that many repositories may contain the same LICENSE file.
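
To make the mechanism concrete, here is a minimal Python sketch of what hard-link deduplication does. It is a stand-in for illustration only, not how jdupes is implemented and not the planned script: the function names are hypothetical, and it assumes all mirrors live under one directory tree.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hardlink_duplicates(root: Path) -> int:
    """Hard-link identical regular files under root.

    Returns the number of files that were replaced by a hard link.
    """
    # Stage 1: group by (device, size). Hard links cannot cross
    # filesystems, and files of different sizes cannot be equal.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink() or not path.is_file():
                continue
            stat = path.stat()
            if stat.st_size > 0:  # skip empty files
                by_size[(stat.st_dev, stat.st_size)].append(path)

    replaced = 0
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue
        # Stage 2: only hash files that share a size with another file.
        by_hash = defaultdict(list)
        for path in candidates:
            by_hash[file_digest(path)].append(path)
        for paths in by_hash.values():
            keeper = paths[0]
            for duplicate in paths[1:]:
                if os.path.samefile(keeper, duplicate):
                    continue  # already the same inode
                # Link under a temporary name, then rename over the
                # duplicate so the path never disappears midway.
                tmp = duplicate.with_name(duplicate.name + ".dedup-tmp")
                os.link(keeper, tmp)
                os.replace(tmp, duplicate)
                replaced += 1
    return replaced
```

Calling `hardlink_duplicates(Path("/path/to/mirrors"))` (a placeholder path) would leave every group of identical files pointing at one inode. The sketch trusts the hash and does not preserve per-file ownership or timestamps, which a real tool may handle differently.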

Finally, use a systemd timer to run the script once a day.
The final product will be something like a systemd unit which you can enable with
something like systemctl enable git-mirror-hardlink-dedup.service.
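
As a rough sketch of the scheduling part, a service and timer pair could look like the following. The unit name is taken from the command above; the script path is a placeholder, and in the NixOS config this would presumably be declared through the module system rather than written as raw unit files.

```ini
# git-mirror-hardlink-dedup.service
[Unit]
Description=Hard-link duplicate files across git mirrors

[Service]
Type=oneshot
# Placeholder path: the dedup script does not exist yet.
ExecStart=/path/to/dedup-script

# git-mirror-hardlink-dedup.timer
[Timer]
# Run once a day; catch up after downtime if a run was missed.
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

With a layout like this, it is the timer that gets enabled (for example `systemctl enable --now git-mirror-hardlink-dedup.timer`), and the timer then starts the service of the same name.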

chrisfjo self-assigned this 2025-05-24 22:47:54 +02:00
chrisfjo added reference main 2025-05-24 22:49:15 +02:00
oysteikt removed reference main 2025-08-03 05:13:24 +02:00