git-annex provides a few very nice features, including:
- The ability to handle trees of large files. “Normal git” is not an alternative in this case.
- Automatic data retrieval. Any client can see the whole repository of files (as symbolic links) although it does not necessarily have access to any of them. The user can run a command such as “git annex get movie.avi” and git-annex will download this file from any remote that has it.
- Safe data dropping. Any client may “drop” a file at any time. This removes the file from local data storage, but git-annex only allows it if it can verify that the file exists at some remote. It’s possible to configure this policy. For example, you may decide that some very important files must always be stored in at least three places.
- Rename tracking. If you rename or move a file, this change will be tracked by git and affect other repositories when you choose to perform a push or pull. One very cool thing is that you can rename and move around files you don’t even have available on your local machine. Again, this is possible since git-annex only tracks symbolic links in addition to tracking which files are available where. Before I found git-annex I used rsync, but it’s not clever when it comes to renaming: a renamed file is transferred again as if it were new. However, rsync is used “under the hood” when files are transferred between git-annex repositories.
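To make the list above a bit more concrete, here is a minimal sketch of how these features look on the command line. It builds two throwaway repositories in a temporary directory, so it assumes nothing about your own setup. The names origin, laptop, film.avi and annex_demo are made up for the illustration, and the exact output depends on your git-annex version.

```shell
#!/bin/sh
# Sketch only: "origin", "laptop", film.avi and annex_demo are made-up names.
annex_demo() {
    work=$(mktemp -d) || return 1
    (
        set -e
        # Throwaway identity so commits work on a machine without git config.
        export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
        export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
        cd "$work"

        # Repository that holds the actual content.
        git init -q origin
        cd origin
        git annex init "origin"
        printf 'pretend this is a large movie\n' > movie.avi
        git annex add movie.avi
        git commit -q -m "add movie"
        cd ..

        # A second client: it sees movie.avi but does not hold its content yet.
        git clone -q origin laptop
        cd laptop
        git annex init "laptop"
        git annex get movie.avi      # automatic data retrieval
        git mv movie.avi film.avi    # rename tracking is plain git
        git commit -q -m "rename movie"
        git annex drop film.avi      # allowed because origin still has a copy
        git annex numcopies 3        # from now on, require three copies before a drop
    )
    status=$?
    # Annexed content is read-only, so make it writable before cleaning up.
    chmod -R u+w "$work"
    rm -rf "$work"
    return "$status"
}
```

Calling annex_demo on its own line runs the whole sequence. If you move the numcopies line before the drop, the drop should be refused, since only one other copy exists.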
So this sounds great. But what is the catch?
- To make sure that no data is lost by accident, all files are kept read-only, and you must unlock a file before modifying it and lock it again afterwards. You rarely need to do this for movies or executable files, but it is often needed for music files and images.
- Git-annex gets quite slow if you add TOO many files. I’m talking about a couple of hundred thousand files. I don’t have that many movies or music files, but iTunes and iPhoto love to create billions and billions of tiny files, so don’t store their folders in git-annex! Trust me, I have tried 😉
- Some programs don’t like symbolic links; they want to access real files. Unfortunately this includes my Boxee media player. This may require you to unlock and lock your whole library quite often, which is a waste of time.
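The unlock step from the first point can be sketched like this. Again it uses a throwaway repository, and the names music, song.mp3 and annex_edit_demo are only placeholders. I am assuming here that the modified file is saved with git annex add followed by a commit; details differ a little between git-annex versions.

```shell
#!/bin/sh
# Sketch only: "music", song.mp3 and annex_edit_demo are made-up names.
annex_edit_demo() {
    work=$(mktemp -d) || return 1
    (
        set -e
        export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
        export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
        cd "$work"
        git init -q music
        cd music
        git annex init "music"
        printf 'original tags\n' > song.mp3
        git annex add song.mp3       # the file is now stored read-only in the annex
        git commit -q -m "add song"

        git annex unlock song.mp3    # replace the link with a writable copy
        printf 'corrected tags\n' > song.mp3
        git annex add song.mp3       # store the new version in the annex
        git commit -q -m "retag song"
        # "git annex lock" is the counterpart of unlock for files left unlocked.
    )
    status=$?
    chmod -R u+w "$work"
    rm -rf "$work"
    return "$status"
}
```

For a whole library the same commands can be given a directory instead of a single file, which is exactly the slow part I complain about above.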
I use git-annex to handle my movie collection and executable files such as application or operating system installers.
The git-annex project is developed by Joey Hess and it’s very active. Joey makes updates almost every day and if you find a bug and report it on his webpage he will solve it within a day or two. This guy is marvelous!
Currently git-annex does not have a GUI. One has to type commands in a terminal, and this makes it less usable for normal users. I think this may change as git-annex continues to mature and gain popularity. We all, including average people, need a good distributed system for managing our large collections of files, and I believe that git-annex is up for at least a part of the task.
If you want to try git-annex and use Linux or Mac OS X then go to
I have “solved” my Boxee issue by running rsync between a git-annex repository and the filesystem on my Boxee like this:
rsync --size-only -vaL --progress --delete "$GIT_ANNEX_REPO" "$BOXEE_FS_PATH"
Instead of copying the symbolic links, it copies the data they refer to while still naming the files according to the symbolic links. This works great, except when I do massive renames or move folders around using git-annex. rsync does not recognize renamed files, so all of those files and directories have to be transferred again.
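If you want to see what the -L flag does without touching a real repository, here is a small sketch. The directory layout (store, repo, player) and the function name deref_demo are invented for the illustration and only imitate the "symbolic link pointing at content stored elsewhere" idea. They are not git-annex's real layout.

```shell
#!/bin/sh
# Sketch only: store/, repo/, player/ and deref_demo are made-up names.
deref_demo() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/store" "$work/repo" "$work/player"
    # The content lives in one place, the nicely named file is just a link to it.
    printf 'movie data\n' > "$work/store/abc123"
    ln -s ../store/abc123 "$work/repo/movie.avi"

    # -L makes rsync copy what the link points to instead of the link itself.
    rsync -aL --size-only --delete "$work/repo/" "$work/player/"

    if [ -f "$work/player/movie.avi" ] && [ ! -L "$work/player/movie.avi" ]; then
        echo "regular file"
    else
        echo "still a symlink"
    fi
    rm -rf "$work"
}
```

Running deref_demo should print "regular file": the player side gets a real file named movie.avi, which is what programs that dislike symbolic links want.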