git-annex

git-annex is a tool for managing large files using git. Unlike the way git is commonly used, this system does not actually version the files themselves; instead it handles symbolic links to the files.

git-annex provides a few very nice features, including:

  • The ability to handle trees of large files. “Normal git” is not an alternative in this case.
  • Automatic data retrieval. Any client can see the whole repository of files (as symbolic links) although it does not necessarily have access to any of them.  The user can run a command such as “git annex get movie.avi” and git-annex will download this file from any remote that has it.
  • Safe data dropping. Any client may “drop” a file at any time. This means that it’s removed from the local data storage but this will only be allowed if git-annex can verify that it exists at some remote. It’s possible to configure this policy. For example you may decide that some very important files must always be stored in at least three places.
  • Rename tracking. If you rename or move a file, the change is tracked by git and reaches other repositories when you choose to push or pull. One very cool thing is that you can rename and move around files you don’t even have available on your local machine. Again, this is possible because git-annex tracks only symbolic links, plus a record of which files are available where. Before I found git-annex I used rsync, but it is not clever about renames: a renamed file looks like a new file and gets transferred again. However, rsync is used “under the hood” when files are transferred between git-annex repositories.
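
The rename-tracking point rests on ordinary symbolic link behaviour, which can be tried without git-annex at all. The following is only a sketch with standard Unix tools: the link target “hypothetical-key” is made up and deliberately does not exist, standing in for content that lives on some other machine.

```shell
# Sketch with standard Unix tools; git-annex itself is not involved.
# "hypothetical-key" is a made-up target that deliberately does not exist,
# standing in for content that is only present on some other machine.
workdir=$(mktemp -d)

ln -s .git/annex/objects/hypothetical-key "$workdir/movie.avi"

# Rename the link even though the data it points to is not here.
mv "$workdir/movie.avi" "$workdir/holiday.avi"

link_target=$(readlink "$workdir/holiday.avi")
echo "renamed link still points to: $link_target"
```

The link is renamed while its target stays the same, so whoever later fetches the content gets it under the new name.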

So this sounds great. But what is the catch?

  • To make sure that no data is lost by accident, all files are kept read-only, and you must unlock and lock files to modify them. Modifying files is very uncommon when it comes to movies or executable files, but it is often needed when handling music files and images.
  • Git-annex gets quite slow if you add TOO many files. I’m talking about a couple of hundred thousand files. I don’t have that many movies or music files, but iTunes and iPhoto love to create billions of billions of tiny files, so don’t store their folders in git-annex! Trust me, I have tried 😉
  • Some programs don’t like symbolic links; they want to access real files. Unfortunately this includes my Boxee media player. This may require you to unlock and lock your whole library quite often, which is a waste of time.
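
For the first point, the edit cycle might look roughly like the sketch below. It assumes the “git annex unlock” and “git annex add” commands behave as the git-annex man pages describe; the helper name edit_annexed and the file and editor in the usage lines are hypothetical. As far as I can tell from the man pages, “git annex lock” throws away modifications, so the sketch uses “git annex add” to keep them.

```shell
# Hypothetical helper: unlock an annexed file, run a command on it,
# then add the modified file back to the annex.
edit_annexed() {
    file=$1
    shift
    git annex unlock "$file" || return 1   # make the file writable
    "$@" "$file" || return 1               # e.g. an image or tag editor
    git annex add "$file"                  # hand the new content back to git-annex
}

# Usage (commented out; needs a git-annex repository and a real editor):
# edit_annexed photo.jpg gimp
# git commit -m "Retouch photo.jpg"
```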

I use git-annex to handle my movie collection and executable files such as application or operating system installers.

The git-annex project is developed by Joey Hess and it’s very active. Joey makes updates almost every day and if you find a bug and report it on his webpage he will solve it within a day or two. This guy is marvelous!

Currently git-annex does not have a GUI. One has to type commands in a terminal, and this makes it less usable for normal users. I think this may change as git-annex continues to mature and gain popularity. We all, including average people, need a good distributed system for managing our large collections of files, and I believe that git-annex is up for at least a part of the task.

If you want to try git-annex and use Linux or Mac OS X then go to

http://git-annex.branchable.com/

Update:
I have “solved” my Boxee issue by running rsync between a git-annex repository and the filesystem on my Boxee like this:

rsync --size-only -vaL --progress --delete "$GIT_ANNEX_REPO" "$BOXEE_FS_PATH"

Instead of copying the symbolic links, rsync copies the data they refer to (that is what the -L flag does) while still naming the files after the symbolic links. This works great except when I do massive renames or move folders around using git-annex. Then all those files and directories have to be transferred again.
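
The effect of following links while copying can be seen on a small scale. The sketch below uses “cp -RL” as a stand-in for “rsync -L”, since both dereference symbolic links when copying; every directory and file name in it is made up.

```shell
# Sketch only: cp -RL stands in for rsync -L; all paths are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/repo/objects" "$demo/boxee"

# Fake "annexed" content plus a symbolic link with a friendly name.
echo "movie data" > "$demo/repo/objects/key123"
ln -s objects/key123 "$demo/repo/movie.avi"

# Copy while following links: the destination gets a regular file
# named after the link, holding the data the link pointed to.
cp -RL "$demo/repo/movie.avi" "$demo/boxee/"

copied=$(cat "$demo/boxee/movie.avi")
echo "$copied"
```

The copy on the destination side is a plain file, which is what a player that dislikes symbolic links needs.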


6 Responses to git-annex

  1. Joey Hess says:

    Very nice overview. One correction: git-annex can allow reverting a file to an earlier version — unless you’ve told it to drop that version. There is even a way to use bup as a special git remote; once you put a version of a file in there it cannot be removed by git-annex and so will always be available to revert to.

  2. kristianr says:

    > Very nice overview.

    Thank you!

    > One correction: git-annex can allow reverting a file to an earlier
    > version — unless you’ve told it to drop that version.

    Yes, you are right. I didn’t think about that.

    > There is even a way to use bup as a special git remote; once
    > you put a version of a file in there it cannot be removed by
    > git-annex and so will always be available to revert to.

    I have not tried bup but I read about it on your page. I got the impression that it’s not really mature (https://github.com/apenwarr/bup) yet but maybe I’m wrong? “This is a very early version. Therefore it will most probably not work for you, but we don’t know why”.

  3. Pingback: How I use git-annex | Kristian Rumberg

  4. The symbolic links catch can be a big one. I was hoping to use git annex to manage synchronization and storage of photographs taken by multiple users, but OSX Finder will not follow the links to display preview thumbnails.

  5. Gabriele says:

    Another big problem is that when git-annex unlocks a file it makes a local copy. When I tried git-annex to synchronize my VirtualBox VMs (10GB) I had a big headache.

    • kristianr says:

      Without this feature git-annex would have to use twice as much space on your hard drive to allow for version rollbacks. Git-annex is not good for all use cases. I find it most useful for content that I may move and rename but which I keep read-only 99% of the time.
