Git-annex has been around for a long time, but I only recently stumbled across some of the work Joey has been doing on it. This post isn't about its traditional roots in git or all the features it has for partial copies of large data sets, but rather about its Dropbox-like live syncing capabilities.

It takes a bit to wrap your head around, because git-annex is just a little different from everything else. It's sort of like a different-colored smell. The git-annex wiki has a lot of great information: both low-level reference and a high-level 10-minute screencast showing how easy it is to set up. I found I had to piece together the architecture between those levels, so I'm writing this all down hoping it will benefit others who are curious. If you just want to use it, you don't need to know all this. But I like to understand how my tools work.

Git-annex lets you set up a live syncing solution that requires no central provider at all, or that can be used with a completely untrusted central provider. Depending on your usage pattern, this central provider could require only a few MB of space, even for repositories that keep gigabytes or terabytes of data in sync.

Let's take a look at the high-level architecture of the tool. Then I'll illustrate how it works with some scenarios.

Fundamentally, git-annex takes layers that are all combined in Dropbox and separates them out. There is the storage layer, which stores the literal data bytes that you are interested in. git-annex indexes the data in storage by a hash. There is metadata, which is for things like a filename-to-hash mapping and revision history. And then there is an optional layer, live signaling, which is used to drive the real-time syncing.

Git-annex has several modes of operation, and the one that enables live syncing is called the git-annex assistant. It runs as a daemon, and is available for Linux/POSIX platforms, Windows, Mac, and Android.

The storage layer is simply blobs of data. These blobs are indexed by a hash, and can optionally be encrypted at rest on remote backends. git-annex has a large number of storage backends; some examples include rsync, a remote machine with git-annex on it that has ssh installed, WebDAV, S3, Amazon Glacier, a removable USB drive, etc. One of the git-annex features is that each client knows the state of each storage repository, as well as the capability set of each storage repository.

So let's say you have a workstation at home and a laptop you take with you to work or the coffee shop. You'd like changes on one to be instantly recognized on the other. With something like Dropbox or OwnCloud, every file in the set you want synchronized has to reside on a server in the cloud. With git-annex, the server in the cloud can be configured to hold a copy of a file only until every client has synced it, at which point the copy gets removed. Think about it: that is often what you want anyhow, so why maintain an unnecessary copy after it's synced everywhere? (This behavior is, of course, configurable.) git-annex can also avoid storing in the cloud entirely if the machines are able to reach each other directly at least some of the time.

Metadata about your files includes a mapping from the file names to the storage location (based on hashes), change history, and information about the status of each machine that participates in the syncing. On your clients, git-annex stores this using git. This detail is very useful to some, and irrelevant to others. Some of the git-annex storage backends can support only storage (S3, for instance). Some can support both storage and metadata (rsync, ssh, local drives, etc.). You can even configure a backend to support only metadata (more on why that may be useful in a bit). When you are working with a git-backed repository for git-annex, it can hold data, metadata, or both.

So, to have a working sync system, you must have a way to transport both the data and the metadata. The transport for the metadata is generally rsync or git, but it can also be XMPP, in which case Git changesets are basically wrapped up in XMPP presence messages. Joey says, however, that there are some known issues with XMPP servers sometimes dropping or reordering some XMPP messages, so he doesn't encourage that method currently.

So once you have your data and metadata, you can already do syncs via git annex sync --content. But the real killer feature here will be automatic detection of changes, both on the local side and on the remote. To do that, you need some way of live signaling. The first signaling method requires ssh access to a remote machine where git-annex is installed. In this mode of operation, when the git-annex assistant fires up, it opens a persistent ssh connection to the remote and runs git-annex-shell over there, which notifies it of changes to the git metadata repository. When a change is detected, a sync is initiated.
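To make the layering described above a bit more concrete, here is a toy Python model of the storage layer, the metadata layer, and the "keep a cloud copy only until every client has it" rule. This is only a sketch for building intuition, not how git-annex is implemented: the names Repo, Metadata, copy, and drop_if_everywhere are all made up for this illustration, and the choice of SHA-256 as the hash is an assumption.

```python
import hashlib


class Repo:
    """Storage layer (hypothetical): opaque blobs indexed by a hash of their content."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}

    def put(self, data):
        # SHA-256 is an assumption made for this sketch.
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data
        return key


class Metadata:
    """Metadata layer (hypothetical): file name -> key, and key -> repos holding it."""

    def __init__(self):
        self.keys = {}
        self.locations = {}

    def add_file(self, filename, data, repo):
        key = repo.put(data)
        self.keys[filename] = key
        self.locations.setdefault(key, set()).add(repo.name)
        return key


def copy(meta, key, src, dst):
    """Transport one blob between repos and record its new location."""
    dst.blobs[key] = src.blobs[key]
    meta.locations[key].add(dst.name)


def drop_if_everywhere(meta, key, transfer, clients):
    """Drop the transfer copy once every client is known to hold the blob."""
    if all(c.name in meta.locations[key] for c in clients):
        transfer.blobs.pop(key, None)
        meta.locations[key].discard(transfer.name)
        return True
    return False


laptop, workstation, cloud = Repo("laptop"), Repo("workstation"), Repo("cloud")
meta = Metadata()
key = meta.add_file("notes.txt", b"hello", laptop)

copy(meta, key, laptop, cloud)  # the laptop uploads to the cloud
dropped_early = drop_if_everywhere(meta, key, cloud, [laptop, workstation])
copy(meta, key, cloud, workstation)  # the workstation downloads later
dropped_late = drop_if_everywhere(meta, key, cloud, [laptop, workstation])

print(dropped_early, dropped_late, sorted(meta.locations[key]))
```

The point of the toy is that the decision to drop the cloud copy is made purely from metadata (who is known to hold which key), which is why the cloud side only ever needs room for files that are still in flight.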