Steven Jewel Blog

08 Feb 2014

Clearskies Mobile Support

The clearskies project continues to make good progress toward its goal of producing a well-specified, open-source peer-to-peer sync program.

I have unfortunately been busy with my day job recently, but in the meantime others have been working on a C++ implementation that can be compiled for both mobile and desktop devices. When I have had the time, I have been trying to simplify the protocol to make their job easier.

Manifest Exchange

One area where the current protocol falls short is its handling of mobile clients. The primary problem is that the clearskies protocol assumes that peers will be connected most of the time and that bandwidth will be plentiful. A mobile device, however, might check in only occasionally, and possibly over a slow or metered connection.

In the protocol, the list of files in a shared folder is called a manifest. On each connection, peers exchange manifests, and each peer can then work out which files it needs to fetch from the other. It is a good system in that it is simple and stateless. The bandwidth needed for the exchange is minimal when a share holds only a few files, but it can become significant for a larger share. Clearskies supports a "shallow" copy of a share, where files are streamed on demand instead of being fully synced, so a user might keep her music collection on a NAS at home and stream songs to her phone as she listens to them. Even then, the phone would still have to receive the manifest for the whole collection, even though it stores only a few of the files.
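To make the manifest exchange concrete, here is a minimal Python sketch of how a peer might decide what to fetch after receiving the other peer's manifest. The record layout (a path mapped to a record with a `sha256` field) and the function names `files_to_fetch` and `manifest_bytes` are hypothetical illustrations, not the clearskies wire format, and the sketch deliberately ignores the question of which copy is newer.

```python
import json
from typing import Dict, List

# Hypothetical manifest layout: relative path -> file record.
# A real clearskies manifest carries more fields than this.
Manifest = Dict[str, dict]


def files_to_fetch(local: Manifest, remote: Manifest) -> List[str]:
    """Paths the remote peer lists that are missing or different locally."""
    wanted = []
    for path, record in remote.items():
        mine = local.get(path)
        if mine is None or mine["sha256"] != record["sha256"]:
            wanted.append(path)
    return sorted(wanted)


def manifest_bytes(manifest: Manifest) -> int:
    """Size of the manifest when JSON-encoded for sending."""
    return len(json.dumps(manifest).encode("utf-8"))


if __name__ == "__main__":
    local = {"notes.txt": {"sha256": "aa"}}
    remote = {"notes.txt": {"sha256": "ab"}, "song.mp3": {"sha256": "bb"}}
    print(files_to_fetch(local, remote))  # ['notes.txt', 'song.mp3']
```

Because the whole manifest is sent, its size grows linearly with the number of files in the share, which is what makes a large shallow share expensive for a phone that only wants a handful of songs.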

The protocol tries to offer some shortcuts for common scenarios. Each peer versions its manifest, so no exchange is necessary if nothing has changed. An optional extension allows the manifest to be gzipped. There is even an extension that lets a peer send only the changes to the manifest via rsync (though at a large cost in CPU time).
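The version shortcut and the gzip extension can be sketched together as follows. This assumes, hypothetically, that the sender knows which manifest version the other peer last received; `encode_manifest`, `decode_manifest`, and the JSON envelope are illustrative names, not part of the clearskies specification. The rsync-based delta extension is not shown.

```python
import gzip
import json
from typing import Dict, Optional


def encode_manifest(manifest: Dict[str, dict], version: int,
                    peer_version: Optional[int], use_gzip: bool) -> Optional[bytes]:
    """Return the bytes to send, or None if the peer already has this version.

    peer_version is the manifest version the other peer last received from us
    (an assumption of this sketch; None if it has never seen one).
    """
    if peer_version == version:
        return None  # nothing has changed, so skip the exchange entirely
    body = json.dumps({"version": version, "files": manifest}).encode("utf-8")
    # Manifests are repetitive text, so gzip usually shrinks them considerably.
    return gzip.compress(body) if use_gzip else body


def decode_manifest(payload: bytes, gzipped: bool) -> dict:
    """Inverse of encode_manifest."""
    if gzipped:
        payload = gzip.decompress(payload)
    return json.loads(payload.decode("utf-8"))
```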

Vector Clocks

Another problem with the current protocol is that it depends on every device having an accurate clock. When the peers' manifests are merged into one, the update time of each file is used to determine which copy is newer. (Update time refers to the last time the record associated with the file was changed, not the file's own modification time.)

If a user changes a file on two of her devices while both are offline, the two edits conflict. With the current "update time" method, it is impossible to distinguish such a conflict from two ordinary updates made one after the other: in both cases, the copy with the later update time simply wins.
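A minimal sketch of the update-time merge, assuming a simplified record with just a content hash and an update time (the `Record` class and `merge_by_update_time` are hypothetical). The point is what the function cannot see: its inputs look the same whether the two edits happened one after another or concurrently while both devices were offline, and a skewed clock silently changes the winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    sha256: str      # hash of the file contents
    updated: float   # update time, read from the editing device's own clock


def merge_by_update_time(a: Record, b: Record) -> Record:
    """Keep whichever record claims the later update time (last writer wins)."""
    return a if a.updated >= b.updated else b


# Scenario 1: the phone edits at t=100, syncs, then the laptop edits at t=200.
# Scenario 2: both edit while offline, at t=100 and t=200, without syncing.
# Either way the merge sees exactly these two records and keeps the laptop's,
# so in scenario 2 the phone's edit is discarded instead of flagged.
phone = Record("aaa", 100.0)
laptop = Record("bbb", 200.0)
winner = merge_by_update_time(phone, laptop)

# Clock skew: if the laptop's clock runs 10 minutes slow, its genuinely
# newer edit is stamped earlier and loses to the phone's older one.
skewed_laptop = Record("bbb", 100.0 - 600.0)
```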

Both of these issues (the accurate clock requirement and conflict detection) can be solved by using vector clocks, an algorithm for detecting conflicts. The name is somewhat misleading, since no wall-clock time is involved; the closely related term "version vector" describes this use more precisely. In essence, each file carries a version counter for each participating peer, and a peer increments only its own counter when it changes the file.
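A minimal Python sketch of version-vector comparison, assuming each peer has a stable ID string; `bump`, `compare`, and `merge` are illustrative names, not a clearskies API. One version is newer if its counters are at least as large for every peer and larger for at least one; if each version is ahead somewhere, the edits were concurrent and the file is in conflict. No clock readings are involved.

```python
from enum import Enum
from typing import Dict

# Peer ID -> number of changes that peer has made to this file.
VersionVector = Dict[str, int]


class Order(Enum):
    EQUAL = "equal"
    BEFORE = "before"      # a is older than b
    AFTER = "after"        # a is newer than b
    CONFLICT = "conflict"  # concurrent edits; neither supersedes the other


def bump(vv: VersionVector, peer: str) -> VersionVector:
    """Record a local edit: copy the vector and increment this peer's counter."""
    out = dict(vv)
    out[peer] = out.get(peer, 0) + 1
    return out


def compare(a: VersionVector, b: VersionVector) -> Order:
    """Compare two versions of the same file; missing entries count as 0."""
    peers = set(a) | set(b)
    a_ahead = any(a.get(p, 0) > b.get(p, 0) for p in peers)
    b_ahead = any(b.get(p, 0) > a.get(p, 0) for p in peers)
    if a_ahead and b_ahead:
        return Order.CONFLICT
    if a_ahead:
        return Order.AFTER
    if b_ahead:
        return Order.BEFORE
    return Order.EQUAL


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Element-wise maximum, e.g. as the basis for a resolved conflict."""
    return {p: max(a.get(p, 0), b.get(p, 0)) for p in set(a) | set(b)}


if __name__ == "__main__":
    base = bump({}, "phone")          # phone creates the file
    laptop = bump(base, "laptop")     # laptop syncs, then edits
    print(compare(base, laptop))      # Order.BEFORE: a regular update
    phone = bump(base, "phone")       # meanwhile, phone edits offline
    print(compare(phone, laptop))     # Order.CONFLICT
```

After a user (or the application) resolves a conflict, the resolving peer can take `merge` of both vectors and then `bump` its own entry, so the result compares as newer than both conflicting copies.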

API Users

There is a third area where the current protocol is insufficient. Right now it is easy to write mobile apps (or desktop programs, for that matter) that sync through a central server. This can be done with Parse, among many other options, although Parse doesn't yet offer full offline support.

There is also couchbase-lite, a distributed database that supports offline operation and can sync between any two peers. Typically, though, apps built on it do most of that communication through a central Couchbase server, since couchbase-lite doesn't offer any peer-to-peer communication facilities.

I'd like to see clearskies become a library that applications can embed in order to operate without a central server, similar to couchbase-lite, but also handling peer discovery and peer-to-peer communication.

Since clearskies, as currently specified, is a synchronized key-value store that just happens to have a large binary file associated with each entry, it will be simple to separate out the "database" functionality and let programs use it directly. A few programs, such as vole and syncnet, are built on top of file syncing when a synchronized database might be more appropriate.
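One way to picture the proposed split, as a rough sketch only: a generic synchronized store whose entries carry a JSON-style value, a version vector, and an optional attachment, with file sync reduced to one particular use of that store. `SyncedStore`, `Entry`, `record_file`, and the `file:` key prefix are all hypothetical, and the actual syncing between peers is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entry:
    value: dict                               # JSON-compatible document
    attachment_sha256: Optional[str] = None   # large binary blob, if any
    version: Dict[str, int] = field(default_factory=dict)  # version vector


class SyncedStore:
    """Hypothetical database layer: it knows nothing about files or folders."""

    def __init__(self, peer_id: str) -> None:
        self.peer_id = peer_id
        self.entries: Dict[str, Entry] = {}

    def put(self, key: str, value: dict,
            attachment_sha256: Optional[str] = None) -> None:
        """Write an entry locally and bump this peer's version counter."""
        old = self.entries.get(key)
        version = dict(old.version) if old else {}
        version[self.peer_id] = version.get(self.peer_id, 0) + 1
        self.entries[key] = Entry(value, attachment_sha256, version)

    def get(self, key: str) -> Optional[dict]:
        entry = self.entries.get(key)
        return entry.value if entry else None


def record_file(store: SyncedStore, path: str, sha256: str, size: int) -> None:
    """File-sync layer on top: one entry per path, with the file as attachment."""
    store.put("file:" + path, {"path": path, "size": size},
              attachment_sha256=sha256)
```

An application that only needs shared records would call `put` and `get` directly and never attach a file, while a folder-sync program would go through something like `record_file`.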

I believe that this separation will end up making the overall protocol simpler. As a precaution, I am making the change in a branch so that it can be abandoned if it does not prove fruitful.

Vision

I'm hoping that clearskies, as a protocol and a library, can provide peer-to-peer services to applications in layers, depending on the application's needs. The layers would be something along the following lines:

  1. Easy setup. Simple access-code-based setup of a peer-to-peer relationship.

  2. Peer discovery. The library handles finding the IP address of other peers.

  3. Secure connection. The library leverages TLS-SRP connections to communicate (over UDP via µTP, if necessary).

  4. Message delivery. The library handles segmenting messages for delivery.

  5. Synchronized database. A simple key-value store or JSON document store is synchronized automatically.

  6. File synchronization. Building on the above, a folder can be synchronized between devices.

One application might only need to use the first four layers, while another might use all of them.
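As an illustration of what the message-delivery layer has to do, here is a minimal sketch of splitting a message into numbered segments and reassembling them, possibly out of order, on the receiving side. The 8-byte header layout, the 1024-byte default payload size, and the names `segment` and `Reassembler` are assumptions made for this example, not the clearskies wire format; a real implementation would also need timeouts and size limits for incomplete messages.

```python
import struct
from typing import Dict, List, Optional

# Hypothetical segment header: message id (u32), segment index (u16),
# segment count (u16), all big-endian. A 16-bit count caps a message at
# 65535 segments; struct.pack raises struct.error beyond that.
HEADER = struct.Struct(">IHH")
MAX_PAYLOAD = 1024  # assumption; a real value would depend on the transport


def segment(message_id: int, data: bytes, max_payload: int = MAX_PAYLOAD) -> List[bytes]:
    """Split data into header-prefixed segments (at least one, even if empty)."""
    chunks = [data[i:i + max_payload] for i in range(0, len(data), max_payload)] or [b""]
    count = len(chunks)
    return [HEADER.pack(message_id, i, count) + c for i, c in enumerate(chunks)]


class Reassembler:
    """Collects segments per message id and returns the message once complete."""

    def __init__(self) -> None:
        self.pending: Dict[int, Dict[int, bytes]] = {}

    def add(self, seg: bytes) -> Optional[bytes]:
        message_id, index, count = HEADER.unpack_from(seg)
        parts = self.pending.setdefault(message_id, {})
        parts[index] = seg[HEADER.size:]  # duplicates simply overwrite
        if len(parts) < count:
            return None  # still waiting for more segments
        del self.pending[message_id]
        return b"".join(parts[i] for i in range(count))
```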

Please help me correct factual inaccuracies in this post. My email address is clearskies@ this domain. Further discussion can be found on the mailing list.