Steven Jewel Blog RSS Feed

03 Oct 2013

ClearSkies internals: Brain-dead threads

I am working on a project called ClearSkies, which is an open source, peer-to-peer replacement for BitTorrent Sync or Dropbox. A draft version of the protocol is complete and I am now working on making a reference implementation.

Because it's a reference implementation, I am making a special effort to make sure that the code is easy to understand, which is why I am writing it in ruby. While ruby is famous for rails, a website-building framework, I have been writing desktop applications in it since roughly the time that rails was released.

Most of the time the program is running, the shares will already be synced, so the program is going to be idle. When it is working, nearly all of the CPU time will be spent in OpenSSL (which is written in C) calculating SHA256 hashes, or waiting for disk. For non-LAN syncs, most of the time will be spent waiting for the network. Furthermore, the program is meant to run in the background, meaning the user won't be sensitive to its performance. For these reasons I don't believe that the performance gains of writing it in C or go would matter.

The reference implementation is going to consist of a daemon and a command-line user-interface. I believe that a command-line interface needs to come first, since that is the easiest to create, and it frees up time to focus on the details of the daemon. It will be easy for me or someone else to create a graphical user interface later on.

One of the major impacts on the design of a network daemon is how to deal with concurrency. For a sync program like this, a typical home user is only going to have a few devices and perhaps a few shares with friends, so the reference implementation only needs to handle perhaps fifty concurrent connections. The connections are going to be long-lived. My favorite way to write a daemon under these conditions is to make a shared-nothing forking server. That isn't a good option in this case because there is a lot of shared state.

The most obvious choice is threading. Threads work well for such low levels of concurrency. The downside of threads is that you have to be very careful in order to avoid race conditions and deadlocks.

Another choice would be an event-based program. This is a single thread that waits for activity and then responds to it as it comes in, using "select" or something similar. This is most famously used in node.js. This eliminates the need for locks, since there is no real concurrency. I'm fond of writing network daemons this way, but I don't think it's as clear as the procedural code that will be in a threaded daemon, especially if errors are involved. Said another way, while threaded code is more tricky to write, it's easier to read. Since this is the reference implementation clarity is more important.
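
To illustrate what I mean by an event-based daemon, here is a minimal sketch of a single-threaded IO.select loop in ruby. This is just an echo server standing in for real protocol handling, not ClearSkies code; the function name is my own invention:

```ruby
require 'socket'

# A minimal single-threaded event loop using IO.select.  Only one
# handler runs at a time, so no locks are needed.  Illustrative echo
# server, not actual ClearSkies code.
def run_event_loop server
  clients = []

  loop do
    ready, = IO.select( [server] + clients )

    ready.each do |io|
      if io == server
        clients << server.accept
      else
        data = io.read_nonblock( 4096, exception: false )
        if data.nil?                  # peer closed the connection
          clients.delete io
        elsif data != :wait_readable
          io.write data               # echo the data back
        end
      end
    end
  end
end
```

Notice how the read and the write for one connection end up scattered across callbacks and branches, rather than reading top-to-bottom the way the equivalent threaded code would.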

Another problem with a strictly evented daemon would be that calculating the SHA hashes will lock up the daemon, making it unresponsive to CLI requests. This can be worked around with a hybrid solution that has a few worker threads to handle these types of requests, or by splitting the SHA calculation into smaller chunks. Instead of:

require 'digest'

Digest::SHA256.file( path ).hexdigest

You would do:

sha = Digest::SHA256.new

File.open( path, 'rb' ) do |f|
  while data = f.read(1024*1024)
    sha.update data
  end
end

sha.hexdigest
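
The worker-thread variant of the hybrid approach could be sketched like this (the names and the queue-of-callbacks shape are my own, not from the ClearSkies code):

```ruby
require 'digest'

# A small pool of worker threads that calculate SHA256 hashes off the
# main thread.  Jobs are [path, callback] pairs pushed onto a Queue,
# which is already threadsafe in ruby.
HASH_JOBS = Queue.new

WORKERS = 2.times.map do
  Thread.new do
    while job = HASH_JOBS.pop
      path, callback = job
      callback.call Digest::SHA256.file( path ).hexdigest
    end
  end
end

# Enqueue a file to be hashed; the callback receives the hex digest.
def hash_async path, &callback
  HASH_JOBS << [path, callback]
end
```

The event loop stays responsive because it only ever pushes onto the queue; the blocking hash work happens entirely in the workers.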

In the past when I've written threaded code, I've tried to add locks in places where they seemed important during the initial development. For clearskies, I'm going to try a new approach which I'll call brain-dead locking. I'm going to create a global lock and have all code hold it by default, except when attempting to interact with a socket, calculate a SHA hash, or read or write to a file.

This should give me similar performance to an implementation with fine-grained locks on a single-core computer, but will be much simpler to write. It should also be no worse than an evented implementation, but will be simpler to read.

Once the program is working I can benchmark and find out which places (if any) can benefit from finer grained locks and only add them in those places. This should result in the minimum amount of effort expended.

Normal locks in ruby look like this:

@mutex = Mutex.new

# later on...

@mutex.synchronize do
  @shared_state.read_something
end

data = socket.gets

@mutex.synchronize do
  @shared_state.write_something
end

Brain-dead-threads look like this:

@shared_state.read_something

data = nil

$mutex.unlock
begin
  data = socket.gets
ensure
  $mutex.lock
end

@shared_state.write_something

Some helpers can be created to make sure that new threads start with the mutex locked by default.
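
For example (the helper names here are my own invention, not from the ClearSkies code), a thread helper and an unlocked block might look like:

```ruby
$mutex = Mutex.new

# Start a thread that holds the global lock by default.
def start_thread
  Thread.new do
    $mutex.lock
    begin
      yield
    ensure
      $mutex.unlock
    end
  end
end

# Release the global lock around a blocking operation, re-taking it
# afterwards even if the operation raises.
def unlocked
  $mutex.unlock
  begin
    yield
  ensure
    $mutex.lock
  end
end
```

With these helpers, the socket example above shrinks to data = unlocked { socket.gets }.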

Analysis

The biggest advantage, in my opinion, is that you start with code that is definitely threadsafe, instead of having to start with unsafe code and then carefully prove its safety later.

One big disadvantage is that if you forget to unlock before doing a blocking operation, the entire program will block. This won't be too big of a problem with clearskies, and it is easy to detect [1].

This approach wouldn't work well if you're trying to do a lot of computation and are using multiple threads to increase performance, but I believe it will work really well for clearskies. I will report back once I've had some experience with it.

[1]: Detecting a block on the global lock can be done by having a diagnostic thread that tries to take the lock, with a one second timeout, and then immediately releases it. If it times out, there is a problem. The process is repeated every few seconds.
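
That diagnostic thread might be sketched like this (my own illustration; ruby's Mutex has no timed lock, so the timeout is implemented by polling try_lock):

```ruby
$mutex ||= Mutex.new

# Returns true if the global lock could be acquired (and immediately
# released) within `timeout` seconds, polling with try_lock.
def lock_responsive? timeout = 1.0
  deadline = Time.now + timeout
  loop do
    if $mutex.try_lock
      $mutex.unlock
      return true
    end
    return false if Time.now >= deadline
    sleep 0.01
  end
end

# Diagnostic thread: warn whenever the global lock appears stuck,
# checking every few seconds.
def start_watchdog
  Thread.new do
    loop do
      warn "global lock held for more than one second" unless lock_responsive?
      sleep 5
    end
  end
end
```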