Problem statement: My distributed simulation program has to keep its clients synchronised with the server. The clients and the server share a directory hierarchy containing the files that describe a number of simulations.
Previous solution: Previously I had written some code to integrate with Subversion. When a client connects, it runs svnversion . to find out which revision it has and sends that in its “hello” message. The server checks this, and if it is older than the server’s revision, the client is told to update its working copy. The client runs svn update, then attempts to reload its Python code. The most important bit is that the simulations are now up to date, or in sync, with those on the server.
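As a rough illustration of that version check, here is a minimal sketch that parses the output of svnversion and compares it with the server's revision. It is not the code from my system: the function names (parse_svnversion, local_revision, needs_update) are made up for this example, and it assumes the usual svnversion output forms such as 4168, 4123:4168 or 4168M. For a mixed-revision working copy it takes the lowest revision, which is a conservative choice of mine.

```python
import re
import subprocess

# Matches "4168", "4123:4168" and trailing flags such as M, S or P.
_SVNVERSION_RE = re.compile(r"^(\d+)(?::(\d+))?[MSP]*$")


def parse_svnversion(output):
    """Return the lowest revision in svnversion output, or None.

    None means the output was not a revision at all, for example when
    the directory is not a working copy.
    """
    match = _SVNVERSION_RE.match(output.strip())
    if match is None:
        return None
    return int(match.group(1))


def local_revision(path="."):
    """Run svnversion on path and return the parsed revision."""
    result = subprocess.run(
        ["svnversion", path], capture_output=True, text=True, check=True
    )
    return parse_svnversion(result.stdout)


def needs_update(client_revision, server_revision):
    """Server-side check: is the client behind (or of unknown revision)?"""
    return client_revision is None or client_revision < server_revision
```

The client would send local_revision() in its “hello” message, and the server would answer with an update request whenever needs_update returns True.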
This has two problems. The first is that the client machines need Subversion installed, and Subversion isn’t quite ubiquitous yet. The second is that the Subversion client needs network access and must be able to authenticate with the server. In the past this has been OK because of how the machines I was working with were set up, but you can’t always get around either problem easily.
New solution: I no longer care about updating the Python code of the running simulation client. It isn’t liable to change (it has existed for some time now and I hardly ever touch it), and the protocol is already versioned. I concentrate instead on making sure the simulations are up to date. Conceptually, this is very easy:
- Client is sent the name of a simulation to run by the server (as before).
- Client md5sums all files pertinent to the simulation and sends back a list of (file, md5sum) tuples.
- Server checks this against its own files. If everything matches, it sends back an “md5sum OK” message; otherwise it finds the files that don’t match, gzips them, and sends them back as (file, gzipped file, umask, digest) tuples.
- If the client receives an “OK” message, it notes that this simulation is in sync, and requests another simulation.
- If the client receives a list of files, it saves them locally into its directory hierarchy. It can check their integrity against the digest sent back, but in practice I don’t bother. It then requests another simulation.
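The steps above can be sketched roughly as follows. This is a minimal illustration rather than my actual code: the function names are hypothetical, the network layer is left out entirely, and it assumes both sides already agree on the list of relative paths that belong to a simulation. What the list calls the umask is represented here by the file's permission bits, and a file missing on the client is reported with a digest of None so that the server sends it; both are assumptions of this sketch.

```python
import gzip
import hashlib
import os


def md5_of(path):
    """Return the hex MD5 digest of a file, or None if it does not exist."""
    if not os.path.isfile(path):
        return None
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def client_manifest(root, relative_paths):
    """Client: build the list of (file, md5sum) tuples to send."""
    return [(rel, md5_of(os.path.join(root, rel))) for rel in relative_paths]


def server_reply(root, manifest):
    """Server: return "OK", or (file, gzipped file, mode, digest) tuples."""
    updates = []
    for rel, client_digest in manifest:
        path = os.path.join(root, rel)
        with open(path, "rb") as handle:
            data = handle.read()
        digest = hashlib.md5(data).hexdigest()
        if digest != client_digest:
            mode = os.stat(path).st_mode & 0o777  # permission bits only
            updates.append((rel, gzip.compress(data), mode, digest))
    return updates or "OK"


def apply_updates(root, updates, verify=True):
    """Client: save the received files into the local hierarchy."""
    for rel, payload, mode, digest in updates:
        data = gzip.decompress(payload)
        # Optional integrity check against the digest the server sent.
        if verify and hashlib.md5(data).hexdigest() != digest:
            raise ValueError("digest mismatch for %s" % rel)
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "wb") as handle:
            handle.write(data)
        os.chmod(path, mode)
```

Reading whole files into memory is fine here only because the simulation files are small; larger files would want chunked hashing and streaming compression.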
Whenever a simulation is started, the clients are alerted to possible new simulations. They then clear their cache of simulations they believe to be up to date.
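The cache itself needs nothing more than a set of simulation names. A minimal sketch, with a hypothetical class name and method names that are not taken from my code:

```python
class SyncCache:
    """Tracks which simulations this client believes are up to date."""

    def __init__(self):
        self._in_sync = set()

    def mark_in_sync(self, simulation):
        """Record that the server answered OK (or sent files) for this one."""
        self._in_sync.add(simulation)

    def is_in_sync(self, simulation):
        """True if the checksum exchange can be skipped for this simulation."""
        return simulation in self._in_sync

    def invalidate(self):
        """Forget everything when the server announces possible new simulations."""
        self._in_sync.clear()
```

Clearing the whole cache is cruder than invalidating individual simulations, but it matches the description above and costs little when the files are small.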
This works nicely because the files that describe a simulation are fairly small and we don’t care too much about efficiency of the network protocol. A lot of the time this runs over LANs anyway.
So?
Now, the point of this article isn’t to describe the above per se; that is nothing special. It is like a very stunted rsync algorithm that sucks big time.
The cool thing is that coding the above in Python was so incredibly easy! The standard library already does things like md5 and gzip, and I already had a framework set up in my system based upon the asyncore/asynchat modules, so plugging this all in was very easy. I made a few typos and small logical errors, but a bit of testing ironed these out and it all works now.
Thanks Python!