Syncing and backing up two desktops

Unison
A Bi-directional File Synchronizer.

I tend to live in two virtual locations: my MacBook Pro running OS X and my main server, KUMR.LNS.COM, running Linux.  I should say that for decades before I switched to Linux, I ran various versions of *BSD on KUMR.

I have a home directory on both boxes with a subfolder called “projects” that holds various things I have been working on for the last 30 or so years.  I want these directories and files available in both locations, and having two copies gives me some semblance of a backup as well.  (Of course, I have other backup methods, including Time Machine and other off-site backups.)

Additionally, when I am doing development I tend to use my Mac, but OS X has some peculiarities in how packages like Python get installed, and package managers like Homebrew may not load what I need for an environment that will be deployed on a server.  So I do that work on KUMR.  (Ya… I know about containers and VMs.)

The challenge is keeping the two in sync.  For quite a while I have been using Unison, a bi-directional file synchronizer that uses rsync’s rather efficient method of file transfer.  I will skip describing the rsync protocol, but you can check out the write-up at https://rsync.samba.org/how-rsync-works.html for the details.

Unison is extremely efficient at working through large file collections.  I currently have about 305 GB in 215,450 files and 27,888 directories in my “projects” folder alone.  If I were just using rsync, it would have to walk through every file on both machines on each run, comparing sizes and modification times (or, with its --checksum option, computing a hash of every file) before starting a transfer for any that differ.  Unison makes a similar crawl of all the files once, then keeps track of them via a hash of each file, which it stores in an archive in the ~/.unison directory.  This means that the first time I run Unison it may take hours to crawl through all the files, but subsequent runs may take less than a minute to scan and transfer, depending on what has changed since the last run.
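To make the archive idea concrete, here is a minimal shell sketch of it. It is not how Unison stores anything (its real archives live in ~/.unison in Unison's own format); the function names snapshot and changed_since are hypothetical, and the POSIX cksum checksum stands in for the "hash". The first call reads every file once and saves one line per file; later calls report only the files that are new or whose contents changed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "archive" idea; not Unison's actual format.

# snapshot DIR: print "checksum size ./path" for every file, sorted bytewise.
snapshot() {
  (cd "$1" && find . -type f -exec cksum {} + | LC_ALL=C sort)
}

# changed_since ARCHIVE DIR: print paths that are new or whose contents differ.
# comm -13 keeps only lines present in the new snapshot but not in the archive.
changed_since() {
  local archive=$1 dir=$2
  LC_ALL=C comm -13 "$archive" <(snapshot "$dir") | cut -d' ' -f3-
}

# Usage:
#   snapshot ~/projects > /tmp/projects.archive      # slow first pass
#   changed_since /tmp/projects.archive ~/projects   # later: only the changes
```

Unlike Unison with fastcheck enabled, this sketch still re-reads every file on each run; it only shows how a saved archive narrows down what needs transferring.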

If you are worried that Unison is missing anything with this system, just go into the ~/.unison directory on both the local and the remote machine and delete the archive files (their names normally start with “ar” or “fp”), then run unison again.  The next run will rebuild the archives from a full scan.
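If you do this often, a tiny helper saves some typing. This is a hypothetical function (not part of Unison) that assumes the default ~/.unison location and the “ar”/“fp” prefixes mentioned above; it leaves your .prf profiles alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete Unison archive (ar*) and fingerprint (fp*) files
# so the next run rebuilds them from a full scan. *.prf profiles are untouched.
reset_unison_archives() {
  local dir=${1:-$HOME/.unison}
  # The globs are unquoted so they expand; -f means no error if nothing matches.
  rm -f -- "$dir"/ar* "$dir"/fp*
}

# Usage: clear the local side, then the remote side over ssh
# (host taken from the profile's root line; single quotes make the remote shell expand the globs):
#   reset_unison_archives
#   ssh lns.com 'rm -f ~/.unison/ar* ~/.unison/fp*'
```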

Unison also knows when you have just moved a file or folder.  If a new file has the same contents (the same hash) as a file that already exists on the other side, Unison copies that existing file into place there instead of sending the data over the network.  This is a big win when moving a folder with a large number of files, or a few very large files.
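The lookup behind this can be sketched in shell. Unison's real mechanism is its xferbycopying preference (on by default), which looks for a file with the required contents already present in the target replica; the find_same_content function below is a hypothetical illustration of that lookup using cksum, not Unison's code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of content-based move detection (not Unison's code).
# find_same_content FILE DIR: print the first file under DIR whose cksum
# checksum and size match FILE, i.e. a local copy that could replace a transfer.
find_same_content() {
  local want
  want=$(cksum < "$1" | cut -d' ' -f1,2)   # "checksum size" of the wanted file
  find "$2" -type f -exec cksum {} + | while read -r sum size path; do
    if [ "$sum $size" = "$want" ]; then
      printf '%s\n' "$path"
      break
    fi
  done
}
```

A real synchronizer would keep these checksums in its archive rather than recomputing them, which is exactly why Unison can spot a moved folder quickly.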

Since I have been using Unison for a while, I have made some tweaks to my Unison configuration file (~/.unison/default.prf).  I thought I would share mine here, with comments in the file itself explaining each setting.  This is by no means a complete set of options for Unison.  You can see all of them detailed in the manual, which I highly recommend reviewing.

# Unison preferences file
# Local root directory...
root = /Users/pozar/
# Remote server, ssh port number and root directory...
root = ssh://lns.com:22//home/pozar
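# The program to use to copy files with. In this case rsync...
# (Unison only uses copyprog for whole-file copies larger than copythreshold, in KB;
# the default of -1 disables it, leaving Unison's built-in rsync-style transfer.)
copyprog = rsync -aX --rsh='ssh -p 22' --inplace --compress
# The program to use to copy files with that supports partial transfers...
copyprogrest = rsync -aX --rsh='ssh -p 22' --partial --inplace --compress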
# maximum number of simultaneous file transfers.  Good to have more than one to really use the pipe.
maxthreads = 5
# synchronize resource forks and HFS meta-data. (true/false/default)
# I'm not interested in seeing AppleDouble files on my Linux box...
rsrc = false 
# File names and directories to ignore...
# (Name matches only the last component of a path; Path matches a path relative to the root.)
ignore = Path projects/Mail/.imap
ignore = Name .FBCIndex
ignore = Name .FBCLockFolder
ignore = Name {Cache*,.Trash*,.VolumeIcon.icns,.HSicon,Temporary*,.Temporary*,TheFindByContentFolder}
ignore = Name {TheVolumeSettingsFolder,.Metadata,.filler.idsff,.Spotlight,.DS_Store,.CFUserTextEncoding}
# ignore anything named .unison (and everything under it), wherever it appears in the path
ignore = Name .unison
# Normally ignore the giant VM hard drive files...
ignore = Path Documents/Parallels
# Don't try to copy this socket...
ignore = Path projects/gnupg/S.gpg-agent
# ignore = Path Music/iTunes
ignore = Name *.lnk
ignore = Name Thumbs.db
# Directory paths off of root to sync...
path = projects
path = Desktop
path = Documents
path = Downloads
# Keep ssh open to help with long syncs such as the initial one
sshargs = -C -o TCPKeepAlive=yes
# synchronize modification times
times = true
# Create a local log of what happened.
log = true
# Uses the modification time and length of a file as a quick check if a file has changed.
fastcheck=true
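One thing worth knowing about ignore patterns: per the Unison manual, a Name pattern is matched only against the last component of a path, so a Name pattern containing a “/” never matches anything, while a Path pattern is matched against the whole path relative to the root. A hypothetical check_name_patterns function like the one below (not a Unison feature) can flag that mistake in a profile. It only looks at uncommented ignore lines and assumes the default profile location.

```shell
#!/usr/bin/env bash
# Hypothetical lint for Unison profiles: report "ignore = Name ..." lines whose
# pattern contains a "/". Such a pattern can never match a single path component,
# so the line was probably meant to use "Path".
check_name_patterns() {
  local prf=${1:-$HOME/.unison/default.prf}
  # -n prints line numbers; lines starting with "#" are skipped by the ^ anchor.
  grep -nE '^[[:space:]]*ignore[[:space:]]*=[[:space:]]*Name[[:space:]]+[^[:space:]]*/' "$prf"
}
```

No output (and a grep exit status of 1) means no suspicious Name patterns were found.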

There is a handy command-line argument called “-batch” that stops Unison from asking what to do with each file it finds to sync.  Normally Unison can work out which side changed since the last run, but in some cases it won’t know what to do.  Below is an example where the permissions or modification times (“props”) of various Django files I have are in conflict.  In this case I want to propagate this metadata to my remote server, KUMR.  I would normally press the “>” key to tell it to go from left (local) to right (KUMR)…

# unison
Contacting server...
Connected [//Tims-MBP.local//Users/pozar -> //kumr//home/pozar]
Looking for changes
  Waiting for changes from server                                       
Reconciling changes
local          kumr               
props    ====> props      Desktop/django/lib/python3.6/site-packages/django/contrib/contenttypes/locale/lb/LC_MESSAGES/django.mo  [] >
props    ====> props      Desktop/django/lib/python3.6/site-packages/django/contrib/contenttypes/locale/lb/LC_MESSAGES/django.po  [] >
props    <-?-> props      Desktop/django/lib/python3.6/site-packages/django/contrib/contenttypes/locale/lt/LC_MESSAGES/django.mo  [] y
Unrecognized command 'y': try again  [type '?' for help]
props    <-?-> props      Desktop/django/lib/python3.6/site-packages/django/contrib/contenttypes/locale/lt/LC_MESSAGES/django.mo  []

With “-batch”, Unison skips these questions, so you can script it to run from a cron job if you like.  I normally run unison in batch mode using a bash alias like:

alias unison='unison -batch'

Of course, in batch mode a conflict like the one above won’t get “fixed”; Unison just skips it.  That may be fine 99.9% of the time.  Occasionally I run unison without the batch argument just to get things fully in sync.
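For the cron case, note that bash does not expand aliases in non-interactive shells by default, so an alias like the one above will not apply there; pass -batch explicitly. The sketch below is a hypothetical wrapper (the function name, log path and UNISON_BIN override are assumptions) that also logs Unison's exit status, which the Unison manual documents as 0 when everything is in sync, 1 when some files were skipped (such as unresolved conflicts), 2 for non-fatal transfer failures and 3 for a fatal error.

```shell
#!/usr/bin/env bash
# Hypothetical cron wrapper for Unison; names and paths are assumptions.
# Cron runs a non-interactive shell where bash aliases are not expanded,
# so -batch is passed explicitly instead of relying on an alias.
unison_cron() {
  local bin=${UNISON_BIN:-unison}   # use a full path if cron's PATH lacks unison
  local log=${UNISON_CRON_LOG:-$HOME/unison-cron.log}
  local status=0 msg
  "$bin" -batch "$@" >>"$log" 2>&1 || status=$?
  # Map Unison's documented exit statuses to a one-line summary.
  case $status in
    0) msg="ok: everything in sync" ;;
    1) msg="skipped: some files were left for an interactive run" ;;
    2) msg="non-fatal failures during transfer" ;;
    *) msg="fatal error (exit status $status)" ;;
  esac
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$msg" >>"$log"
  return "$status"
}

# Example crontab entry (assumes this file is saved as ~/bin/unison-cron.sh):
#   15 * * * * /bin/bash -c '. "$HOME/bin/unison-cron.sh" && unison_cron'
```

A “skipped” line in the log is a hint that it is time for one of those occasional interactive runs.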

But what happens if I have a thousand files like this?  Say, for some reason, the modification times on a bunch of files got changed on both sides.  Typically you would use the UNIX “yes” command to send ‘>’ to the program over and over, with something like “yes \>”.  Unison will take this input happily until it reaches the last question, where it asks whether you want to propagate the changes; there it is looking for a ‘y’ or ‘n’.  Fortunately, when Unison is asking which direction to propagate a file, it rejects a ‘y’ as an unrecognized command and simply asks again (see above).  So you can use the bash command:

while true; do echo ">"; echo "y" ;done | unison

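This sends a ‘>’ and then a ‘y’ into unison, over and over.  The direction prompts accept the ‘>’ and reject the ‘y’, and eventually, when Unison asks whether it should propagate the changes, it gets a ‘y’.  If you use the -batch alias shown earlier, type \unison at the end of the pipe instead so the alias is bypassed; otherwise batch mode skips the very prompts this loop is meant to answer.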
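The same alternating stream can also come from yes, which repeats whatever string it is given, newline included. A minimal sketch, with answer_stream as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# answer_stream: endless ">", "y", ">", "y", ... lines (hypothetical helper).
# $'...' is bash ANSI-C quoting, so yes repeats ">", a newline, then "y".
answer_stream() {
  yes $'>\ny'
}

# Peek at what unison would be fed:
#   answer_stream | head -n 4
# Feed it to Unison (the backslash bypasses a "unison -batch" alias):
#   answer_stream | \unison
```

It behaves like the while loop, so the same warning applies.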

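I should say this command should be considered a bit “dangerous” unless you are sure that the metadata and files you are propagating are what you want on the other side.  Because it answers ‘>’ to every prompt, it also pushes your local copies over changes that were made only on the remote side.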
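Unison also has preferences that make this choice without piping keystrokes. According to the Unison manual, -force root resolves every difference, even non-conflicting ones, in favor of the named root, while -prefer root only settles conflicts that way; -force is the closer match to the loop above. The helper below is a hypothetical sketch that assumes the local root from the profile above (/Users/pozar/) and a unison binary on the PATH, and it deserves the same caution as the loop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make the remote side match the local one without prompts.
# -force ROOT resolves every difference in favor of ROOT, so changes made only
# on the remote side are overwritten too. Use with the same care as the loop.
unison_push_local() {
  local bin=${UNISON_BIN:-unison}
  local root=${1:-/Users/pozar/}   # assumed: must name one of the profile's roots
  "$bin" -batch -force "$root"
}

# Alternatives (check the Unison manual for your version first);
# "command" bypasses a "unison -batch" alias:
#   command unison -batch -prefer /Users/pozar/   # only conflicts favor local
#   command unison -batch -force newer            # newer modtime wins; needs times = true
```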

Hope this gives you some insight into this rather handy tool.  Drop me a line if you have comments or questions.