Rsync is a versatile tool for copying files or whole directory trees, locally or over the network to another computer. rsync -a copies files and carries over most attributes, including ownership and permissions. Running rsync over the network is trivial, with ssh as the default transport (rsync can also run in daemon mode).

Rsyncing over network as root

Often I need to do an rsync -a between hosts on directories I’m not the owner of. I do have ssh access and full sudo access on both hosts. I can’t ssh in directly as the root user on either of them, as direct root logins are (rightly) frowned upon.

Let’s say I’m logged into some soon-to-be-in-production new mail server and want to rsync old-mailserver:/var/spool/mail/ over to /var/spool/mail. Here is a way to do it:

$ sudo -E rsync -a --rsync-path="sudo /usr/bin/rsync" \
    $LOGNAME@old-mailserver:/var/spool/mail/ /var/spool/mail/

Or, more generically:

SRCHOST=old-mailserver
SRCDIR=/var/spool/mail
DSTDIR=/var/spool/mail
sudo -E rsync -a --rsync-path="sudo /usr/bin/rsync" \
    ${LOGNAME}@${SRCHOST}:${SRCDIR}/ ${DSTDIR}/

Of course, the direction can be reversed:

DSTHOST=new-mailserver
SRCDIR=/var/spool/mail
DSTDIR=/var/spool/mail
sudo -E rsync -a --rsync-path="sudo /usr/bin/rsync" \
    ${SRCDIR}/ ${LOGNAME}@${DSTHOST}:${DSTDIR}/

Explanation

sudo -E passes your personal environment variables to the subprocess. This ensures that the new rsync process gets access to your ssh-agent, if you have one running. However, your username is not carried along: rsync, and the ssh it spawns, now run as root.

The -a option syncs attributes, permissions and ownership, but be aware that hard links, extended attributes and ACLs will not be preserved! Use -HAXa to get all of those, but be careful: keeping track of hard links is memory-intensive. Also, while -a will overwrite files that already exist on the destination, it will not delete any extra files there. Add --delete for that.

Rsync has plenty of other useful options. If the network connection is unstable and the files are big, throw in --partial. Throw in --verbose and --progress if you prefer that. Worried that the rsync command may be clogging the bandwidth? Use --bwlimit! RTFM for more information!

Rsync will spawn ssh, which in turn spawns an rsync server on the remote host. --rsync-path tells rsync where to find rsync on the remote host, but it’s actually possible to put a whole command in here, so sudo /usr/bin/rsync will execute sudo rsync instead of simply rsync, allowing rsync to read all those private mailboxes.

Since the username is lost during the initial sudo, we need to tell rsync what username to use when ssh’ing to the remote server. That would typically be stored in the LOGNAME or USER environment variables. If you have another username on the remote host, replace $LOGNAME with your remote username.
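If the remote username differs from host to host, it can be handy to keep it in a variable with a fallback. The sketch below only prints the command so it can be reviewed before running it; REMOTE_USER and pull_as_root_cmd are names I made up for this example, not anything rsync or sudo knows about.

```shell
#!/bin/sh
# REMOTE_USER is a hypothetical variable for this sketch; when it is unset,
# fall back to the local login name.
pull_as_root_cmd() {
    src_host=$1
    src_dir=$2
    dst_dir=$3
    remote_user=${REMOTE_USER:-${LOGNAME:-$(id -un)}}
    # Print the command instead of executing it.
    printf 'sudo -E rsync -a --rsync-path="sudo /usr/bin/rsync" %s@%s:%s/ %s/\n' \
        "$remote_user" "$src_host" "$src_dir" "$dst_dir"
}

REMOTE_USER=alice
pull_as_root_cmd old-mailserver /var/spool/mail /var/spool/mail
```

The important detail is that the username is resolved in your own shell, before sudo takes over.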

Remember that rsync distinguishes between source path names ending with a slash and path names not ending with a slash. rsync -a foo bar will create a directory bar/foo, while rsync -a foo/ bar will copy the contents of foo directly into bar.

Cheap snapshot backups with rsync

Assume you have a large pile of files (binaries, e.g. photos; you’d use git for backing up text files, wouldn’t you?) that aren’t modified frequently. You would like to take frequent backups, and you want to keep the snapshots. A simple rsync to a constant backup directory won’t do: if a file has been accidentally or maliciously edited or truncated in the source directory, the valuable backup will be overwritten the next time the backup runs. A simple rsync -a to a new directory every day is not ideal either, since it’s very wasteful to store the same content over and over again (unless your file system or storage solution provides automatic deduplication).

There is a cheap and easy way to solve this problem by using rsync with the --link-dest argument. With this option, files that are unchanged since a previous snapshot are hard-linked to it instead of being copied again, which effectively deduplicates the content. I will just provide a quick and working example, covering the backup of $HOME/photos/ at a remote host photo-album.example.com to /var/backups/photo-album on the local host. For more details on rsync --link-dest, RTFM or visit your favorite search engine.

First backup:

$ sudo mkdir /var/backups/photo-album
$ sudo chown $LOGNAME /var/backups/photo-album
$ cd /var/backups/photo-album
$ this_backup=photos-$(date +%FT%H%M)
$ mkdir "$this_backup"
$ rsync -a photo-album.example.com:photos/ "$this_backup/"

Subsequent backups (assuming backup is done only once a day):

$ cd /var/backups/photo-album
$ prev_backup=$(ls -d photos-* | tail -n1)   # newest snapshot; the timestamped names sort chronologically
$ this_backup=photos-$(date +%FT%H%M)
$ mkdir "$this_backup"
$ rsync -a photo-album.example.com:photos/ --link-dest="/var/backups/photo-album/$prev_backup/" "$this_backup/"

Verify that the dedup logic works:

$ du -sh $prev_backup
$ du -sh $this_backup
$ du -sh .

You should get roughly the same numbers from all three commands, and exactly the same if nothing was modified.

And now as a generic example

Config:

BACKUP_DIR=/var/backups/photo-album
SOURCE_HOST=photo-album.example.com
SOURCE_BASEDIR=""
DIRNAME=photos
SOURCE_DIR=${SOURCE_BASEDIR}${DIRNAME}

First backup:

$ sudo mkdir $BACKUP_DIR
$ sudo chown $LOGNAME $BACKUP_DIR
$ cd $BACKUP_DIR
$ this_backup=${DIRNAME}-$(date +%FT%H%M)
$ mkdir "$this_backup"
$ rsync -a "${SOURCE_HOST}:${SOURCE_DIR}/" "$this_backup/"

Subsequent backups (assuming backup is done only once a day):

$ cd "$BACKUP_DIR"
$ prev_backup=$(ls -d "${DIRNAME}"-* | tail -n1)   # newest snapshot; the timestamped names sort chronologically
$ this_backup=${DIRNAME}-$(date +%FT%H%M)
$ mkdir "$this_backup"
$ rsync -a "${SOURCE_HOST}:${SOURCE_DIR}/" --link-dest="${BACKUP_DIR}/${prev_backup}/" "${this_backup}/"