Backup with rsync and ssh
A summary of some things I've learned about using rsync over ssh for backups.
The Problem
This is a problem I've come up against a few times: I need to back up machineA and store the backups on machineB over the network. One common way to do this is via rsync over ssh. There are a few constraints:
- Process must be secured against eavesdropping during network transfer
- Backups must be capable of being run unattended, without having to type passwords
- Backups must preserve machineA's ownership and permission information when they are archived on machineB
That means that I have to be the root user on machineB so that I can set the ownership and permissions on the target files. Since the backup should be automated, it needs to run without anyone having to enter a password. The obvious answer is to use public key authentication (PKA) with ssh rather than password authentication. The problem is that if I allow root to log in with PKA, root has unrestricted access to machineB. That is something I would like to avoid if possible.
Public Key Authentication
I discovered recently that I can do a neat trick by using public key auth together with some special features of the authorized_keys2 file (for DSA keys). In the sshd_config file, you can limit root ssh access in several ways. Although it has been my habit to just set PermitRootLogin to "no", there are other settings which are useful for this purpose. Setting it to "without-password" disables password authentication for root, so only public key authentication is accepted and nobody has to type a password. However, this still means that root can run any command on machineB. How to prevent that? This is where the special parameters in the authorized_keys2 file come in. Basically, this file is a space-separated database whose columns are options, keytype, key, and comment (the options column may be omitted). A line typically looks like this:
ssh-dss AAAAAB3Nza[...] root@machineB.localdomain
where [...] represents the public key, which can be hundreds of characters long. Two options in particular help reduce the risk of allowing root to do backups using rsync over ssh. You can specify from="address", which restricts the key to connections from that address, and command="command-name", which makes "command-name" the only command that can be executed when connecting with the specified public key. So your modified authorized_keys2 file might look like:
from="192.168.0.107",command="/home/user/verifyrsync.py" ssh-dss AAAAAB3Nza[...] root@machineB.localdomain
(Note that spaces are not allowed in the options section, except inside double quotes: options must be comma-separated)
It is important to note that when a key carries a command= option, the command supplied by the user (if any) is ignored. Only the command in the command= option is executed.
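The line format described above (optional comma-separated options, then keytype, key, and comment, separated by spaces) can be illustrated with a small helper. This is only a sketch: the function name build_authorized_keys_line is made up for this example, the key data is a placeholder, and it assumes the option values contain no double quotes.

```python
def build_authorized_keys_line(key_type, key_data, comment,
                               from_addr=None, command=None):
    """Assemble one authorized_keys2 line (hypothetical helper).

    Options are joined with commas and no spaces; the remaining
    fields are separated by single spaces.
    """
    options = []
    for name, value in (("from", from_addr), ("command", command)):
        if value is None:
            continue
        if '"' in value:
            # Escaping embedded quotes is out of scope for this sketch.
            raise ValueError("double quotes are not supported here")
        options.append('%s="%s"' % (name, value))

    fields = [key_type, key_data, comment]
    if options:
        fields.insert(0, ",".join(options))
    return " ".join(fields)


# Placeholder key data; a real key is hundreds of characters long.
print(build_authorized_keys_line(
    "ssh-dss", "AAAA_PLACEHOLDER", "root@machineB.localdomain",
    from_addr="192.168.0.107", command="/home/user/verifyrsync.py"))
```

Generating the line this way makes it harder to slip a stray space between two options by accident.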
Restricting Public Key Access
And now for the final step which allows this to work: Set PermitRootLogin to "forced-commands-only". The sshd_config man page specifically mentions the use case of remote root backups in describing this option:
If [PermitRootLogin] is set to “forced-commands-only” root login with public key authentication will be allowed, but only if the command option has been specified (which may be useful for taking remote backups even if root login is normally not allowed). All other authentication methods are disabled for root.
Based on the above example, then, connections from machineA to machineB using this public key will only be accepted if the from IP address is 192.168.0.107 and will only allow the execution of one command: verifyrsync.py. But it is not very flexible to only allow one command to be executed. So there is one more little piece of the puzzle we need to solve the problem: the SSH_ORIGINAL_COMMAND environment variable. Here is what the verifyrsync.py script looks like:
#!/usr/bin/env python
import os
import sys

# sshd stores the command requested by the client in this variable
rsync_cmd = os.getenv('SSH_ORIGINAL_COMMAND')
if rsync_cmd is None:
    print("SSH_ORIGINAL_COMMAND is not defined")
else:
    print("SSH_ORIGINAL_COMMAND is set to %s" % (rsync_cmd,))
Of course this only demonstrates the simplest possible usage of the variable. With this setup, here is what my conversation with machineB would look like from machineA:
[root@localhost ~]# ssh -i .ssh/rsync-key machineB ls -la
Enter passphrase for key '.ssh/rsync-key':
SSH_ORIGINAL_COMMAND is set to ls -la
The command is just echoed back to me. But now I can allow the execution of more than just one command. Notice also two other things about the interaction:
- I have to use the -i option to specify the private key to use as my identity, otherwise the default key will be used, and the connection will be disallowed by the sshd configuration on machineB.
- The private key is protected by a passphrase for which I am prompted each time I connect.
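To go beyond echoing the command back, the forced command could inspect SSH_ORIGINAL_COMMAND and run it only when it looks like an rsync server invocation. The following is a minimal sketch, not a hardened implementation: the names ALLOWED_PREFIX, is_allowed, and main are invented for this example, and it assumes the rsync client asks the remote side to run a command that begins with "rsync --server".

```python
import os
import shlex
import sys

# Assumed policy for this sketch: only rsync in server mode may run.
ALLOWED_PREFIX = ["rsync", "--server"]

# Characters a shell would treat specially; refuse them outright.
FORBIDDEN_CHARS = ";&|`$<>\n"


def is_allowed(command):
    """Return True if command is an acceptable rsync server invocation."""
    if command is None:
        return False
    if any(ch in command for ch in FORBIDDEN_CHARS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected.
        return False
    return argv[:len(ALLOWED_PREFIX)] == ALLOWED_PREFIX


def main():
    """Run the requested command if allowed, otherwise return 1."""
    command = os.environ.get("SSH_ORIGINAL_COMMAND")
    if not is_allowed(command):
        sys.stderr.write("Rejected command: %r\n" % (command,))
        return 1
    argv = shlex.split(command)
    # Replace this process with rsync; no shell is involved.
    os.execvp(argv[0], argv)
```

Ending the file with a call such as sys.exit(main()) would turn the sketch into a usable forced command. The existing tools authprogs and rrsync address the same problem more thoroughly.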
You might well ask why I have gone to the trouble of setting up a passwordless connection protocol to facilitate unattended operation when I now have to enter a passphrase each time I load my private key. I could just as easily have created a private key with no passphrase (just pressing enter when prompted for the passphrase creates a key with no passphrase). This is insecure, as it allows anyone who has gained access to machineA as root to use the unprotected key to connect to machineB as root also. Of course, they are limited to running only the commands specified in the "command" option in the authorized_keys2 file, but this still presents a security risk. So how do I manage the key so that I minimize the number of times I'm asked for the passphrase?
Keychain
Keychain is a utility that helps manage keys efficiently without compromising security. Installing keychain allows you to run cron jobs and whatnot while only having to type your key passphrase each time the system is rebooted. Since rebooting a backup server is a significant event, the small added administrative burden is worth the extra security provided by this setup.
Configuring Keychain
To configure keychain to help with ssh-agent, you need to start keychain from root's bashrc file on machineA (the machine that holds the private key):
# .bashrc
# User specific aliases and functions
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
/usr/bin/keychain --clear ~/.ssh/rsync-key # this line is required for keychain
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
The --clear option forces keychain to unload the memorized private keys each time root logs in. This means that if (heaven forbid) someone does get the root password for machineA, they don't also immediately have unfettered access to machineB. The user is prompted for the passphrase for any keys that are loaded by keychain. If the correct passphrase is not provided by the user, the keys are not loaded.
Keychain creates a directory .keychain in root's home directory which contains information about how to access the keys stored by ssh-agent. For compatibility purposes, it creates a number of different shell scripts -- one for each of csh, sh, and fish. The directory also contains information that can be used to load gpg keys into memory. The listing after loading one key looks like this:
-rw------- 1 root root 80 Feb 16 11:47 localhost.localdomain-csh
-rw------- 1 root root 58 Feb 16 11:47 localhost.localdomain-csh-gpg
-rw------- 1 root root 136 Feb 16 11:47 localhost.localdomain-fish
-rw------- 1 root root 87 Feb 16 11:47 localhost.localdomain-fish-gpg
-rw------- 1 root root 110 Feb 16 11:47 localhost.localdomain-sh
-rw------- 1 root root 74 Feb 16 11:47 localhost.localdomain-sh-gpg
In order to use the stored private keys, you need to source one of these files before running your cron job or script.
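A backup job written in Python cannot source a shell file directly, but it can read the same settings from the sh-style file and pass them to rsync. The sketch below is hedged accordingly: it assumes the file holds lines of the form NAME=value; export NAME; (for example SSH_AUTH_SOCK and SSH_AGENT_PID), and the function names and any paths passed to them are hypothetical.

```python
import os
import re
import subprocess

# Matches lines such as "SSH_AUTH_SOCK=/tmp/ssh-abc/agent.123; export SSH_AUTH_SOCK;"
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)=([^;\n]*);", re.MULTILINE)


def parse_keychain_env(text):
    """Extract NAME=value assignments from the text of a keychain sh file."""
    return dict((name, value.strip()) for name, value in ASSIGNMENT.findall(text))


def build_rsync_argv(source, destination, key_path):
    """Build an rsync command line that uses ssh with a specific identity."""
    return ["rsync", "-az", "-e", "ssh -i %s" % key_path, source, destination]


def run_backup(keychain_file, source, destination, key_path):
    """Run rsync with the agent settings recorded by keychain."""
    env = dict(os.environ)
    with open(keychain_file) as handle:
        env.update(parse_keychain_env(handle.read()))
    return subprocess.call(build_rsync_argv(source, destination, key_path), env=env)
```

A cron job could then call run_backup with the path to the -sh file from the listing above, so that ssh finds the running agent and does not prompt for the passphrase.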
Resources
Here are some links to external documents that were helpful on the topic of rsync and ssh for backups:
- Articles on IBM DeveloperWorks by Gentoo developer Daniel Robbins about key management:
- General article on using rsync and ssh: http://www.jdmz.net/ssh/
- An excellent rsync howto
- Another article which focuses on making snapshot-type backups using rsync.
- A blog entry about using rsync for backups. Covers rsync options in great detail.
- Some tools which are designed to make this task easier:
  - Duplicity -- http://www.nongnu.org/duplicity/index.html
  - BoxBackup -- http://www.fluffy.co.uk/boxbackup/
  - Rdiff Backup -- http://www.nongnu.org/rdiff-backup/
- Some scripts which perform similar functions to verifyrsync.py:
  - authprogs: http://www.hackinglinuxexposed.com/tools/authprogs/src/authprogs (suggested by Cory)
  - rrsync: http://www.samba.org/ftp/unpacked/rsync/support/rrsync (part of the Samba project)

