How-To: Automated incremental daily backups to Amazon S3 using Duplicity

This guide shows how to use Amazon S3 with duplicity to make secure, GPG-encrypted, automated daily incremental backups (snapshots) of a Linux server or desktop. I have been using this method on various servers for several months and it has proved to be a reliable, secure, cheap, and robust way to create automated backups.

I have used this method on Fedora, YDL, and CentOS but the instructions should equally apply to other Linux distributions including Debian and Ubuntu. It will even work on OS X using the MacPorts version of duplicity.

Aims of this guide

This guide explains how to create a simple wrapper script for duplicity that allows you to automatically create GPG encrypted incremental backups that are saved to an Amazon S3 bucket. The script is designed to be executed as a daily cron job so that incremental snapshot backups are created each day. The script creates a full backup set on the 1st day of each month (or when an appropriate full backup cannot be found) and then creates incremental backups on subsequent days.

This guide provides a walk-through of how to create the GPG encryption key, and provides full scripts and example usage for both backup and restore. You could easily adapt the backup script so that it makes full backups each week, or otherwise adjust it to suit your individual needs.

This guide is written with the general Linux user in mind, but you will need some understanding of basic Linux concepts such as cron, permissions, and directory structures.

What is duplicity?

From the duplicity home page:

Duplicity backs [up] directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server.

I think that says it all much more concisely than I could manage.

One thing to note is that in my experience, and on certain machines, duplicity can cause a lot of overhead and take a long time to complete. Thus duplicity is not always a viable option when backing up huge amounts of data. That said, for backing up the critical data from a standard web server it can be a great solution. Remember that if you’re backing up databases then you need to dump them into SQL files first. For MySQL databases I recommend automysqlbackup for this. As always, YMMV.
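For example, a simple pre-backup dump of a single MySQL database might look something like this (the database name and output path are just illustrative – adjust to taste, and run it before the duplicity job so the dump gets picked up by the backup):

# Dump a MySQL database to a dated, compressed SQL file
# (assumes credentials are available via ~/.my.cnf)
mysqldump --single-transaction mydb | gzip > /var/backups/mysql/mydb-$(date +%F).sql.gz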

Before we start

You need to install duplicity (version >= 0.4.3 for S3 support). This how-to doesn’t cover that aspect, but suffice to say that duplicity is available as a package for most major distros, so crack open your package manager (be it yum, apt, Synaptic or whatever) and install duplicity along with all its dependencies.
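For example (the package is simply called “duplicity” on both families):

# Fedora / CentOS
yum install duplicity

# Debian / Ubuntu
apt-get install duplicity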

You also need GnuPG and librsync, but they should both be automatically installed as dependencies of duplicity.

Step 1 – Generate a new GPG key

If you already have a GPG key that you want to use then skip this bit – you’ll just need to know your key ID, which you can get with “gpg --list-keys” – it is the bit after the / on the “pub” line, as in the example output below. Otherwise, read on…
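For reference, the listing looks something like this – the key ID shown here (BE9274BD) is just an illustration, yours will differ:

# gpg --list-keys
pub   1024D/BE9274BD 2008-11-30
uid                  Duplicity Backup (Key for duplicity) <duplicity@mydomain.com>
sub   2048g/F8F35AD8 2008-11-30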

I am going to presume that you’ll be running your backup jobs as root, so open a terminal and become root. If you’re going to run them as a different user then become that user instead but ensure that the user you have chosen has sufficient permissions to backup the data you require.

Now run “gpg --gen-key” to generate your key and follow the prompts:

# gpg --gen-key
gpg (GnuPG) 1.4.9; Copyright (C) 2008 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please select what kind of key you want:
   (1) DSA and Elgamal (default)
   (2) DSA (sign only)
   (5) RSA (sign only)
Your selection?

Accept the default (Enter) or press 1 for DSA and Elgamal.

DSA keypair will have 1024 bits.
ELG-E keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 

Again, the default (2048) is fine. Just hit Enter.

Requested keysize is 2048 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 

I don’t want my key to expire, so I just hit Enter again to accept the default. Do whatever you want.

Key does not expire at all
Is this correct? (y/N) 

Sure is. Hit y and then Enter.

You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"

Real name: Duplicity Backup
Email address: duplicity@mydomain.com
Comment: Key for duplicity
You selected this USER-ID:
    "Duplicity Backup (Key for duplicity) <duplicity@mydomain.com>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit?

Enter the requested details and then press O for Okay.

You need a Passphrase to protect your secret key.

Enter Passphrase:

Enter a passphrase here. It should be something long and complex. Anything will do, but make sure you remember it because you’ll need it later. When finished, press Enter, then re-enter your passphrase when prompted and press Enter again.

At this stage you may have to help generate some entropy by doing some other task – I find that running “updatedb” in another shell is pretty good, or just randomly tapping the keyboard can do the trick too.

Once it has finished you should get a message like this:

gpg: key BE9274BD marked as ultimately trusted
public and secret key created and signed.

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
pub   1024D/BE9274BD 2008-11-30
      Key fingerprint = 2FB4 A20E 57BA 80BA 9576  3ABD F79F D430 BE92 74BD
uid                  Duplicity Backup (Key for duplicity) <duplicity@mydomain.com>
sub   2048g/F8F35AD8 2008-11-30

Make a note of the key (BE9274BD in this case) as you’ll need that later too.

Important: Remember to back up your GPG key pair somewhere safe and off the current machine. Without this key pair your backups are totally useless to you, so if you lose it and need to restore a backup then you’re up a creek without a paddle. This article shows the proper way to export (and import) your GPG key pair.
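In short, exporting and importing the key pair with gpg looks roughly like this (the key ID and file names are just examples):

# Export the public and secret keys as ASCII-armoured files
gpg --export --armor BE9274BD > duplicity-pub.asc
gpg --export-secret-keys --armor BE9274BD > duplicity-sec.asc

# Keep those files somewhere safe and off-machine, then on the box
# where you need to restore, import them again
gpg --import duplicity-pub.asc duplicity-sec.asc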

Step 2 – The backup wrapper script

This bash wrapper script does a full backup on the 1st day of each month followed by incremental backups on subsequent days. It also deletes backup sets older than a set age (three months in the script below) and emails a daily log report with some valuable statistics about your backup, including any errors.

You will need to have the following information handy to edit this backup script for your needs:

  • Your Amazon S3 Access Key ID
  • Your Amazon S3 Secret Access Key
  • Your GPG key
  • Your GPG key passphrase
  • A list of directories you want to back up
  • An email address to send the logs to
  • A unique name for an Amazon S3 bucket (the bucket will be created if it doesn’t yet exist)

The script is as follows. At a minimum you need to change the AWS credentials, GPG key and passphrase, email address, and bucket name, but pay attention to all the variables as you may want to tweak them to suit your needs.

Note that includes/excludes work on a ‘first match’ basis. So if you want to exclude something in a directory, you need to exclude the file/subdirectory before including the directory. For more info see the duplicity man pages.
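For example, to back up /home but skip a scratch directory inside it, the exclude must come before the include in the duplicity command line (the path is purely illustrative):

    --exclude=/home/username/tmp \
    --include=/home \
    --exclude=/** \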

#!/bin/bash

# Set up some variables for logging
LOGFILE="/var/log/backup.log"
DAILYLOGFILE="/var/log/backup.daily.log"
HOST=`hostname`
DATE=`date +%Y-%m-%d`
MAILADDR="sysadmin@mydomain.com"

# Clear the old daily log file
cat /dev/null > ${DAILYLOGFILE}

# Trace function for logging, don't change this
trace () {
        stamp=`date +%Y-%m-%d_%H:%M:%S`
        echo "$stamp: $*" >> ${DAILYLOGFILE}
}

# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY"
export PASSPHRASE="YOUR_GPG_PASSPHRASE"

# Your GPG key
GPG_KEY=YOUR_GPG_KEY

# How long to keep backups for
OLDER_THAN="3M"

# The source of your backup
SOURCE=/

# The destination
# Note that the bucket need not exist
# but does need to be unique amongst all
# Amazon S3 users. So, choose wisely.
DEST="s3+http://your_s3_bucket_name"

FULL=
if [ $(date +%d) -eq 1 ]; then
        FULL=full
fi;

trace "Backup for local filesystem started"

trace "... removing old backups"

# Note that --force is needed for remove-older-than to actually delete
# the old backup sets; without it duplicity just lists what would be removed
duplicity remove-older-than ${OLDER_THAN} --force ${DEST} >> ${DAILYLOGFILE} 2>&1

trace "... backing up filesystem"

duplicity \
    ${FULL} \
    --encrypt-key=${GPG_KEY} \
    --sign-key=${GPG_KEY} \
    --volsize=250 \
    --include=/vhosts \
    --include=/etc \
    --include=/home \
    --include=/root \
    --exclude=/** \
    ${SOURCE} ${DEST} >> ${DAILYLOGFILE} 2>&1

trace "Backup for local filesystem complete"
trace "------------------------------------"

# Send the daily log file by email
cat "$DAILYLOGFILE" | mail -s "Duplicity Backup Log for $HOST - $DATE" $MAILADDR

# Append the daily log file to the main log file
cat "$DAILYLOGFILE" >> $LOGFILE

# Reset the ENV variables. Don't need them sitting around
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE= 

Save the script somewhere and give it an appropriate name – I saved it as /usr/bin/duplicity-backup – and make sure to chmod the script to 700: it contains some sensitive information so we don’t want non-privileged users to have read access to it. Run the script once as a test, then set it up as a daily cron job to run at an appropriate time of night when the server isn’t doing much else.
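For example, a root crontab entry (added with “crontab -e”) to run the backup at 2am every day might look like this – the path assumes you saved the script where I did:

# m h dom mon dow command
0 2 * * * /usr/bin/duplicity-backup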

Step 3 – The restore wrapper script

Clearly we need a way to restore from a backup, so use the following script to do just that:

#!/bin/bash
# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY"
export PASSPHRASE="YOUR_GPG_PASSPHRASE"

# Your GPG key
GPG_KEY=YOUR_GPG_KEY

# The destination
DEST="s3+http://your_s3_bucket_name"

if [ $# -lt 3 ]; then echo "Usage: $0 <date> <file> <restore-to>"; exit 1; fi

duplicity \
    --encrypt-key=${GPG_KEY} \
    --sign-key=${GPG_KEY} \
    --file-to-restore $2 \
    --restore-time $1 \
    ${DEST} $3

# Reset the ENV variables. Don't need them sitting around
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE= 

Again, save this file as something sensible and chmod it to 700 to prevent prying eyes. I saved it as /usr/bin/duplicity-restore but feel free to put it wherever you like.

To do a restore simply invoke the script as follows:

duplicity-restore <date> <file> <restore-to>

Some notes on usage: paths are relative, not absolute, so /home/username is backed up as home/username. You can restore whole directories, but the destination path needs to exist first. For example, to restore /home/username from November 20, 2008 to a local directory ‘restore’, the following would not work because ./home does not exist:

cd ~
mkdir restore
cd restore
duplicity-restore "2008-11-20" home/username home/username

However, the following would work and would restore the directory to ./username:

duplicity-restore "2008-11-20" home/username username
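If you can’t remember the exact path of a file or directory inside the backup, duplicity can list the contents of the latest backup set for you (you’ll need the same AWS and PASSPHRASE environment variables exported as in the scripts above):

duplicity list-current-files s3+http://your_s3_bucket_name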

That’s all there is to it. As mentioned I’ve been using this method for several months to back up a variety of servers and it works very nicely. I hope it works just as well for you too!

Credits

This solution is the combination of a couple of tips and tricks I found while trawling the web, notably from this howto at randys.org and this post over at the linode.com forums. Credit and thanks go to the original authors – I have merely hacked their ideas together and added a few touches of my own.

If you find this useful or have any comments or questions then please respond below!

Comments

  1. Dave Abrahams

    Nice guide; Even though I had done something similar myself once, I found it useful to have a simple tutorial that walked me through the steps.

    I was wondering why you didn’t use duplicity’s “--full-if-older-than 1M” option rather than coding the logic for making full backups yourself? If you miss the first day of the month due to downtime, you will fall behind on full backups.

    We used to use shell scripts for this but I found it’s really easy to lose important information when the script breaks for any reason. You can see http://techarcana.net/hydra/backups/ for rationale and links to our Python code.

  2. Dave Abrahams

    Also, you might want to mention that people can really confuse duplicity if they use sudo to run the script manually without changing $HOME. Then GPG looks in the wrong place for key files.

  3. Sam S

    Nice script … I was wondering if you could help me troubleshoot my issue. I keep getting the following message:

    “Which can’t be deleted because newer sets depend on them”

    My variables are:

    OLDER_THAN=”14D”
    FULL=”7D”

    What I’ve been doing is logging into S3 using the S3 plugin for Firefox and deleting the old sets that way. Any idea as to why it keeps failing?

  4. Kearney

    Sam,

    Duplicity uses rsync, which contains incremental changes. Those files won’t be deleted because, even though the backup may be older than 7 days, there are backups which are incremental and younger than 7 days.

    So, after 2 weeks have passed, those files will be deleted, since the full backup and the incremental backups are now 14 days old, and there exists a full backup newer than the full and incremental backups.

  5. Nathan Olsen

    Since I also got stuck on this step, I thought I’d follow up on Dave Abrahams’ note regarding sudo and $HOME.

    The instructions on this page will create a gpg key for root, so restoring files will require accessing root’s gpg keyring. If you plan to use sudo to manually restore files, the duplicity-restore script needs to be run using:

    sudo -H duplicity-restore

    Thanks for the great tutorial and scripts!

  6. mzweep

    Very useful script and tutorial.

    I refer to it a lot in a page I just wrote: http://wiki.qnap.com/wiki/How_To_Install_Duplicity_On_QNAP_NAS
    Please keep it online as long as possible !

    Warning for everyone: duplicity always uses ~/.cache even when you use the “--tempdir” option to specify another temp directory.
    Next warning: you must have the gpg command, not gpg2… just create a symlink from gpg -> gpg2 if needed.

    For those who want to use duplicity as root, copy your ~/.gnupg to /root/.gnupg …
