Backup Backup Backup

What

  • What happens if you accidentally delete some (or all) of the files in your directory?
  • What happens if you want to recover a file that you deleted or changed two weeks ago or more?
  • What happens if your workstation stops working?
  • What happens if the file server stops working?
  • What happens if a disk stops working?
  • What do you have to do in order to back up your data?

To answer these questions, we have set up some systems (hardware and software) that work for us and that save, back up and move our data automatically. This way human intervention (and so the possibility of error) is reduced to a minimum.

Remember that when we say home directory, we mean the user's home directory stored on the file server. We do not back up workstations or laptops.
For some suggestions on how to back up your laptop, see the Laptops section below. If a workstation stops working, it is not a problem: we can repair or replace the workstation without loss of important data. For personal laptops we do not offer any kind of safeguard for the data stored on the laptop's disk.

Why

The most important thing that humans and computers share is just one: the users' data. No matter what kind of computer, interface, network connection or other mechanical or logical device users can or must use: if they cannot access their data (stored inside the computer), problems arise. For a user it does not matter where or how the data are saved; the only important thing is that he or she can access these data, even after his or her own mistake or in case of technical problems (yes, earthquakes and floods are just technical problems).

How

To offer maximum support to users, our network has different methods for backing up the data. Each method has pros and cons, and not all the methods are directly accessible to users. In some cases the user must ask the System Administrators to recover his or her data from the backup devices. This is necessary to keep a good compromise between access to data and security. The backups are complete collections of all the data of all users, so we have to consider access to these repositories carefully.
We have 3 different levels of backup, which offer access to deleted or changed files in different ways.

  • Backup in the .snapshot directory
  • Backup on Hard Disk in a central backup server
  • Backup on Tape cartridge in central backup server

.snapshot directory

Every user can find inside his or her home directory on the central filer a hidden directory named .snapshot. Inside this directory there are other directories named

  • hourly.[0-3]
  • nightly.[0-3]
  • weekly.[0-1]

As the names suggest, these directories hold snapshots of your home directory covering the last 4 hours, 4 days and 2 weeks (the snapshot numbered 0 is the most recent one). If the file you want was deleted or changed during this time, search these directories to find it. Note that each .snapshot directory shows the complete state of your directory as it was at that time, so when you look inside one of them you see not only the files that have changed, but also all the files that are unchanged. Amazing, isn't it? Once you find the file, copy it wherever you want and the restore is done.
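
The search-and-copy procedure described above can be sketched as a pair of small shell functions. This is only an illustration: the function names and the SNAPSHOT_ROOT variable are assumptions made for this example (they are not tools provided by the file server), and the snapshot names are the ones listed above.

```shell
#!/bin/sh
# Sketch only: SNAPSHOT_ROOT and both function names are illustrative
# assumptions, not tools provided by the file server.

# list_snapshot_copies RELATIVE_PATH
# Print the path of every snapshot copy of a file. RELATIVE_PATH is given
# relative to the home directory (e.g. Documents/report.tex).
list_snapshot_copies() {
    snap_root=${SNAPSHOT_ROOT:-$HOME/.snapshot}
    for snap in "$snap_root"/*/; do
        if [ -f "$snap$1" ]; then
            printf '%s\n' "$snap$1"
        fi
    done
}

# restore_from_snapshot SNAPSHOT_NAME RELATIVE_PATH DESTINATION
# Copy one snapshot version to DESTINATION, preserving its timestamps.
restore_from_snapshot() {
    snap_root=${SNAPSHOT_ROOT:-$HOME/.snapshot}
    cp -p "$snap_root/$1/$2" "$3"
}
```

For example, list_snapshot_copies Documents/report.tex shows which snapshots still contain the file, and restore_from_snapshot nightly.1 Documents/report.tex "$HOME/Documents/report.tex.restored" copies one version back. Restoring under a new name avoids overwriting the current file by mistake.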

Hard Disk in a central backup server

Every day at 00:30 a backup process starts that copies all the files present in the users' directories to another server, accessible only to the System Administrators. On this server we keep the last 30 days of all the home directories, so in case of failure of the central file server we can restart having lost at most 1 day of work. Non-vital files are not backed up; here you can find the list of excluded files.
To retrieve a file from this server, the user must ask the System Administrators. Please do not forget to include the details about the file.

Tape cartridge in central backup server

Every day a backup procedure copies all the files changed during the day to a tape cartridge stored on the central server of EPFL. Every Sunday night, the backup procedure makes a copy of all the files present on our file server.
To access files stored on tape, the user must ask the System Administrators, who forward the request to the EPFL backup team. Restoring files from tape takes a lot of time, but is normally done within the day of the request. Tapes are cheaper than hard disks (and slower), so we can keep the data for longer. Currently we can restore the last 3 months of data from tape backups.

Laptops

Laptops are the nightmare of any system manager. Not only are they always a special case, each one taking the same amount of work as 50 identical workstations, but also (and more importantly) their backup is the sole responsibility of the user. The files on a laptop are in much more danger than those on a workstation or on a file server: the laptop hard disk is smaller and more fragile, the laptop can get stolen or lost, and laptops connect to many different uncontrolled networks. Nevertheless, we know that very few users back up their data on a regular basis. As laptops become more and more convenient to use, important work data is also more and more at risk. It is useless to have a sophisticated centralized backup system like the one described above if users keep their important data on their laptop. So please back up your work data as often as possible. After all, your work data belongs to EPFL.

Synchronizing work stuff with Unison

Unison is a file-synchronization tool for Unix and Windows. Here we show how it can be used to keep the work stuff on your laptop in sync with your home directory on the file server. Here is a short checklist of the things to do:

  1. install Unison
  2. clean up your directories
  3. choose and/or prioritize what to back up
  4. decide what to exclude from the backup
  5. set up the Unison configuration files and startup scripts

Install Unison

You can either download it directly from the official web site or use a prepackaged version from your favorite distribution. Unison is included in most Linux distributions and in both Fink and MacPorts for Apple OS X. On a Mac, for nice feedback from the backup scripts, we also suggest installing the Growl notifier.

Clean up your files

Let's classify our files as static or dynamic, and as personal or professional. Static files are created once and modified very rarely, if at all: for example pictures, mp3 files, or PDFs of downloaded papers. Dynamic files are those that you edit often: for example the LaTeX source files of your latest revolutionary scientific paper, your program sources, and your contact list. Of course, dynamic files are the ones that need particular attention and more frequent backups. Luckily, dynamic files are typically much smaller than static ones. In fact, very often most of the space on the disk is taken by personal static data (pictures, movies and music collections).

<note> Please do not use our server for storing your personal static data. Buy yourself an external FireWire or USB hard disk instead. The backup of our home directories on tape takes a long time and the tapes have limited capacity. It is impossible to back up tens of gigabytes of non-work-related data for every user. The larger the amount of data to back up, the less frequent and less safe the backups are. Please respect the other users by keeping our disk usage on the servers as limited as possible. </note>
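
As a starting point for the cleanup, the following sketch shows one way to see where the disk space goes and which files have been edited recently. It uses only standard tools (du, sort, head, find). The two function names and the 30-day default are assumptions made for this example, and "recently modified" is only a rough proxy for "dynamic".

```shell
#!/bin/sh
# Sketch only: function names and defaults are illustrative assumptions.

# largest_entries DIR [COUNT]
# Print the COUNT (default 10) largest entries directly under DIR,
# with sizes in kilobytes, largest first. Hidden entries are not listed.
largest_entries() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# recently_modified DIR [DAYS]
# Print regular files under DIR modified within the last DAYS (default 30)
# days: a rough way to spot the files that are edited often.
recently_modified() {
    find "$1" -type f -mtime "-${2:-30}"
}
```

Running largest_entries "$HOME" 5 typically reveals the few directories (music, movies, pictures) that take most of the space, while recently_modified "$HOME/Documents" 7 lists what was edited during the last week.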

Synchronization policies

We suggest using two different policies (two separate configuration files) for synchronizing your data:

  1. a frequent (hourly) and automatic synchronization for all your dynamic data (we also accept your personal files here if they are not too large);
  2. an on-demand or less frequent synchronization for professional static files.
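
The two policies above could be set up as two Unison profiles, for instance as in the following sketch. Everything specific in it is a placeholder chosen for illustration: the profile names (dynamic, static), the roots (/Users/me and fileserver.example.org) and the directory names must be replaced with your own values. Only the root, path and batch preferences are used.

```shell
#!/bin/sh
# write_example_profiles [PROFILE_DIR]
# Write two example Unison profiles into PROFILE_DIR (default $HOME/.unison).
# All roots, paths and profile names below are placeholders (assumptions).
# Existing files named dynamic.prf or static.prf are overwritten.
write_example_profiles() {
    dir=${1:-$HOME/.unison}
    mkdir -p "$dir" || return 1

    # Policy 1: small, frequently edited data, meant to run unattended.
    cat > "$dir/dynamic.prf" <<'EOF'
root = /Users/me
root = ssh://me@fileserver.example.org/
path = Documents
path = Projects
batch = true
EOF

    # Policy 2: large, rarely changing work files, run on demand.
    cat > "$dir/static.prf" <<'EOF'
root = /Users/me
root = ssh://me@fileserver.example.org/
path = Archives
batch = false
EOF
}
```

With these two files in place, unison dynamic is the candidate for the hourly automatic run, while unison static can be started by hand whenever the static files have changed. It is safer to try the function on a scratch directory first, e.g. write_example_profiles /tmp/unison-test.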

What to exclude?

Clearly, any file that is derived from another one (e.g. compiled files such as .o object files, or LaTeX output such as .dvi files) should be excluded from any backup. Strictly speaking, synchronization is not a backup, but excluding non-source files will speed up your synchronization process.

It is sane to use only one synchronization system for a given file. Version control systems are in fact synchronization systems. Therefore one should also exclude all the files that are already under version control (CVS, SVN, Mercurial…). I usually append _svn to the name of any directory containing only files under version control and add the *_svn pattern to my exclude list.
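
Before writing the exclude list, it can help to preview what such patterns would match. The sketch below uses find for that. The function names are assumptions made for this example, and the patterns are just a sample of the derived-file extensions mentioned in this section plus the *_svn naming convention.

```shell
#!/bin/sh
# Sketch only: function names are illustrative assumptions.

# list_derived_files DIR
# Print files under DIR whose names look like derived or temporary files.
list_derived_files() {
    find "$1" -type f \( -name '*.o' -o -name '*.aux' -o -name '*.log' \
        -o -name '*.dvi' -o -name '*~' \)
}

# list_vcs_dirs DIR
# Print directories under DIR that follow the *_svn naming convention.
list_vcs_dirs() {
    find "$1" -type d -name '*_svn'
}
```

If list_derived_files "$HOME/Projects" prints a long list, those files are good candidates for ignore patterns. If it prints something you actually want to keep in sync, the pattern is too broad.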

Configuration example

Unison is easy to use, once the configuration file is set.

$ unison -batch myPolicy

where myPolicy is the name of the policy to use, described in a profile file named myPolicy.prf under the $HOME/.unison directory.

Here is what my main configuration file looks like:

# Roots of the synchronization
    root = /Users/cangiani
    root = ssh://cangiani@algosrv5.epfl.ch/

# Here is what I want to keep in sync
    path = Archives
    path = Documents
    ignore = Path Documents/Local
    path = Learn
    path = Projects

# Some glob patterns specifying names and paths to ignore
    ignore = Name temp.*
    ignore = Name *~
    ignore = Name .*~
    ignore = Name .o~$
    ignore = Name *.{o,x}
    ignore = Name *.{tmp,aux,log,dvi}
    ignore = Name *_svn
    ignore = Name *.sparseimage
    ignore = Name .DS_Store

# always use rsync for sending files
    rsync = true

# first treat smaller files
    sortbysize = true

# On Mac the default FS is case insensitive   :(   
#    ignorecase = true

# Record actions in the log file
    log = true

# Specific settings
    key = 1
    label = Learn directory
    batch = false

Please refer to the official documentation for all the details.

Do it!

Now that you've prepared and tested your perfect configuration file, it is time to make sure that Unison is executed periodically. On Unix (Linux and Mac) you can simply call Unison from a script like the following:

#!/bin/sh

# always use the same hostname, even if the actual address of the laptop varies
export UNISONLOCALHOSTNAME=giovanniMBP

# run only when at EPFL on a wired connection (an address in 128.178.70.x)
if ifconfig | grep -q '128\.178\.70\.' ; then
  unison -batch main
fi

and launch the script periodically as a cron job. Edit your cron table (crontab -e) and add something like

38 8-20/2 * * * /Users/cangiani/Desktop/unison.command > /dev/null 2>&1

which runs the script every 2 hours between 8:38 am and 8:38 pm.
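
To double-check how the hour field 8-20/2 expands, the small sketch below prints the times at which such an entry fires. The function is purely illustrative (it is not part of cron) and handles only this simple "minute, hour range, step" case.

```shell
#!/bin/sh
# cron_run_times MINUTE FIRST_HOUR LAST_HOUR STEP
# Print the HH:MM times matched by a cron entry of the form
# "MINUTE FIRST_HOUR-LAST_HOUR/STEP * * *". Illustrative helper only.
cron_run_times() {
    hour=$2
    while [ "$hour" -le "$3" ]; do
        printf '%02d:%02d\n' "$hour" "$1"
        hour=$((hour + $4))
    done
}

# The entry above: minute 38, hours 8 to 20, every 2 hours.
cron_run_times 38 8 20 2
```

This prints seven times, from 08:38 to 20:38 in steps of two hours, so the script runs seven times per day.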

Note that the line

export UNISONLOCALHOSTNAME=giovanniMBP

of the above script is quite important, because the database Unison uses to keep track of which files have changed since the last sync depends on the host name of your machine. Since the laptop gets a different IP address and, therefore, a different host name each time it connects to the network, false conflicts often occur: a file is reported as changed on both machines only because it was synced in a previous run under a previous host name.

backup.1186522747.txt.gz · Last modified: 2007/08/07 21:39 by cangiani