Data Archiving

You can move all files of a closed project, or a large collection of data, to less expensive storage. This is useful for data that are accessed very rarely, or only once after a long period of time. Unlike normal disk space (e.g. your home directory), the archive is not restricted by quotas.

GWDG operates a hierarchical storage management (HSM) system consisting of disk storage and a tape robot system. The disk storage works as a cache: archived files are first saved to disk and then copied to tape at two separate locations, one copy at each location. After a period of time the files are removed from disk. How long a file stays on disk depends on its size and on cache usage.

Each user account has a personal archive that can be used under both Windows and UNIX/Linux. Special storage management software lets you access archived files like normal files. The only noticeable difference is the longer access time (several minutes) when a file has to be read back from tape.

Advice for using the archive efficiently

Most problems with HSM are caused by storing many small files instead of packing them into large containers (with ZIP or TAR). The problem only appears once the files have been moved from cache to tape: each file is written to tape independently, without regard to which files belong together, so related files can end up scattered across different tapes.

In the worst case, accessing the files takes several minutes per file because the robot has to load a different tape cartridge for each one. A restore of 1,000 files can therefore easily take longer than one day. During this time the robot is busy with 1,000 mechanical operations, while from the user's point of view only one contiguous operation is needed.
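The estimate above can be reproduced with a few lines of shell arithmetic. This is only a rough sketch: the value of 2 minutes per file is an assumed average for one tape mount and seek, not a measured GWDG figure, and the variable names are chosen for illustration.

```shell
#!/bin/sh
# Rough worst-case estimate: every file sits on a different tape.
files=1000            # number of individually archived files
minutes_per_file=2    # ASSUMPTION: average tape mount and seek time per file

total_minutes=$((files * minutes_per_file))
total_hours=$((total_minutes / 60))   # integer division, rounds down

echo "Estimated restore time: ${total_minutes} minutes (about ${total_hours} hours)"
```

With these assumed values the result is 2000 minutes, or about 33 hours. The same data packed into a single container needs only one tape access.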

In the interest of all users, we therefore recommend not archiving complex file structures directly. Always pack directory trees or groups of files into a container (ZIP or TAR) first; the container can then be archived without concern.

Please feel free to contact GWDG's Service-Hotline for further questions (E-Mail: support@gwdg.de; Phone: 0551 201-1523).

Usage on Windows

Each Windows user with a GWDG account is given the network share \\winfs-hsm.top.gwdg.de\Username-hsm$ as archive storage. For the username jsmith, for example, it is \\winfs-hsm.top.gwdg.de\jsmith-hsm$.

In Windows Explorer the network share can be mapped to a drive letter: Computer → Map network drive (e.g. H:). The archive can then be used like a normal Windows drive for saving folders and files.

It is recommended to compress folders into ZIP files before saving them to the archive: File → Send to → Compressed (zipped) folder.

This ensures that files belonging together are not scattered across different tapes, so at restore time the robot only has to be invoked once to restore all of them.

Usage on UNIX/Linux

Each user of GWDG's UNIX cluster has a directory /usr/users/a/Username as archive storage. For the username jsmith, for example, it is /usr/users/a/jsmith.

It can be used as a normal UNIX directory. The environment variable AHOME refers to this path.
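In scripts it can be worth checking that AHOME is actually set before using it, because an empty value would turn a path such as $AHOME/data.tgz into /data.tgz. The following is a minimal sketch of such a check; the helper name archive_path is hypothetical and not part of the GWDG environment.

```shell
#!/bin/sh
# archive_path NAME: print the path of NAME inside the personal archive.
# (Hypothetical helper for illustration.) Fails with a message if AHOME is
# unset or empty, so that a script never writes to an unintended location.
archive_path() {
    if [ -z "${AHOME:-}" ]; then
        echo "AHOME is not set; cannot determine the archive directory" >&2
        return 1
    fi
    printf '%s/%s\n' "$AHOME" "$1"
}
```

For the user jsmith, archive_path data.tgz would print /usr/users/a/jsmith/data.tgz, provided AHOME is set as described above.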

It is recommended to pack directories into compressed tar files before saving them to the archive. For example, the user jsmith wants to archive the files in his subdirectory data. He uses the commands

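cd                                # go to the home directory so that the path data is relative
tar -czvf "$AHOME/data.tgz" data  # create (c) a gzip-compressed (z) archive, listing files (v)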

He now has a file data.tgz in his $AHOME directory. Because the relative path data was used, the archive can later be restored into any directory, e.g. into $THOME, with the commands

cd "$THOME"                  # directory in which to unpack
tar -xzvf "$AHOME/data.tgz"  # extract (x) the archive; this recreates ./data

The restore can take a while if the file is no longer in the disk cache and only the copy on tape remains.
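Before deleting the original files it may be sensible to check that the container is complete. The sketch below packs a directory and compares the number of regular files on disk with the number of non-directory entries in the archive. The function name pack_and_verify is hypothetical, and the check assumes that the directory contains only regular files and subdirectories (no symbolic links, no file names containing newlines).

```shell
#!/bin/sh
# pack_and_verify DIR ARCHIVE
# Pack DIR (given as a relative path) into the gzip-compressed tar file
# ARCHIVE, then compare file counts. Returns 0 only if the counts match.
# (Hypothetical helper for illustration.)
pack_and_verify() {
    src_dir=$1
    archive=$2

    tar -czf "$archive" "$src_dir" || return 1

    # Regular files on disk.
    on_disk=$(find "$src_dir" -type f | wc -l | tr -d ' ')
    # Archive entries that are not directories (directories end in "/").
    in_archive=$(tar -tzf "$archive" | grep -vc '/$')

    echo "files on disk: $on_disk, files in archive: $in_archive"
    [ "$on_disk" -eq "$in_archive" ]
}

# Example call on the cluster, following the commands above:
#   cd && pack_and_verify data "$AHOME/data.tgz"
```

Running the check directly after packing is cheap, because a freshly archived file is still in the disk cache and does not have to be fetched from tape.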