en:services:storage_services:data_archiving:start [2015/09/17 10:18]
totto [Usage under Windows]
en:services:storage_services:data_archiving:start [2015/09/17 10:18] (current)
totto [Usage under UNIX/Linux]
 +====== Data Archiving ======
  
 +You can move all files of a completed project, or a large collection of data, to less expensive storage. This is useful for data that are accessed very rarely or only once after a long period of time. In contrast to normal disk space (e.g. your home directory), the archive is not restricted by quotas. The GWDG operates a hierarchical storage management (HSM) system consisting of disk storage and a tape robot system. The disk storage works as a cache: archived files are first saved to disk and then copied to tape at two separate locations, one copy at each location. After some time the files are removed from disk. How long a file stays on disk depends on its size and on the cache usage.
 +
 +Each user account owns a personal archive that can be used under both Windows and UNIX/Linux. Special storage management software lets you access archived files like normal files. Only the longer access time (a few minutes) reveals that a file is stored on tape.
 +
 +===== Advice for using the archive efficiently =====
 +
 +Most problems with the HSM are caused by storing many small files instead of packing them into large containers (with **ZIP** or **TAR**). The problem only appears once the files are moved from the cache to tape: each file is written to tape independently of the others, without regard to which files belong together, so related files can end up scattered across different tapes.
 +
 +In the worst case, accessing the files takes several minutes per file, because the robot has to load a different tape cartridge for each one. A restore of 1,000 files may therefore take longer than a day. During this time the robot is busy with 1,000 mechanical operations, although from the user's point of view only a single contiguous operation is needed.
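As a rough illustration of this estimate, the sketch below multiplies the number of files by an assumed load time per file. The value of 2 minutes is an assumption chosen for illustration only; the text above says no more than "several minutes", so the real figure may be higher.

```shell
#!/bin/sh
# Worst case: every file sits on a different tape.
files=1000
minutes_per_file=2   # assumption for illustration, not a measured value

total_minutes=$((files * minutes_per_file))
hours=$((total_minutes / 60))
minutes=$((total_minutes % 60))

echo "Estimated restore time: ${hours} h ${minutes} min"
```

Even with this optimistic per-file value, the total exceeds 24 hours, whereas a single container needs only one tape load plus the time to read it.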
 + 
 +In the interest of all users, we therefore recommend not archiving complex file structures directly. Directory trees or groups of files should always be packed into a container (with **ZIP** or **TAR**), which can then be archived without any concern.
 +
 +Please feel free to contact GWDG's service hotline with further questions (e-mail: <support@gwdg.de>; phone: 0551 201-1523).
 + 
 +===== Usage on Windows =====
 +
 +Every Windows user with a GWDG account is given the network share ''\\winfs-hsm.top.gwdg.de\Username-hsm$'' as archive storage. For the username **jsmith**, for example, this is ''\\winfs-hsm.top.gwdg.de\jsmith-hsm$''.
 +
 +In Windows Explorer, the network share can be mapped to a drive letter (e.g. ''H:'') via //Computer -> Map network drive//. The archive can then be used like a normal Windows drive for saving folders and files.
 +
 +It is recommended to compress folders into ZIP files before saving them to the archive: //File -> Send to -> Compressed (zipped) folder//.
 +
 +This ensures that files that belong together are not scattered over different tapes, so the robot has to be invoked only once to restore all of them.
 +===== Usage on UNIX/Linux =====
 +
 +Each user of GWDG's UNIX cluster owns the directory ''/usr/users/a/Username'' as archive storage. For the username **jsmith**, for example, this is ''/usr/users/a/jsmith''.
 + 
 +It can be used like a normal UNIX directory. The environment variable ''AHOME'' refers to this path.
 +
 +It is recommended to pack directories into tar files before saving them to the archive. For example, to archive the subdirectory ''data'' of his home directory, the user **jsmith** runs the following commands:
 +
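 +<code>
 +# change to the home directory
 +cd
 +# pack the subdirectory data into a gzip-compressed tar file in the archive
 +tar -czvf "$AHOME/data.tgz" data
 +</code>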
 +
 +This creates the file ''data.tgz'' in his ''$AHOME'' directory. Because the container stores the relative path ''data'', it can later be restored into any directory, e.g. ''$THOME'', with the commands
 +
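 +<code>
 +# change to the directory that should receive the restored files
 +cd "$THOME"
 +# unpack the container; this recreates the subdirectory data here
 +tar -xzvf "$AHOME/data.tgz"
 +</code>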
 +
 +The restore can take a while if the file is no longer in the disk cache and only the copies on tape remain.
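Before the original directory is deleted, it can be worth checking that the container is complete while it is still in the disk cache. The following is a minimal sketch, not a procedure documented by GWDG: the variable ''archive_dir'', the temporary directories and the sample files ''a.txt'' and ''b.txt'' are assumptions made so that the example runs anywhere. On the cluster, ''archive_dir'' would presumably be set to ''$AHOME'' instead.

```shell
#!/bin/sh
set -eu

# Assumption: a temporary directory stands in for the archive so the sketch
# can be tried anywhere; on the cluster one would use "$AHOME" instead.
work=$(mktemp -d)
archive_dir="$work/archive"
mkdir -p "$archive_dir" "$work/home/data"

# Hypothetical sample files standing in for a project's data directory.
printf 'first\n'  > "$work/home/data/a.txt"
printf 'second\n' > "$work/home/data/b.txt"

# Pack the directory with a relative path, as in the tar example above.
cd "$work/home"
tar -czf "$archive_dir/data.tgz" data

# Count regular files on disk and file entries in the container
# (directory entries in the listing end with a slash and are skipped).
expected=$(find data -type f | wc -l | tr -d ' ')
listed=$(tar -tzf "$archive_dir/data.tgz" | grep -c -v '/$')

if [ "$expected" -eq "$listed" ]; then
    echo "container complete: $listed files"
else
    echo "mismatch: $expected files on disk, $listed in container" >&2
fi
```

Comparing counts only catches missing entries, not damaged contents. Listing the container with ''tar -tzf'' also reads the whole file, so this check is best done right after archiving, before the cached copy is removed from disk.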