EOS has three major interfaces to do file transfers.
eos cp is an EOS shell command that imports and exports files to and from EOS using the XRootD, http, gsiftp and s3 protocols. By default all traffic flows through the client issuing the command. It can also be used in ‘async’ mode, where IO flows through a third-party host.
eos file copy is a third-party transfer interface supporting TPC transfers inside the EOS instance.
eos transfer runs scheduled transfers. IO is bridged via dedicated transfer gateways, as explained below.
As a first overview, here is the usage information of the eos cp command. Copying full directory trees is currently supported only for EOS-type storage systems.
Usage: cp [--async] [--rate=<rate>] [--streams=<n>] [--recursive|-R|-r] [-a] [-n] [-S] [-s|--silent] [-d] [--checksum] <src> <dst>
'[eos] cp ..' provides copy functionality to EOS.
Options:
<src>|<dst> can be root://<host>/<path>, a local path /tmp/../ or an eos path /eos/ in the connected instance...
--async : run an asynchronous transfer via a gateway server (see 'transfer submit --sync' for the full options)
--rate : limit the cp rate to <rate>
--streams : use <#> parallel streams
--checksum : output the checksums
-a : append to the target, don't truncate
-n : hide progress bar
-S : print summary
-s --silent : no output just return code
-d : enable debug information
-k | --no-overwrite : disable overwriting of files
Note
When dealing with directories, always end source and target paths with a ‘/’: a target that should be a directory and not a file must end in ‘/’. To copy a directory hierarchy, use ‘-r’ and terminate both the source and the target directory with ‘/’.
eos cp /var/data/myfile /eos/foo/user/data/ : copy 'myfile' to /eos/foo/user/data/myfile
eos cp /var/data/ /eos/foo/user/data/ : copy all plain files in /var/data to /eos/foo/user/data/
eos cp -r /var/data/ /eos/foo/user/data/ : copy the full hierarchy from /var/data/ to /eos/foo/user/data/ => empty directories won't show up on the target!
eos cp -r --checksum --silent /var/data/ /eos/foo/user/data/ : copy the full hierarchy and just print out the checksum information for each file copied!
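The trailing-slash rule from the note and the first example above can be sketched in Python. The helper name `resolve_target` is hypothetical and not part of EOS; it only mirrors the documented rule for a single-file copy, where a destination ending in '/' is treated as a directory and any other destination is taken as the target file itself.

```python
import posixpath


def resolve_target(src: str, dst: str) -> str:
    """Return the path a single-file copy would create under the documented rule."""
    if dst.endswith("/"):
        # Destination is a directory: the source file name is kept.
        return dst + posixpath.basename(src)
    # No trailing slash: the destination names the target file itself.
    return dst


print(resolve_target("/var/data/myfile", "/eos/foo/user/data/"))
# prints /eos/foo/user/data/myfile
```

A check like this can catch a forgotten trailing slash in a script before any data is moved.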
URLs have to be written as:
as3://<hostname>/<bucketname>/<filename> as implemented in ROOT
or as3:<bucketname>/<filename> with environment variable S3_HOSTNAME set
and as3:....?s3.id=<id>&s3.key=<key>
The access id can be defined in 3 ways:
env S3_ACCESS_ID=<access-id> [as used in ROOT ]
env S3_ACCESS_KEY_ID=<access-id> [as used in libs3 ]
<as3-url>?s3.id=<access-id> [as used in EOS transfers ]
The access key can be defined in 3 ways:
env S3_ACCESS_KEY=<access-key> [as used in ROOT ]
env S3_SECRET_ACCESS_KEY=<access-key> [as used in libs3 ]
<as3-url>?s3.key=<access-key> [as used in EOS transfers ]
If both <src> and <dst> are S3 URLs, the same credentials are used on both ends: the target credentials overwrite the source credentials!
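As an illustration of the URL forms and credential sources listed above, the following Python sketch assembles an as3 URL and looks up credentials from the environment. The function names are hypothetical. The lookup order between the ROOT-style and libs3-style variables is an assumption, since the text does not state a precedence, and the values are inserted without URL-encoding, which may not be what EOS expects for keys containing special characters.

```python
import os
from typing import Mapping, Optional, Tuple


def lookup_s3_credentials(
    env: Mapping[str, str] = os.environ,
) -> Tuple[Optional[str], Optional[str]]:
    """Read the access id and key from the documented environment variables."""
    # Assumption: ROOT-style names are tried before libs3-style names.
    access_id = env.get("S3_ACCESS_ID") or env.get("S3_ACCESS_KEY_ID")
    access_key = env.get("S3_ACCESS_KEY") or env.get("S3_SECRET_ACCESS_KEY")
    return access_id, access_key


def build_as3_url(
    host: str,
    bucket: str,
    filename: str,
    access_id: Optional[str] = None,
    access_key: Optional[str] = None,
) -> str:
    """Build as3://<hostname>/<bucketname>/<filename>[?s3.id=<id>&s3.key=<key>]."""
    url = f"as3://{host}/{bucket}/{filename}"
    if access_id is not None and access_key is not None:
        # Credentials are appended as-is (no URL-encoding, see assumption above).
        url += f"?s3.id={access_id}&s3.key={access_key}"
    return url


print(build_as3_url("swift.cern.ch", "eos", "bigfile"))
# prints as3://swift.cern.ch/eos/bigfile
```

Passing credentials through the environment keeps them out of the command line, while the URL form is the one described above for EOS transfers.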
Import a file from an S3 storage into EOS:
eos cp as3://swift.cern.ch/eos/bigfile?s3.id=<secret>&s3.key=<secret> /eos/local/bigfile
[eos-cp] going to copy 1 files and 210.06 MB
[eoscp] bigfile Total 200.32 MB |====================| 100.00 % [26.7 MB/s]
[eos-cp] copied 1/1 files and 210.06 MB in 8.63 seconds with 24.33 MB/s
Run the same import via a transfer gateway:
eos cp --async as3://swift.cern.ch/eos/bigfile?s3.id=<secret>&s3.key=<secret> /eos/local/bigfile
success: submitted transfer id=128095
[eoscp TX] [ done ] |====================| 100.0% : 9s
[eoscp] #################################################################
[eoscp] # Date : ( 1343733064 ) Tue Jul 31 13:11:04 2012
...
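When the asynchronous copy is driven from a script, the transfer id printed on submission is what later commands such as transfer log <id> need. A minimal Python sketch for extracting it is shown here; the function name is hypothetical, and it assumes the submission line keeps the format shown in the output above.

```python
import re
from typing import Optional


def parse_transfer_id(output: str) -> Optional[int]:
    """Extract the numeric id from a 'submitted transfer id=<n>' line, if present."""
    match = re.search(r"submitted transfer id=(\d+)", output)
    return int(match.group(1)) if match else None


print(parse_transfer_id("success: submitted transfer id=128095"))
# prints 128095
```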
Web files can be imported in the same way (upload is not supported):
eos cp http://root.cern.ch/drupal /eos/local/root.cern.ch
Every FST node in EOS can act as a gateway. In fact, it is possible to deploy FSTs purely as gateways without any storage attached.
A gateway is enabled via the command:
EOS Console [root://localhost] |/> node gw gateway1.cern.ch:1095 on
You can see the configuration state of nodes by doing:
EOS Console [root://localhost] |/> node ls
#-----------------------------------------------------------------------------------------------------------------------------
# type # hostport # status # status # txgw #gw-queued # gw-ntx #gw-rate # heartbeatdelta #nofs
#-----------------------------------------------------------------------------------------------------------------------------
nodesview gateway1.cern.ch:1095 online on on 0 10 100 ~ 0
nodesview storage1.cern.ch:1095 online on off 0 30 120 0 22
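For monitoring scripts, the table above can be turned into records. The Python sketch below assumes the whitespace-separated layout of the sample output, which may differ between EOS versions. The column names are chosen here for illustration; in particular, the second of the two "status" columns is renamed `config_status` as a guess at its meaning.

```python
from typing import Dict, List

# Column names follow the header of the sample output; the duplicate
# "status" column is renamed (assumption) so that keys stay unique.
COLUMNS = [
    "type", "hostport", "status", "config_status", "txgw",
    "gw_queued", "gw_ntx", "gw_rate", "heartbeatdelta", "nofs",
]

SAMPLE = """\
#-----------------------------------------------------------------
# type # hostport # status # status # txgw #gw-queued # gw-ntx #gw-rate # heartbeatdelta #nofs
#-----------------------------------------------------------------
nodesview gateway1.cern.ch:1095 online on on 0 10 100 ~ 0
nodesview storage1.cern.ch:1095 online on off 0 30 120 0 22
"""


def parse_node_ls(text: str) -> List[Dict[str, str]]:
    """Parse 'node ls' style output into one dict per node."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip separators and the header
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip rows that do not match the expected layout
        nodes.append(dict(zip(COLUMNS, fields)))
    return nodes


def gateways(nodes: List[Dict[str, str]]) -> List[str]:
    """Return the hostport of every node with the transfer gateway switched on."""
    return [node["hostport"] for node in nodes if node["txgw"] == "on"]


print(gateways(parse_node_ls(SAMPLE)))
# prints ['gateway1.cern.ch:1095']
```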
To disable a gateway, run:
EOS Console [root://localhost] |/> node gw gateway1.cern.ch:1095 off
You see in the output of node ls that each node has two parameters for gateways:
variable   definition
gw-ntx     number of parallel transfers on this node
gw-rate    bandwidth limit applied per transfer (unless a transfer specifies its own rate)
These parameters are defined via:
EOS Console [root://localhost] |/> node config gateway1.cern.ch gw.rate=100
EOS Console [root://localhost] |/> node config gateway1.cern.ch gw.ntx=10
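Taken together, the two parameters bound the traffic a gateway can carry: with gw.ntx parallel transfers, each limited to gw.rate, the aggregate cannot exceed their product as long as no transfer overrides the rate. A small Python sketch of that arithmetic follows. The function name is hypothetical, and the unit of gw.rate is not stated in the text, so the result is in whatever unit gw.rate uses.

```python
def max_gateway_throughput(gw_ntx: int, gw_rate: int) -> int:
    """Upper bound on aggregate gateway traffic, in the unit of gw_rate.

    Holds only while every transfer uses the node default; a transfer
    submitted with its own --rate is not covered by this bound.
    """
    return gw_ntx * gw_rate


print(max_gateway_throughput(10, 100))
# prints 1000
```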
You can get a comprehensive summary of the configuration per node using the eos node status command:
EOS Console [root://localhost] |/> node status eosdevsrv1.cern.ch
# ------------------------------------------------------------------------------------
# Node Variables
# ....................................................................................
gw.ntx := 10
gw.rate := 100
manager := eosdev.cern.ch:1094
stat.balance.ntx := 2
stat.balance.rate := 25
stat.gw.queued := 0
status := on
symkey := G41RrP1y/SLHsf9AhneqbxXaOSU=
txgw := on
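The `key := value` lines of this output are easy to read into a dictionary, for example to verify gateway settings from a script. The Python sketch below assumes the layout shown above; the function name is hypothetical and values are kept as strings.

```python
from typing import Dict


def parse_node_status(text: str) -> Dict[str, str]:
    """Collect 'key := value' lines into a dict, ignoring '#' decoration lines."""
    variables = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#") or ":=" not in line:
            continue
        # Split on the first ':=' only, so values may contain '=' themselves.
        key, _, value = line.partition(":=")
        variables[key.strip()] = value.strip()
    return variables


sample = """\
# Node Variables
gw.ntx := 10
gw.rate := 100
txgw := on
"""
status = parse_node_status(sample)
print(status["gw.ntx"], status["gw.rate"], status["txgw"])
# prints 10 100 on
```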
The transfer state machine is as follows:
state
inserted
validated
scheduled
stagein | stageout | running
done | failed
Interaction with the transfer queue is done via the eos transfer CLI.
EOS Console [root://localhost] |/> transfer
Usage: transfer submit|cancel|ls|enable|disable|reset|clear|resubmit|log ..
'[eos] transfer ..' provides the transfer interface of EOS.
Options:
transfer submit [--rate=<rate>] [--streams=<#>] [--group=<groupname>] [--sync] <URL1> <URL2> :
transfer a file from URL1 to URL2
<URL> can be root://<host>/<path> or a local path /eos/...
--rate : limit the transfer rate to <rate>
--streams : use <#> parallel streams
--group : set the group name for this transfer
transfer cancel <id>|--group=<groupname>
cancel transfer with ID <id> or by group <groupname>
<id>=* : cancel all transfers (only root can do that)
transfer ls [-a] [-m] [-s] [--group=<groupname>] [id]
-a : list all transfers not only of the current role
-m : list all transfers in monitoring format (key-val pairs)
-s : print transfer summary
--group : list all transfers in this group
--sync : follow the transfer in interactive mode (like interactive third party 'cp')
<id> : id of the transfer to list
transfer enable : start the transfer engine (you have to be root to do that)
transfer disable : stop the transfer engine (you have to be root to do that)
transfer reset [<id>|--group=<groupname>]
: reset all transfers to 'inserted' state (you have to be root to do that)
transfer clear : clears the transfer database (you have to be root to do that)
transfer resubmit <id> [--group=<groupname>]
: resubmits a transfer
transfer kill <id>|--group=<groupname>
: kill a running transfer
transfer log <id> : show the log of transfer <id>
transfer purge [<id>|--group=<groupname>]
: remove 'failed' transfers from the transfer queue by id, group or all if not specified
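For scripted submissions, the documented options of transfer submit can be assembled into an argument list. The Python sketch below only builds the list and does not run anything. It uses only flags from the usage text above. The function name is hypothetical, and invoking the command from a shell as `eos transfer submit ...` is an assumption based on how `eos cp` is called elsewhere in this section.

```python
from typing import List, Optional


def transfer_submit_args(
    src: str,
    dst: str,
    rate: Optional[int] = None,
    streams: Optional[int] = None,
    group: Optional[str] = None,
    sync: bool = False,
) -> List[str]:
    """Build the argument list for 'transfer submit' from the documented options."""
    args = ["eos", "transfer", "submit"]
    if rate is not None:
        args.append(f"--rate={rate}")
    if streams is not None:
        args.append(f"--streams={streams}")
    if group is not None:
        args.append(f"--group={group}")
    if sync:
        args.append("--sync")
    # URLs come last, source before destination.
    args += [src, dst]
    return args


print(transfer_submit_args("as3://swift.cern.ch/eos/bigfile", "/eos/local/bigfile",
                           rate=50, group="import-test"))
```

The resulting list can be handed to `subprocess.run` without a shell, which avoids quoting problems with the `?` and `&` characters in S3 URLs. The group name `import-test` is a made-up example.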
A transfer submitted with transfer submit starts in state inserted. Once it has been assigned to a transfer gateway it is in state scheduled. While it executes it is either in state stagein (followed by stageout) or in state running; certain protocols need this two-stage process to bridge transfers. A transfer that ends in state failed can be inspected with transfer log <id>. Transfers that reach state done are automatically purged from the queue and moved to the transfer archive.
The transfer archive is a daily rotated log file, /var/log/eos/tx/transfer-archive.log, that stores all transfer logs. It is currently not accessible via the CLI.
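The life cycle described above can be modelled as a small transition table, for instance to sanity-check state sequences seen in monitoring output. The Python sketch below is an interpretation of the text, not the EOS implementation: the exact edges are assumptions, including that a transfer can fail only from an executing state, and administrative actions such as transfer reset are left out.

```python
from typing import Dict, List, Set

# Assumed edges, derived from the prose description of the state machine.
TRANSITIONS: Dict[str, Set[str]] = {
    "inserted": {"validated"},
    "validated": {"scheduled"},
    "scheduled": {"stagein", "running"},
    "stagein": {"stageout", "failed"},
    "stageout": {"done", "failed"},
    "running": {"done", "failed"},
    "done": set(),
    "failed": set(),
}


def is_valid_history(states: List[str]) -> bool:
    """Check that a state sequence starts at 'inserted' and follows TRANSITIONS."""
    if not states or states[0] != "inserted":
        return False
    return all(
        nxt in TRANSITIONS.get(cur, set())
        for cur, nxt in zip(states, states[1:])
    )


print(is_valid_history(["inserted", "validated", "scheduled", "running", "done"]))
# prints True
print(is_valid_history(["inserted", "running"]))
# prints False
```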