In EOS, it is possible to configure filesystems so that clients must access them through FSTs acting as proxies. Such nodes are called proxy nodes, short for data-proxy. Some nodes can also act as firewall entry points (fwep) to allow access to data nodes behind a firewall.
A proxy node is just a normal FST daemon running on some machine. It can host standard FST filesystems or no filesystem at all, as filesystems are not needed for a node to act as a proxy.
Proxies are grouped into proxygroups, whose names typically indicate the capability of the proxies in the group. For instance, the proxygroup kinetic may group proxies that can talk XRootD with a client coming to get/put data and that can talk to Kinetic drives for the storage.
Depending on the version of the software, FSTs can act as proxies for different types of filesystems.
Note
A given node can belong to multiple proxygroups
To manage nodes’ membership, the following eos commands can be used (a usage sketch follows the list):
node proxygroupadd
node proxygrouprm
node proxygroupclear
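For instance, a node could be added to or removed from a proxygroup roughly as follows. The hostname and the proxygroup name c5group are taken from the examples on this page, and the assumed argument order (proxygroup name first, then the node) should be checked against the built-in help of the node command:
EOS Console [root://localhost] |/eos/multipath/> node proxygroupadd c5group p05153074617805.cern.ch:1095
EOS Console [root://localhost] |/eos/multipath/> node proxygrouprm c5group p05153074617805.cern.ch:1095
EOS Console [root://localhost] |/eos/multipath/> node proxygroupclear p05153074617805.cern.ch:1095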
To view nodes’ membership, the following eos command can be used:
node status
Here follows an example.
EOS Console [root://localhost] |/eos/multipath/> node status p05153074617805.cern.ch:1095
# ------------------------------------------------------------------------------------
# Node Variables
# ....................................................................................
debug.level := info
debug.state := debug
domain := MGM
gw.ntx := 10
gw.rate := 120
kinetic.cluster.default := base64:...
kinetic.location.default := base64:...
kinetic.reload := default
kinetic.security.default := base64:...
manager := p05153074617805.cern.ch:1094
proxygroups := c5group
...
Note that if proxygroups is not defined for a node, it is equivalent to proxygroups being defined and empty.
It is also possible to review the scheduling snapshots associated with a proxygroup using the command
geosched show snapshot
Here follows an example (partial output).
EOS Console [root://localhost] |/eos/multipath/> geosched show snapshot
### scheduling snapshot for proxy group c5group :
--------c5group/( free:2|repl:0|pidx:0|status:OK|ulSc:99|dlSc:99|filR:0|totS:0)
`----------nogeotag/( free:2|repl:0|pidx:1|status:OK|ulSc:99|dlSc:99|filR:0|totS:0)
|----------1/( free:1|repl:0|pidx:0|status:RW|ulSc:99|dlSc:99|filR:0|totS:0)@p05153074617805.cern.ch:1095
`----------2/( free:1|repl:0|pidx:0|status:RW|ulSc:99|dlSc:99|filR:0|totS:0)@p05153074625071.cern.ch:1095
A filesystem can be:
Note
As of today, supported imported filesystems can be of the following types:
The type is configured by setting the mount point of a filesystem when calling eos fs add. The path can be a local directory starting with /, or it can start with s3(s)://, http(s)://, kinetic:// or root://. To tag a filesystem as requiring access through a proxy of a given proxygroup, the following eos command can be used:
fs config <fsid> proxygroup=<proxygroup>
Note that the special value <none> is equivalent to proxygroup not being defined, i.e. no proxygroup is associated with the filesystem.
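For example, assuming a filesystem with id 2 and a proxygroup named c5group (the values used in the examples on this page), the filesystem could be tagged as follows:
EOS Console [root://localhost] |/eos/multipath/> fs config 2 proxygroup=c5group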
It is possible to review the proxygroup a filesystem relies on using the following eos command:
fs status <fsid>
Here follows an example (partial output).
EOS Console [root://localhost] |/eos/multipath/> fs status 2
# ------------------------------------------------------------------------------------
# FileSystem Variables
# ....................................................................................
bootcheck := 0
bootsenttime := 1470773776
configstatus := rw
drainperiod := 86400
graceperiod := 86400
host := p05153074617805.cern.ch
hostport := p05153074617805.cern.ch:1095
id := 2
path := kinetic://cluster5/
port := 1095
proxygroup := c5group
queue := /eos/p05153074617805.cern.ch:1095/fst
Note that if proxygroup is not defined, it is equivalent to proxygroup having the value <none>.
EOS offers functionality to define hosts (gathered in proxygroups) acting as firewall entry points (fwep) and to specify when they should be used. First, it is possible to restrict which target geotags are directly accessible from client geotags (i.e. without going through a fwep). This can be done using the commands:
geosched access setdirect
geosched access showdirect
EOS Console [root://localhost] |/eos/geotree/users/gadde/2rep/> geosched access showdirect
--------AccessGeotagMapping
|----------site1 [site1 => site1]
| `----------rack1 [site1::rack2 => site1,site2::rack2]
|
`----------site2 [site2 => site2]
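A mapping of this kind could be defined roughly as sketched below; the geotags are taken from the output above, and the assumed argument syntax (a source geotag followed by the list of directly accessible target geotags) should be checked against the geoscheduling documentation:
EOS Console [root://localhost] |/eos/geotree/users/gadde/2rep/> geosched access setdirect site1 site1
EOS Console [root://localhost] |/eos/geotree/users/gadde/2rep/> geosched access setdirect site2 site2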
Then, the proxygroup to be used as a firewall entry point for accesses to a given target geotag can be defined and reviewed using the commands:
geosched access setproxygroup
geosched access showproxygroup
--------AccessGeotagMapping
`----------site2 [site2 => ep2]
`----------rack2 [site2::rack2 => ep22]
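The mapping above could, for instance, have been defined along the following lines; the proxygroup names ep2 and ep22 come from the output above, and the assumed argument syntax (a target geotag followed by the proxygroup to use as fwep) should be checked against the geoscheduling documentation:
EOS Console [root://localhost] |/eos/geotree/users/gadde/2rep/> geosched access setproxygroup site2 ep2
EOS Console [root://localhost] |/eos/geotree/users/gadde/2rep/> geosched access setproxygroup site2::rack2 ep22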
In the simplest case, the fwep node is just a standard proxy that can access all the possible types of filesystems. It can then be used as a proxy for any filesystem in the instance.
It is also possible to run an XRootD forwarding daemon together with an FST daemon on fwep nodes. With this configuration, the proxy node might not be able to serve access to all types of filesystems. If a client is scheduled to a filesystem whose proxygroup is not supported by the scheduled fwep proxy, the scheduler will use the forwarding gateway running on that machine to forward the access to a proxy from the right proxygroup.
EOS Console [root://localhost] |/eos/multipath/> fs ls
#...........................................................................................................................................................................
# host (#...) # id # path # schedgroup # geotag # boot # configstatus # drain # active # health
#...........................................................................................................................................................................
p05153074617805.cern.ch (1095) 1 kinetic://single-drive/ default booted ro nodrain online 1/1 (+0)
p05153074617805.cern.ch (1095) 2 kinetic://cluster5/ default booted rw nodrain online 25/25 (+4)
First, some tools that help validate the configuration are presented. Then, the scheduling procedure is detailed and some additional features are described.
Proxy scheduling is part of the geoscheduling engine (see File Geoscheduling). As such, there is an easy way to check whether all the proxies are properly configured and taken into account in the geoscheduling system as members of the expected proxygroups. Proxies are organized in trees, one for each proxygroup. Those trees are automatically kept in sync with the configurations of the nodes, including the config variable proxygroups. To review the snapshots, the following EOS command can be used.
geosched show snapshot
It can also be very handy, at least for testing purposes, to list the filesystems on which the replicas of a file are stored, along with their proxygroups. This can be done using the EOS command:
fileinfo <path> --proxy
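For instance (the path is only illustrative), the following lists, for each replica of the file, the filesystem it is stored on together with the proxygroup configured for that filesystem:
EOS Console [root://localhost] |/eos/multipath/> fileinfo /eos/multipath/somefile --proxy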
Here follows a sketch of the file scheduling algorithm with an emphasis on the proxy part. When a file access or placement is requested, the execution goes through the following steps:
In some cases, it may be necessary for accesses to a file to go consistently through one node or a subset of the proxygroup. This is called file-sticky proxy scheduling. It is used, for instance, to maximize the performance of file caching done on the proxy nodes. It involves a filesystem parameter called filestickyproxydepth, which can be set using the eos command:
eos fs config <fsid> filestickyproxydepth=<some_integer>
Note that leaving this variable undefined is equivalent to defining it with a negative value; in both cases file-sticky proxy scheduling is disabled.
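For example, to enable file-sticky proxy scheduling on filesystem 2 with a depth of 1 (both values are only illustrative):
EOS Console [root://localhost] |/eos/multipath/> fs config 2 filestickyproxydepth=1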
Usually, the outcome of proxy scheduling for a given filesystem is the best possible, slightly randomized trade-off between the proximity of the proxy to the filesystem (or to the client) and the resource availability of the proxy. The algorithm used does not depend on the file, only on the geotag of the client, the geotag of the filesystem and the geotags of the proxies in the proxygroup to be scheduled from.
When using file-sticky proxy scheduling, the behavior is different. First, a starting point for the search is chosen: if ProxyCloseToFs is false and the client has a geotag, it is the client’s geotag; otherwise, it is the filesystem’s geotag. The starting point is projected onto the considered proxygroup’s scheduling tree. The resulting point is then moved filestickyproxydepth steps toward the root. All the proxies in the subtree starting from there are flattened into an array and sorted by id. The proxy is then selected using the inode number of the file to be accessed.
Choosing the value of filestickyproxydepth depends on where the proxies are placed (in terms of geotag) relative to the filesystems. A small value restricts the choice to proxies close, in the geotag tree, to the starting point, while a larger value widens the subtree from which the file-sticky proxy is picked.