4.10. Developing EOS

../_images/cpp.jpg

Getting a development machine [CERN specific]

Go to CERN Openstack and find project ‘IT EOS development’ (if you are not a member ask your colleagues to give you access). Check in the Menu:Project/Compute/Overview if there are still free resources to be used. If not, coordinate with your colleagues on reviewing if all currently existing machines are needed. If yes, inform your section leader that you would like to ask for more resources, then follow Request Quota.

To get a decent machine, look at Menu”Project/Compute/Instances and then click “Launch Instance” button. Follow the form and when choosing “Flavour” look for ‘m2.2xlarge’. If not visible, file a ticket to the Openstack team asking for making it available to you (not publicly available). Machine creation is possible also from a command line using CERN Openstack Tools.

Once the machine is running, you can log-in from aiadm.cern.ch to your machine (as root for convenience) and upgrade to newest packages/kernel etc. (yum clean all; yum upgrade; reboot;).

Source Code

Ask EOS developers at CERN to give you write permissions to the EOS repository. The current development model is that anyone with permission can directly commit to master branch. Newcomers are encouraged to create a separate branch and make a merge request with review from one of the main developers.

Install git, create your working directory (e.g. /devwork/local) and clone the EOS repository:

yum install git
git config --global user.email "your@email.x"
git config --global user.name "FirstName LastName"

mkdir -p /devwork/local
cd /devwork/local

git clone https://gitlab.cern.ch/dss/eos.git

cd eos
git submodule update --recursive --init

Create a build directory …

mkdir build
cd build

Dependencies

Warning

Before compilation of the master branch you have to make sure that you installed all required dependencies.

EL7

There is a convenience scripts to install all dependencies in the EOS source tree:

utils/el7-packages.sh

This script might not be up to date. To be sure you are having all the dependencies installed consistently with the version of EOS code you just downloaded, one first needs to define the EOS yum repositories as stated in the Quick Start Guide/Setup YUM Repository. One may also look for inspiration at the Dockerfiles in the eos/gitlab-ci/prebuild_OSbase of your repository, which follows a similar process we will layout below.

Ask the developers which cmake version is currently supported (cmake3 as of Dec 2020), then, install the following packages (e.g. ccache for speeding up compiling):

yum install --nogpg -y ccache centos-release-scl-rh cmake3 gcc-c++ gdb make rpm-build rpm-sign yum-plugin-priorities && yum clean all

Run cmake3 with the DPACKAGEONLY=1 option and make source rpms:

cmake3 ../ -DPACKAGEONLY=1 && make srpm

Now build the EOS dependencies:

yum-builddep --nogpgcheck --setopt="cern*.exclude=xrootd*" -y SRPMS/*

This will install among other also the devtoolset-8 required for eos development (as of Dec 2020). Add line ‘source /opt/rh/devtoolset-8/enable’ to your bash profile to load each time you log in. This should be confirmed by getting the right path for the compiler, e.g. which c++ will show /opt/rh/devtoolset-8/root/usr/bin/c++ as the output.

You will also need to install QuarkDB. Define the yum repository with /etc/yum.repos.d/quarkdb.repo file with the following content:

[quarkdb-stable]
name=QuarkDB repository [stable]
baseurl=http://storage-ci.web.cern.ch/storage-ci/quarkdb/tag/el7/x86_64/
enabled=1
gpgcheck=False

Then, run:

yum install quarkdb quarkdb-debuginfo redis

Important troubleshooting steps

During the last command, you may encounter error such as Error: No Package found for libmicrohttpd-devel >= 0.9.38, this can be resolved by:

yum install gnutls
yum install libmicrohttpd-devel --disablerepo="*" --enablerepo=eos-citrine-dep
yum-builddep --nogpgcheck --setopt="cern*.exclude=xrootd*" -y SRPMS/*

If you do not succeed enabling devtoolset-8 it might also be that you are using zsh which is incompatible with scl-utils.

Make sure you have compatible xrootd version installed (rpm -qa | grep xroot), currently the above will install you version 5.0.3 which is not yet compatible with EOS <= 4.8.35 (Dec/Jan 2020). Look at the latest version of xrootd in the [eos-citrine-dep] repository (currently 4.12.6) or ask the developers if in doubt.

rpm -qa | grep xroot
> xrootd-client-libs-5.0.3-2.el7.x86_64
> xrootd-server-devel-5.0.3-2.el7.x86_64
> xrootd-selinux-5.0.3-2.el7.noarch
> xrootd-libs-5.0.3-2.el7.x86_64
> xrootd-devel-5.0.3-2.el7.x86_64
> xrootd-server-libs-5.0.3-2.el7.x86_64
> xrootd-server-5.0.3-2.el7.x86_64
> xrootd-private-devel-5.0.3-2.el7.x86_64
> xrootd-client-devel-5.0.3-2.el7.x86_64
> xrootd-5.0.3-2.el7.x86_64

# as mentioned above, the version needs to be the latest available in [eos-citrine-dep] repository (defined in steps above); to fix this, do the following:
yum remove xrootd-*
# it could also be that xrootd 4.12.6 was renamed to xrootd4 in case you have a mix of xrootd and xrootd4 `yum remove xrootd4-*` and make sure you have the right packages:
yum install xrootd xrootd-client xrootd-server-devel xrootd-private-devel --disablerepo="*" --enablerepo=eos-citrine-dep
# yum install xrootd-4.12.6-1.el7 xrootd-client-4.12.6-1.el7 xrootd-server-devel-4.12.6-1.el7 xrootd-private-devel-4.12.6-1.el7 --disablerepo="*" --enablerepo=eos-citrine-dep

It may also currently install eos-folly-2020.10.05.00-1.el7.cern.x86_64 which (for EOS <=4.8.35) needs to be 2019.11.11.00-1.el7.cern, fix this by:

yum remove eos-folly eos-folly-deps
yum install eos-folly-2019.11.11.00-1.el7.cern

Optional Optimizations

yum install -y moreutils \
yum clean all

install moreutils just for ‘ts’, nice to benchmark the build time.

Compilation

EOS is a system of libraries which gets loaded by the xrootd executable. In order to run the version you just cloned (or later modified), you have to compile those libraries and then make sure xrootd loads the correct ones. In order to facilitate the deployment, we install ninja-build package (like make but faster):

yum install ninja-build

Create a new build directory and try to run cmake (see troubleshooting below):

mkdir /devwork/local/eos/build-with-ninja
cd /devwork/local/eos/build-with-ninja
cmake3 ../ -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/ -Wno-dev -DCMAKE_C_COMPILER=/opt/rh/devtoolset-8/root/usr/bin/cc -DCMAKE_CXX_COMPILER=/opt/rh/devtoolset-8/root/usr/bin/c++

Compile

ninja-build
# run some unit tests; should finish with [  PASSED  ] XXX tests message
unit_tests/eos-unit-tests

Troubleshooting

The following dependencies might not be required (you should be able to ignore these in the cmake3 output):

Could NOT find Sphinx
Could NOT find fuse3
Could NOT find davix
Could NOT find GTest

Warning

yum can automatically update your packages (in yum history you can see: “-y –skip-broken update” in such a case) you can remove this package yum remove yum-autoupdate to make sure it does not screw up EOS rpms installed.

If when executing the unit tests you have errors about the linker that could not find .so files, you can update your LD_LIBRARY_PATH to add to it the common and the mq directory of your EOS build directory.

Deployment

Use Ninja to install EOS on your development machine:

cd /devwork/local/eos/build-with-ninja
ninja-build install

# depending on your OS, you can remove the el6 repository
# to avoid pulling packages and dependencies from there in the future
rm -rf /etc/yum.repos.d/eos-el6*
rm -rf /etc/yum.repos.d/eos-el7*

After you finish your deployment configuration (see below) and you start modifying the source code, to deploy it, you can do:

sudo systemctl stop eos@*
cd /devwork/local/eos/build-with-ninja
# if needed rm files from the build directory and run cmake3 again before the next step
ninja-build
ninja-build install
rm -rf /etc/yum.repos.d/eos-el6*
cp /etc/xrd.cf.mgm_bkp /etc/xrd.cf.mgm
cp /etc/xrd.cf.mq_bkp /etc/xrd.cf.mq
systemctl daemon-reload
systemctl start eos

Deployment Configuration

We need to configure and run the following set of daemons: MGM, MQ (messaging service between MGM and FSTs), several FSTs and QuarkDB.

QuarkDB

Create a configuration file /etc/xrootd/xrootd-quarkdb.cfg with the following content:

   xrd.port  7777
   xrd.protocol redis:7777 libXrdQuarkDB.so
   redis.mode  standalone
   redis.database  /var/lib/quarkdb/eosns
   # redis.myself  localhost:7777
   redis.password_file  /etc/eos.keytab


In production deployment usually raft mode is used instead of standalone (you need at least 2 nodes for such a mode).

Create path to your QuarkDB namespace specified above:

install -d -o daemon -g daemon /var/lib/quarkdb

Because the eos service will run as user ‘daemon’, you will have to make sure that all relevant files have the correct permissions and change their ownership, i.e. run:

chown -R daemon:daemon /var/log/xrootd
chown -R daemon:daemon /var/eos/
chown -R daemon:daemon /etc/eos.* # + must have permissions 400 !
chown -R daemon:daemon /var/run/xrootd
chown -R daemon:daemon /var/lib/quarkdb
chown -R daemon:daemon /var/spool/xrootd

Because the eos service will run as user “daemon”, you should run the QuarkDB as the same user, i.e.:

Create your QuarkDB path (modify eostest and <myhostname>) :

UUID=eostest-$(uuidgen); echo $UUID; sudo runuser daemon -s /bin/bash -c "quarkdb-create --path /var/lib/quarkdb/eosns --clusterID $UUID --nodes localhost:7777"

Before starting the service, we will need custom config drop-in script. Create the following file path /etc/systemd/system/xrootd@quarkdb.service.d/custom.conf with the following content:

[Service]
User=daemon
Group=daemon

Before starting the service, check once again that the keytabs /etc/eos.* have permission 400. Then start the service (log can be followed in /var/log/xrootd/quarkdb/xrootd.log) and check its status:

systemctl start xrootd@quarkdb
systemctl status xrootd@quarkdb

Note: QuarkDB Installation documentation is a very helpful resource, in particular for troubleshooting!

Test if the QuarkDB runs fine by saving and retrieving key-value pair:

redis-cli -p 7777
127.0.0.1:7777> set mykey myval
OK
127.0.0.1:7777> get mykey
"myval"

MGM

The environment configuration will be loaded from /etc/sysconfig/eos_env which you need to create from a provided example.

mv /etc/sysconfig/eos_env.example /etc/sysconfig/eos_env

Inside you need to fill in various pieces of information:

  • XRD_ROLES="mq mgm fst1 fst2 fst3", depending on how many fst daemons you plan (here 3) to run, you need to specify them here. Drop fed and sync as they are not used anymore.
  • HOST_TARGET was used for the sync and is not needed anymore.
  • EL8 systems will need LD_PRELOAD=/usr/lib64/libjemalloc.so.2
# XRD_ROLES depends on what you wish to run, each separate mgm, mq or fst needs to be specified, e.g. for 3 fsts daemons we put fst1 fst2 fst3.
# Delete "fed" and "sync" if present as they are not used anymore.
XRD_ROLES="mq mgm fst1 fst2 fst3"
EOS_MGM_HOST=<myhostname>.cern.ch`
EOS_MGM_HOST_TARGET` # was used for the sync and can be commented out
EOS_INSTANCE_NAME=eostest` # has to start with "eos" and has the form "eos<name>".
EOS_MGM_MASTER1=<myhostname>.cern.ch`
EOS_MGM_MASTER2=<myhostname>.cern.ch`
EOS_MGM_ALIAS=<myhostname>.cern.ch`
EOS_MAIL_CC=<email>`
EOS_GEOTAG="\:\:<anything>"` # needs to be filled
EOS_NS_ACCOUNTING=1`
EOS_SYNCTIME_ACCOUNTING=1`
EOS_USE_SHARED_MUTEX=1`

Then you need Kerberos security keys on your machine:

cp /etc/krb5.keytab /etc/eos.krb5.keytab

Warning

Kerberos and SSS keytab files are different and use different formats. Read the documentation for the xrdsssadmin command or XRootD SSS protocol at https://xrootd.slac.stanford.edu/doc/dev49/sec_config.htm#_Toc517294117

There is yet another configuration file you will have to modify e.g. /etc/xrd.cf.mgm (note aside: These are configuring the xroot daemons that will be running as a service, this has nothing to do with the config of eos itself which is being saved in QuarkDB.). In this file you can change the security settings as needed, e.g.:

sec.protocol unix
sec.protocol sss -c /etc/eos.client.keytab -s /etc/eos.keytab
# Example disable krb5 and gsi
#sec.protocol krb5
#sec.protocol gsi

sec.protbind localhost.localdomain unix sss
sec.protbind localhost unix sss
sec.protbind * only sss unix

Note: that the order of sec.protbind matters for host matching (matches from “bottom up”, i.e. in reverse order of specification, from most specific to least specific).

Also, activate QuarkDB namespace plugin usage and set other parameters as desired, e.g.:

mgmofs.nslib /usr/lib64/libEosNsQuarkdb.so
mgmofs.instance eostest
mgmofs.qdbcluster localhost:7777
mgmofs.qdbpassword_file /etc/eos.keytab

Once done, backup the result to have it available after the next reinstallation of recompiled EOS:

cp /etc/xrd.cf.mgm /etc/xrd.cf.mgm_bkp

Start the service, check the status and log file in /var/log/eos/mgm/xrdlog.mgm :

systemctl daemon-reload
systemctl start eos@mgm
systemctl status eos@mgm

You will see errors RefreshBrokersEndpoints this is due to the fact that MQ is not yet running:

less /var/log/eos/mgm/xrdlog.mgm
...
210304 11:34:22 time=1614854062.201983 func=RefreshBrokersEndpoints  level=ERROR logid=static.............................. unit=mgm@eos-ccaffy-dev01.cern.ch:1094 tid=00007f324109f700 source=XrdMqClient:495                tident= sec=(null) uid=99 gid=99 name=- geo="" msg="failed to contact broker" url="root://localhost:1097//eos/eos-ccaffy-dev01.cern.ch/mgm_mq_test?xmqclient.advisory.flushbacklog=1&xmqclient.advisory.query=1&xmqclient.advisory.status=1"
...

Troubleshooting:

If by looking at the /var/log/eos/mgm/xrdlog.mgm log file, you see the following error:

Seckrb5: Unable to start sequence on the keytab file FILE:/etc/krb5.keytab; Permission denied

Then do

chmod a+r /etc/krb5.keytab

MQ

Look at /etc/sysconfig/eos_env and /etc/xrd.cf.mq in case you would want any changes there (this can stay as provided by default).

Start the service, check the status and log file in /var/log/eos/mq/xrdlog.mq :

systemctl start eos@mq
systemctl status eos@mq

You will see expected errors in connection queue, this is because MQ can not connect to any running file system daemons (FST). On the other hand the RefreshBrokersEndpoints of the MGM /var/log/eos/mgm/xrdlog.mgm should now disappear.

FST

Look at /etc/sysconfig/eos_env in case you would want any changes there (this can stay as provided by default). For each of XRD_ROLES there has to be a configuration file created. For each the file systems you wish to add, you need to create a new configuration file from a provided template /etc/xrd.cf.fst. The template can be used directly, but you might want to change the port number in each:

for i in {1..3}; do
    cp /etc/xrd.cf.fst /etc/xrd.cf.fst"${i}"
    sed -i "s/xrd.port 1095/xrd.port 200${i}/g" /etc/xrd.cf.fst"${i}"
done;

Also make sure the following 2 lines are present in each cfg file created above:

fstofs.qdbcluster localhost:7777
fstofs.qdbpassword_file /etc/eos.keytab

Each file system will run its own FST daemon (which communicates via MQ with the MGM). For each of these daemons, we need to create a special drop-in script specifying the connection port:

for i in {1..3}; do
    mkdir -p /usr/lib/systemd/system/eos@fst"${i}".service.d
    cd /usr/lib/systemd/system/eos@fst"${i}".service.d
    echo "[Service]" > custom.conf;
    echo "Environment=EOS_FST_HTTP_PORT=900${i}" >> custom.conf;
done;

FST is usually represented by a disk server with many disks mounted on it. For our dev purposes, it is usually enough to just create a directory per fst on the local file system (do not forget to grant “daemon” the ownership). In each of these directories there has to be 2 hidden files containing the ID:code:.eosfsid and UUID .eosfsuuid of the FST you are adding (define however convenient for you), e.g.:

mkdir -p /fst
cd /fst
for i in {1..3}; do
    mkdir data$i
    echo $i >  data$i/.eosfsid
    echo fst$i > data$i/.eosfsuuid;
done
chown daemon:daemon -R /fst

Now, start the FST services:

systemctl daemon-reload
for i in {1..3}; do
  sudo systemctl start eos@fst"${i}"
done;

for i in {1..3}; do
  sudo systemctl status eos@fst"${i}"
done;

The /var/log/eos/fstX/xrdlog.fstX (replace X with the wanted FST number) you will see lines such as:

210116 12:53:13 time=1610797993.062063 func=Storage                  level=INFO  logid=FstOfsStorage unit=fst@<hostname>:2001

Also, you should see all as active and the MQ /var/log/eos/mq/xrdlog.mq connecting to the “nodes” (the 3 various ports) and new links being added to quarkDB /var/log/xrootd/quarkdb/xrootd.log.

If you now run :code:` eos node ls`, you should see the list of (3) FST nodes as they connected to the MGM.

Troubleshooting:

If you see in the log file /var/log/eos/fstX/xrdlog.fst1

[QCLIENT - INFO - connectTCP:302] Encountered an error when connecting to <yourmachine>:7777 (IPv4,stream resolved from localhost): Unable to connect (111):Connection refused

open a firewall for port 7777:

firewall-cmd --zone=public --add-port=7777/tcp --permanent
firewall-cmd --reload

Note: If you are running multiple MGM nodes on separate hosts, you may also need to open the respective ports of all EOS daemons. By default they are:

  • quarkDB: 7777
  • mgm: 1094
  • fst: 1095
  • mq: 1097

EOS namespace configuration

We have QuarkDB, MQ, MGM, and FST daemons running. Now we need to define the EOS space to which we will be adding these file systems. Similarly to production we can define “spare” space, just to keep disks waiting unused in spare and “default” space which will be in this example from 3 scheduling groups by 1 disk each:

eos space define spare 0 0
eos space set spare on
eos space define default 1 3
eos space set default on

# check the status
eos space status default
eos space status spare

Now we need to register the FST disks with EOS (we first put them in “spare”):

for i in {1..3}; do eos fs add fst${i} ${HOSTNAME}:200${i} /fst/data${i} spare ;done;
# to see them added:
eos fs ls

Then drain the disks:

eos fs ls spare | awk '/fst/ {print "eos -b fs config " $3 " configstatus=drain"}' | sh -x

Make sure, they all appear as empty in the output of eos fs ls after this operation.

Then boot them:

for i in {1..3}; do eos fs boot $i;done;

If they do not boot you can check what is wrong by eos fs ls -e.

And finally move them to the default space. They will be distributed to the scheduling groups automatically. If you were to add more fsts for development purposes we do not mind having 2 disks form the same node in the same scheduling group (use the –force option below) which is by default forbidden.

eos fs ls spare | awk '/fst/ {print "eos -b fs mv --force " $3 " default"}' | sh -x

After this you will see the file systems booted, empty and online in the output of eos fs ls -e.

Set the disks to ‘rw’ mode:

eos fs ls default | awk '/fst/ {print "eos -b fs config " $3 " configstatus=rw"}' | sh -x
eos fs ls

Now, you should be able to work with your EOS file system:

eos ls /eos
eos mkdir /eos/<name>/devtests
eos attr ls /eos/<name>/devtests
xrdcp root://localhost//eos/<name>/proc/whoami -
cd /tmp;
cat > hello_eos
Hello EOS !
eos cp hello_eos /eos/<name>/devtests/hello_eos
eos ls -l /eos/<name>/devtests
rm hello_eos
xrdcp "root://localhost//eos/<name>/devtests/hello_eos" hello_eos
cat hello_eos
eos ns
# [...]
# ALL      files created since boot         1
# ALL      container created since boot     1
# [...]

If you enabled other authentication mechanisms than sss and unix, you need to enable them, e.g.:

eos vid enable gsi
eos vid enable krb5