Warning
Warning: With EOS5, the MGM is expecting to have its configuration in QuarkDB already, so, before starting the upgrade, it is really important to follow the instructions for exporting the file configuration into QuarkDB: Master/Slave QuarkDB Configuration.
Warning
Before you start: please make sure you already run in HA mode, or that you start the upgrade process with the active MGM, otherwise you risk of running two active MGMs at the same time, with unknown consequences.
Warning
When using HA mode, there might still be an issue (< 5.0.17, April 2022) with FSTs losing symkeys (and by that capability to communicate with MGM) during switch over of the master to another headnode. We advise to run in single MGM mode only (i.e. switching off all other MGM processes on the other slave nodes of the headnode cluster until this issue gets resolved).
The recipe below is tailored to CERN EOS team puppet manifests, but contains general information useful for any EOS 4 to EOS 5 upgrade.
After the above has been put in place, one needs to add the right repositories to the manifest:
eosserver::repo::repo_main_url: https://storage-ci.web.cern.ch/storage-ci/eos/diopside/tag/testing/el-7/x86_64/
eosserver::repo::repo_deps_url: https://storage-ci.web.cern.ch/storage-ci/eos/diopside-depend/el-7/x86_64/
Please cross check the XrdHttp port configuration and mapping to be as in e.g.: code/manifests/pilot/servers/ns.pp (or LHCb). In order to have both XrdHttp and libmicro working side by side (In xrootd 4 the http plugin is installed after the ofs is loaded, while in xrootd 5 the http plugin is installed after the xroot plugin and then the ofs layer will try start libmicro at which point libmiro may fail unless the ports are set correctly). In the sysconfig, you should have:
EOS_MGM_HTTP_PORT: 8443
xrd.protocol: XrdHttp:8444 libXrdHttp.so
and in code/manifests/pilot/servers/ns.pp:
# Redirect 443 to 8444 for XrdHttp traffic
['iptables', 'ip6tables'].each |Integer $index, String $provider| {
firewall { "120-${index} redirect 443 to 8444 for XrdHttp traffic":
chain => 'PREROUTING',
table => 'nat',
jump => 'REDIRECT',
provider => $provider,
proto => 'tcp',
dport => 443,
toports => 8444,
}
firewall { "121-${index} redirect 443 to 8444 for XrdHttp local traffic":
chain => 'OUTPUT',
table => 'nat',
jump => 'REDIRECT',
provider => $provider,
outiface => 'lo',
proto => 'tcp',
dport => 443,
toports => 8444,
}
}
# Redirect 8000 to 8443 for Nginx HTTP traffic
['iptables', 'ip6tables'].each |Integer $index, String $provider| {
firewall { "122-${index} redirect 8000 to 8443 for Nginx HTTP traffic":
chain => 'PREROUTING',
table => 'nat',
jump => 'REDIRECT',
provider => $provider,
proto => 'tcp',
dport => 8000,
toports => 8443,
}
}
Important thing to notice is that the quarkdb library comes now with EOS itself eos-quarkdb, for this, one needs to have a line like the following one in the manifest which will uninstall old quarkdb package and install the new eos-quarkdb: hg_eos::private::ns_compact::eos5enabled: true. If not present eos-quarkdb will be uninstalled ! This is why we first keep this flag false –> run puppet –> disable puppet –> install new EOS version with eos-quarkdb manually (based on eos dependencies) and finally change this flag to true in the manifest and enable puppet again once EOS 5 is actually running. This is to avoid that puppet removed the quarkdb/eos-quarkdb package while EOS 4/5 is running using it.
This means, first we change:
hg_eos::include::versionlock::eosversion: 4.8.74-1.el7.cern
hg_eos::include::versionlock::xrootversion: 4.12.8-1.el7
to
hg_eos::include::versionlock::eosversion: 5.0.9-1.el7.cern
hg_eos::include::versionlock::xrootversion: 5.3.4-1.el7
In addition in servers.yaml (/ns.yaml) change these lines:
http.cadir: /etc/grid-security/certificates/
http.cert: /etc/grid-security/daemon//hostcert.pem
http.key: /etc/grid-security/daemon/hostkey.pem
http.gridmap: /etc/grid-security/grid-mapfile
http.secxtractor: libXrdVoms.so
mgmofs.macaroonslib: libXrdMacaroons.so /opt/eos/lib64/libXrdAccSciTokens.so
to (for versions < 5.0.16):
xrd.tls: /etc/grid-security/daemon/hostcert.pem /etc/grid-security/daemon/hostkey.pem
xrd.tlsca: certdir /etc/grid-security/certificates/
http.gridmap: /etc/grid-security/grid-mapfile
http.secxtractor: libXrdHttpVOMS.so
mgmofs.macaroonslib: libXrdMacaroons.so libEosAccSciTokens.so
For versions 5.0.16+:
xrd.tls: /etc/grid-security/daemon/hostcert.pem /etc/grid-security/daemon/hostkey.pem
xrd.tlsca: certdir /etc/grid-security/certificates/
http.gridmap: /etc/grid-security/grid-mapfile
http.secxtractor: libXrdHttpVOMS.so
mgmofs.macaroonslib: libXrdMacaroons.so libXrdAccSciTokens.so
and make sure the library path states:
LD_LIBRARY_PATH: "/opt/eos/xrootd/lib64/:$LD_LIBRARY_PATH"
One need to have /opt/eos/xrootd/lib64/ in LD_LIBRARY_PATH for the libXrdMacaroons.so and all the xrootd libs which are loaded when starting by the daemon and searched in the usual locations. On the other hand, e.g. libEosAccSciTokens.so is already in /usr/lib64/ by default since everything that we install from eos-server goes there.
And in storage.yaml:
From:
http.cadir: /etc/grid-security/certificates/
to
xrd.tls: /etc/grid-security/daemon/hostcert.pem /etc/grid-security/daemon/hostkey.pem
xrd.tlsca: certdir /etc/grid-security/certificates/
Run puppet and then we disable puppet from running:
puppet agent -tv
puppet agent --disable 'MGM upgrade to EOS 5: avoiding removal of future eos-quarkdb package after upgrade to EOS 5'
Remove few obsolete packaged replaced newly by eos dependencies automatically (this also prevents to pull xrootd4 packages for upgrades from epel which we do not want, the versionlock for xrootd packages can be removed entirely from our manifests later with the caviat of checking the xrootd path for all use-cases, for example for FED functionality of CMS one needs to update the xrootd binary location in the systemd script).
This is where the instance availability gets affected:
yum remove xrdhttpvoms
yum remove eos-scitokens
Upgrade scitokens-cpp package (will be having strict dependency in EOS releases > 5.0.19 where this shoudl not be necessary to be done explicitly):
yum upgrade scitokens-cpp
Check that eos-quarkdb gets installed based on dependencies resolved in the last command:
yum upgrade "eos-*" "xrootd-*"
Update puppet manifest again from:
hg_eos::private::ns_compact::eos5enabled: false
to
hg_eos::private::ns_compact::eos5enabled: true
And run puppet:
puppet agent --enable
puppet agent -tv
Check the service status and other usual checks
systemctl status eos@*
systemctl status xrootd@quarkdb
rpm -qa | grep eos
rpm -qa | grep xroot
One needs to run yum reinstall eos-grpc on all headnodes and FSTs before proceeding with the usual proceedure.