.. highlight:: rst

.. index::
   single: Balancing System

Balancing System
================

Overview
--------

The EOS balancing system provides a fully automated mechanism to balance the 
volume usage across a scheduling group. Hence currently the balancing system 
does not balance between scheduling groups! See :doc:`groupbalancer`!

The balancing system is made up by the cooperation of several components:

* Central File System View with file system usage information and space configuration
* Centrally running balancer thread steering the filesystem balancer process by computing averages and deviations
* Balancer Thread on each FST pulling workload to pull files locally to balance filesystems

.. ::note

   Balancing is en-/disabled in each space seperatly!

Balancing View and Configuration
--------------------------------

Each filesystem advertises the used volume and the central view allows to see 
the deviation from the average filesystem usage in each group.

.. code-block:: bash

   EOS Console [root://localhost] |/> group ls
   #---------------------------------------------------------------------------------------------------------------------
   #     type #           name #     status #nofs #dev(filled) #avg(filled) #sig(filled) #balancing #  bal-run #drain-run
   #---------------------------------------------------------------------------------------------------------------------
   groupview  default.0                  on     8         0.27         0.10         0.12 idle                0          0
   groupview  default.1                  on     8         0.28         0.10         0.12 idle                0          0
   groupview  default.10                 on     8         0.29         0.10         0.13 idle                0          0
   groupview  default.11                 on     8         0.29         0.10         0.13 idle                0          0
   groupview  default.12                 on     7         0.28         0.11         0.14 idle                0          0
   groupview  default.13                 on     8         0.28         0.12         0.14 idle                0          0
   groupview  default.14                 on     8         0.29         0.10         0.13 idle                0          0
   groupview  default.15                 on     8         0.30         0.10         0.13 idle                0          0
   groupview  default.16                 on     7         0.26         0.12         0.13 idle                0          0
   groupview  default.17                 on     8         0.28         0.12         0.14 idle                0          0
   groupview  default.18                 on     8         0.30         0.10         0.14 idle                0          0
   groupview  default.19                 on     8        12.42         4.76         6.80 idle                0          0
   groupview  default.2                  on     8         0.48         0.16         0.23 idle                0          0
   groupview  default.20                 on     8        14.03         5.43         7.62 idle                0          0
   groupview  default.21                 on     8         0.48         0.16         0.23 idle                0          0
   groupview  default.3                  on     8         0.28         0.10         0.12 idle                0          0
   groupview  default.4                  on     8         0.26         0.11         0.13 idle                0          0
   groupview  default.5                  on     8         0.27         0.10         0.12 idle                0          0
   groupview  default.6                  on     8         0.27         0.10         0.12 idle                0          0
   groupview  default.7                  on     8         0.27         0.09         0.12 idle                0          0
   groupview  default.8                  on     8         0.27         0.10         0.12 idle                0          0
   groupview  default.9                  on     8         0.30         0.11         0.14 idle                0          0


The decision parameters to enable balancing in a group is the maximum deviation 
of the filling state (given in %). 
In this example two groups are unbalanced (12 + 14 %).

The balancing is configured on the space level and the current configuration 
is displayed using the 'space status' command:

.. code-block:: bash

   EOS Console [root://localhost] |/> space status default
   # ------------------------------------------------------------------------------------
   # Space Variables
   # ....................................................................................
   balancer                         := off
   balancer.node.ntx                := 10
   balancer.node.rate               := 10
   balancer.threshold               := 1
   ...

The configuration variables are:

.. epigraph::
  
   ========================= ======================================================================
   variable                  definition
   ========================= ======================================================================
   balancer                  can be off or on to disable or enable the balancing
   balancer.node.ntx         number of parallel balancer transfers running on each FST
   balancer.node.rate        rate limitation for each running balancer transfer in MB/s
   balancer.threshold        percentage at which balancing get's enabled within a scheduling group
   ========================= ======================================================================
 
If balancing is enabled ....

.. code-block:: bash

   EOS Console [root://localhost] |/> space config default space.balancer=on
   success: balancer is enabled!

Groups which are balancing are shown via the **eos group ls** command:

.. code-block:: bash

   EOS Console [root://localhost] |/> group ls
   #---------------------------------------------------------------------------------------------------------------------
   #     type #           name #     status #nofs #dev(filled) #avg(filled) #sig(filled) #balancing #  bal-run #drain-run
   #---------------------------------------------------------------------------------------------------------------------
   groupview  default.0                  on     8         0.27         0.10         0.12 idle                0          0
   groupview  default.1                  on     8         0.28         0.10         0.12 idle                0          0
   groupview  default.10                 on     8         0.29         0.10         0.13 idle                0          0
   groupview  default.11                 on     8         0.29         0.10         0.13 idle                0          0
   groupview  default.12                 on     7         0.28         0.11         0.14 idle                0          0
   groupview  default.13                 on     8         0.28         0.12         0.14 idle                0          0
   groupview  default.14                 on     8         0.29         0.10         0.13 idle                0          0
   groupview  default.15                 on     8         0.30         0.10         0.13 idle                0          0
   groupview  default.16                 on     7         0.26         0.12         0.13 idle                0          0
   groupview  default.17                 on     8         0.28         0.12         0.14 idle                0          0
   groupview  default.18                 on     8         0.30         0.10         0.14 idle                0          0
   groupview  default.19                 on     8        12.42         4.76         6.80 balancing          10          0
   groupview  default.2                  on     8         0.48         0.16         0.23 idle                0          0
   groupview  default.20                 on     8        14.03         5.43         7.62 balancing          12          0
   groupview  default.21                 on     8         0.48         0.16         0.23 idle                0          0
   groupview  default.3                  on     8         0.28         0.10         0.12 idle                0          0
   groupview  default.4                  on     8         0.26         0.11         0.13 idle                0          0
   groupview  default.5                  on     8         0.27         0.10         0.12 idle                0          0
   groupview  default.6                  on     8         0.27         0.10         0.12 idle                0          0
   groupview  default.7                  on     8         0.27         0.09         0.12 idle                0          0
   groupview  default.8                  on     8         0.27         0.10         0.12 idle                0          0
   groupview  default.9                  on     8         0.30         0.11         0.14 idle                0          0

The current balancing can also be viewed by space or node:

.. code-block:: bash

   EOS Console [root://localhost] |/> space ls --io
   #----------------------------------------------------------------------------------------------------------------------------------------------------------------------
   #     name # diskload # diskr-MB/s # diskw-MB/s #eth-MiB/s # ethi-MiB # etho-MiB #ropen #wopen # used-bytes #  max-bytes # used-files # max-files #  bal-run #drain-run
   #----------------------------------------------------------------------------------------------------------------------------------------------------------------------
   default       0.02        66.00        66.00        862         57         60     31     22      1.99 TB    347.33 TB     805.26 k     16.97 G         51          0

   EOS Console [root://localhost] |/> node ls --io
   #------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   #               hostport # diskload # diskr-MB/s # diskw-MB/s #eth-MiB/s # ethi-MiB # etho-MiB #ropen #wopen # used-bytes #  max-bytes # used-files # max-files #  bal-run #drain-run
   #------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   lxfsra02a02.cern.ch:1095       0.08        41.00         0.00        119          0         41     23      0    825.47 GB     41.92 TB     298.80 k      2.05 G          0          0
   lxfsra02a05.cern.ch:1095       0.03        19.00         0.00        119          0         19      2      0    832.01 GB     43.92 TB     152.14 k      2.15 G          0          0
   lxfsra02a06.cern.ch:1095       0.01         0.00        11.00        119         12          0      0      6     70.05 GB     43.92 TB      54.77 k      2.15 G         10          0
   lxfsra02a07.cern.ch:1095       0.01         0.00        11.00        119          9          0      0      3     79.95 GB     43.92 TB      75.91 k      2.15 G         10          0
   lxfsra02a08.cern.ch:1095       0.01         0.00        11.00        119          9          0      0      2     52.01 GB     43.92 TB      61.25 k      2.15 G          8          0
   lxfsra04a01.cern.ch:1095       0.01         0.00        10.00        119          9          0      0      1     72.12 GB     41.92 TB      60.92 k      2.05 G          8          0
   lxfsra04a02.cern.ch:1095       0.01         0.00        10.00        119          9          0      0      7     52.32 GB     43.92 TB      86.72 k      2.15 G         10          0
   lxfsra04a03.cern.ch:1095       0.01         0.00        10.00        119          9          0      0      5     10.53 GB     43.92 TB      14.80 k      2.15 G          5          0

To see the usage difference within the group, one can inspect all the group filesystems via **eos group ls --IO** e.g.

.. code-block:: bash

   EOS Console [root://localhost] |/> group ls --IO default.20
   #---------------------------------------------------------------------------------------------------------------------
   #     type #           name #     status #nofs #dev(filled) #avg(filled) #sig(filled) #balancing #  bal-run #drain-run
   #---------------------------------------------------------------------------------------------------------------------
   groupview  default.20                 on     8        13.71         5.48         7.47 balancing          37          0
   #.................................................................................................................................................................................................................
   #                     hostport #  id #     schedgroup # diskload # diskr-MB/s # diskw-MB/s #eth-MiB/s # ethi-MiB # etho-MiB #ropen #wopen # used-bytes #  max-bytes # used-files # max-files #  bal-run #drain-run
   #.................................................................................................................................................................................................................
   lxfsra02a05.cern.ch:1095    17       default.20       0.47        12.00         0.00        119          0         21      1      0    383.17 GB      2.00 TB      59.33 k     97.52 M          0          0
   lxfsra02a06.cern.ch:1095    35       default.20       0.08         0.00         6.00        119         10          0      0      6     26.56 GB      2.00 TB       6.23 k     97.52 M          7          0
   lxfsra04a01.cern.ch:1095    57       default.20       0.13         0.00         6.00        119          9          0      0      4     25.01 GB      2.00 TB       6.11 k     97.52 M          4          0
   lxfsra02a08.cern.ch:1095    77       default.20       0.08         0.00         6.00        119         11          0      0      5     27.36 GB      2.00 TB       6.64 k     97.52 M          8          0
   lxfsra04a02.cern.ch:1095    99       default.20       0.07         0.00         4.00        119         10          0      0      3     26.57 GB      2.00 TB       7.75 k     97.52 M          6          0
   lxfsra02a02.cern.ch:1095   121       default.20       1.00        22.00         0.00        119          0         41     21      0    351.07 GB      2.00 TB      59.80 k     97.52 M          0          0
   lxfsra02a07.cern.ch:1095   143       default.20       0.10         0.00         7.00        119          9          0      0      2     28.57 GB      2.00 TB       7.46 k     97.52 M          7          0
   lxfsra04a03.cern.ch:1095   165       default.20       0.12         0.00         6.00        119         10          0      0      5      7.56 GB      2.00 TB       2.96 k     97.52 M          5          0

 
The scheduling activity for balancing can be monitored with the **eos ns ls** command:

.. code-block:: bash

   EOS Console [root://localhost] |/> ns stat
   # ------------------------------------------------------------------------------------
   # Namespace Statistic
   # ------------------------------------------------------------------------------------
   ALL      Files                            682781 [booted] (12s)
   ALL      Directories                      1316
   # ....................................................................................
   ALL      File Changelog Size              804.27 MB
   ALL      Dir  Changelog Size              515.98 kB
   # ....................................................................................
   ALL      avg. File Entry Size             1.18 kB
   ALL      avg. Dir  Entry Size             392.00 B
   # ------------------------------------------------------------------------------------
   ALL      Execution Time                   0.40 +- 1.12
   # -----------------------------------------------------------------------------------------------------------
   who      command                          sum             5s     1min     5min       1h exec(ms) +- sigma(ms)
   # -----------------------------------------------------------------------------------------------------------
   ALL        Access                                      0     0.00     0.00     0.00     0.00     -NA- +- -NA-     
    ....
   ALL        Schedule2Balance                         6423    11.75    10.81    10.71     1.78     -NA- +- -NA-     
   ALL        Schedule2Drain                              0     0.00     0.00     0.00     0.00     -NA- +- -NA-     
   ALL        Scheduled2Balance                        6423    11.75    10.81    10.71     1.78     4.20 +- 0.57 
   ALL        SchedulingFailedBalance                     0     0.00     0.00     0.00     0.00     -NA- +- -NA-

   
The relevant counters are:

.. epigraph::
   
   ============================== =====================================================================
   state                          definition
   ============================== =====================================================================
   Schedule2Balance               counter/rate at which all FSTs ask for a file to balance
   ScheduledBalance               counter/rate of balancing transfers which have been scheduled to FSTs
   SchedulingFailedBalance        counter/rate of scheduling requests which could not get any workload
                                  (e.g. no file matches the target machine)
   ============================== =====================================================================