.. highlight:: rst

.. index::
   single: Group Balancer

Group Balancer
==============

The group balancer uses the :doc:`converter` mechanism to move files from groups filled above a given threshold to groups filled below it. Groups that fall within the thresholds no longer participate in balancing, which prevents further oscillation once the groups have settled.

Group Balancer Engine
---------------------

From EOS 4.8.74, two different balancer engines are supported, which can be switched at runtime. The engines and their features are described below. Please note that only one engine can be configured to run at a time.

Std
~~~

This is the default engine. It uses the deviation from the average group fill ratio to decide which groups are outliers to be balanced. The deviation below the average (left) and above the average (right) can be configured individually to fine-tune how groups are picked for balancing. Each parameter is entered as a percent deviation from the average. Groups within the threshold values do not participate in balancing. Files from groups above the threshold are picked at random within constraints (see the ``min_file_size``/``max_file_size`` configuration below) and moved to groups below the threshold.

The engine expects the parameters ``max_threshold`` and ``min_threshold``: groups more than ``max_threshold`` above the average and groups more than ``min_threshold`` below the average are the participating groups.

For compatibility, the existing ``groupbalancer.threshold`` value is used as a default when ``groupbalancer.min_threshold`` and ``groupbalancer.max_threshold`` are not provided. It is recommended to configure both thresholds explicitly, as this option may be removed in a future release.

MinMax
~~~~~~

This engine can be used as a stopgap to balance outliers. Unlike the std.
engine, no averages are computed: this engine takes static min and max threshold values, which are absolute percentages of the group fill ratio. Groups with usage above ``max_threshold`` (e.g. 90%) are chosen as sources, and their files are moved to groups with usage below ``min_threshold``.

While the std. engine should fit almost all common use cases, this engine can be used as a temporary measure when targeted balancing is needed only on certain outliers. It is recommended only as a quick fix to balance outliers; afterwards, run the std. engine to balance over longer periods of time.

Configuration
-------------

Group balancing is enabled or disabled per space:

.. code-block:: bash

   # enable
   eos space config default space.groupbalancer=on
   # disable
   eos space config default space.groupbalancer=off

The current configuration of group balancing can be seen via:

.. code-block:: bash

   eos -b space status default
   # ------------------------------------------------------------------------------------
   # Space Variables
   # ....................................................................................
   ...
   groupbalancer                  := on
   groupbalancer.engine           := std
   groupbalancer.file_attempts    := 50
   groupbalancer.max_file_size    := 20000000000
   groupbalancer.min_file_size    := 1000000000
   groupbalancer.max_threshold    := 5
   groupbalancer.min_threshold    := 5
   groupbalancer.ntx              := 1500
   groupbalancer.threshold        := 1 # Deprecated, this value will not be used if min/max thresholds are set
   ...

The ``max_file_size`` and ``min_file_size`` parameters bound the size of the files picked for transfer. ``file_attempts`` is the number of attempts the random picker makes to find a file within those sizes. For really sparse file systems, where the probability of finding a file within the size range may be lower, this number can be adjusted.
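
The file-size window and the number of picker attempts can presumably be set like the other space variables on this page. The following is a sketch under that assumption: the ``space.groupbalancer.*`` keys mirror the variable names shown in the ``space status`` output above, and the values are only illustrative.

.. code-block:: bash

   # assumption: these keys follow the same space.groupbalancer.<name> pattern
   # as the other settings; check `eos space -h` on your version before use

   # only pick files between 1 GB and 20 GB
   eos space config default space.groupbalancer.min_file_size=1000000000
   eos space config default space.groupbalancer.max_file_size=20000000000

   # give the random picker more attempts on a sparse file system (illustrative value)
   eos space config default space.groupbalancer.file_attempts=100

After changing these values, ``eos -b space status default`` should list the new settings among the space variables.
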

The number of concurrent transfers to schedule is defined via the **groupbalancer.ntx** space variable. This is the number of transfers scheduled in every groupbalancer scheduling cycle, which runs every 10s. It is therefore recommended to set a value of at least a few hundred, or around 1000, if the groups are really unbalanced (and to watch the progress occasionally with ``eos io stat``):

.. code-block:: bash

   # schedule 1000 transfers per cycle
   eos space config default space.groupbalancer.ntx=1000

Configure the groupbalancer engine:

.. code-block:: bash

   # configure the groupbalancer engine
   eos space config default space.groupbalancer.engine=std

The thresholds in percent are defined via the **groupbalancer.min_threshold** and **groupbalancer.max_threshold** variables. For the std. balancer engine these are percent deviations from the average:

.. code-block:: bash

   # set a 3 percent min threshold & 5 percent max threshold
   eos space config default space.groupbalancer.min_threshold=3
   eos space config default space.groupbalancer.max_threshold=5

To run the minmax balancer engine instead, note that the threshold values are absolute fill percentages:

.. code-block:: bash

   # set a 60 percent min threshold & 80 percent max threshold
   eos space config default space.groupbalancer.engine=minmax
   eos space config default space.groupbalancer.min_threshold=60
   eos space config default space.groupbalancer.max_threshold=80

Make sure that you have enabled the converter and that the **converter.ntx** space variable is bigger than **groupbalancer.ntx**:

.. code-block:: bash

   # enable the converter
   eos space config default space.converter=on
   # run 20 conversion transfers in parallel
   eos space config default space.converter.ntx=20

One can see the same settings and the number of active conversion transfers (scroll to the right):

.. code-block:: bash

   eos space ls
   #------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   #     type #          name # groupsize # groupmod #N(fs) #N(fs-rw) #sum(usedbytes) #sum(capacity) #capacity(rw) #nom.capacity #quota #balancing # threshold # converter #    ntx #  active #intergroup
   #------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    spaceview         default          22         22    202       123          2.91 T       339.38 T      245.53 T          0.00     on        off        0.00          on   100.00      0.00         off

Status
------

The status of the groupbalancer engine can be viewed with:

.. code-block:: bash

   $ eos space groupbalancer status default
   Engine configured : Std
   Current Computed Average : 0.397366
   Min Deviation Threshold : 0.03
   Max Deviation Threshold : 0.05
   Total Group Size: 25
   Total Groups Over Threshold: 8
   Total Groups Under Threshold: 12

   # Detailed view of groups available with `--detail` switch
   $ eos space groupbalancer status default --detail
   engine configured : Std
   Current Computed Average : 0.397258
   Min Deviation Threshold : 0.03
   Max Deviation Threshold : 0.05
   Total Group Size: 25
   Total Groups Over Threshold: 8
   Total Groups Under Threshold: 12
   Groups Over Threshold
   ┌──────────┬──────────┬──────────┬──────────┐
   │Group     │ UsedBytes│  Capacity│    Filled│
   ├──────────┴──────────┴──────────┴──────────┤
   │default.8      2.75 T     6.00 T       0.46│
   │default.6      5.34 T     6.00 T       0.89│
   │default.5      2.78 T     6.00 T       0.46│
   │default.12     2.74 T     6.00 T       0.46│
   │default.11     2.77 T     6.00 T       0.46│
   │default.10     2.74 T     6.00 T       0.46│
   │default.3      2.83 T     6.00 T       0.47│
   │default.0      5.36 T     6.00 T       0.89│
   └───────────────────────────────────────────┘
   Groups Under Threshold
   ┌──────────┬──────────┬──────────┬──────────┐
   │Group     │ UsedBytes│  Capacity│    Filled│
   ├──────────┴──────────┴──────────┴──────────┤
   │default.9      2.19 T     6.00 T       0.36│
   │default.7      2.18 T     6.00 T       0.36│
   │default.24     1.78 T     6.00 T       0.30│
   │default.21     2.20 T     6.00 T       0.37│
   │default.2      1.47 G     6.00 T       0.00│
   │default.18     1.86 T     6.00 T       0.31│
   │default.17     2.17 T     6.00 T       0.36│
   │default.20     1.81 T     6.00 T       0.30│
   │default.15     1.80 T     6.00 T       0.30│
   │default.14     6.10 G     6.00 T       0.00│
   │default.13     2.15 T     6.00 T       0.36│
   │default.1      1.75 T     6.00 T       0.29│
   └───────────────────────────────────────────┘

For the MinMax engine these numbers are absolute percentages (e.g. this was configured with 45 & 85):

.. code-block:: bash

   $ eos space groupbalancer status default
   Engine configured: MinMax
   Min Threshold : 0.45
   Max Threshold : 0.85
   Total Group Size: 25
   Total Groups Over Threshold: 9
   Total Groups Under Threshold: 4

There is a 60s cache for values, so reconfigured values may take up to 60s to show up in the status output.

Traffic from the groupbalancer is tagged as ``eos/groupbalancer`` and is visible in iostat:

.. code-block:: bash

   eos io stat -x
   ┌───┬────────────────────────┬────────┬────────┬────────┬────────┐
   │io │             application│    1min│    5min│      1h│     24h│
   └───┴────────────────────────┴────────┴────────┴────────┴────────┘
    out        eos/groupbalancer  86.41 G 190.89 G   2.95 T  19.15 T
    out          eos/replication        0   1.49 G  52.96 G  52.96 G
    out                    other      605   1.33 K  10.77 K  64.73 K
    in         eos/groupbalancer  18.91 G  85.30 G   2.83 T  19.04 T
    in           eos/replication        0   1.43 G  52.90 G  52.90 G
    in                     other      605   1.33 K  10.77 K  64.73 K

Log Files
---------

The Group Balancer has a dedicated log file, ``/var/log/eos/mgm/GroupBalancer.log``, which shows the basic variables used for balancing decisions and the scheduled transfers. To get more verbose information you can change the log level:

.. code-block:: bash

   # switch to debug log level on the MGM
   eos debug debug

   # switch back to info log level on the MGM
   eos debug info
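
While tuning thresholds it can help to follow the dedicated log as transfers are scheduled. A minimal sketch using only standard tools, assuming shell access to the MGM node and read permission on the log file:

.. code-block:: bash

   # follow the group balancer log as new entries are written
   tail -f /var/log/eos/mgm/GroupBalancer.log

   # or show only the most recent 50 lines
   tail -n 50 /var/log/eos/mgm/GroupBalancer.log

Remember to switch the MGM back to the info log level afterwards, since debug logging is considerably more verbose.
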