#############################################
Alerting thresholds and exceptions management
#############################################

**The Metricator for Nmon application implements a multi-layer system of alerting thresholds and exceptions that lets you configure alerting at several levels:**

.. image:: img/threshold_management.png
   :alt: threshold_management.png
   :align: center

This system relies on several KVstore collections and lookup tables that store the thresholds and exceptions.

**The alerting searches also apply an order of precedence across the layers:**

- macro defaults have the lowest precedence
- template threshold settings override macro defaults but have a lower precedence than per server thresholds
- per server threshold settings have the highest precedence

===================
KVstore and lookups
===================

**Below is the list of KVstore collections and lookup tables, with their purpose:**

+--------------------------------------------------+-----------------------------------------------+---------------------------------------------+
| KVstore collection                               | Lookup table                                  | Purpose                                     |
+==================================================+===============================================+=============================================+
| kv_nmon_alerting_threshold_template              | nmon_alerting_threshold_template              | threshold frameID template management       |
+--------------------------------------------------+-----------------------------------------------+---------------------------------------------+
| kv_nmon_alerting_threshold                       | nmon_alerting_threshold                       | per server threshold management             |
+--------------------------------------------------+-----------------------------------------------+---------------------------------------------+
| kv_nmon_alerting_threshold_template_filesystem   | nmon_alerting_threshold_template_filesystem   | threshold frameID template for file-systems |
+--------------------------------------------------+-----------------------------------------------+---------------------------------------------+
| kv_nmon_alerting_threshold_filesystem            | nmon_alerting_threshold_filesystem            | per server threshold for file-systems       |
+--------------------------------------------------+-----------------------------------------------+---------------------------------------------+
| kv_nmon_alerting_filesystem_global_exclusion     | nmon_alerting_filesystem_global_exclusion     | global file-system exclusions               |
+--------------------------------------------------+-----------------------------------------------+---------------------------------------------+
| kv_nmon_alerting_filesystem_template_exclusion   | nmon_alerting_filesystem_template_exclusion   | frameID template file-system exclusions     |
+--------------------------------------------------+-----------------------------------------------+---------------------------------------------+
| kv_nmon_alerting_filesystem_per_server_exclusion | nmon_alerting_filesystem_per_server_exclusion | per server file-system exclusions           |
+--------------------------------------------------+-----------------------------------------------+---------------------------------------------+

=============================
Template threshold management
=============================

**Template thresholds and exceptions are configurations that apply to any server matching the configured frameID.**

TEMPLATE alerting threshold for cpu & memory
--------------------------------------------

**Menu ALERT CENTER / Manage_template_alerting_threshold (TEMPLATE alerting threshold for cpu & memory)**

.. image:: img/threshold_template.png
   :alt: threshold_template.png
   :align: center
   :width: 1200px
   :class: with-border

**This interface allows you to:**

- Add or delete any template threshold entry
- Modify values for each specific threshold parameter

**As a result of this configuration, any server matching the frameID will use the template thresholds, unless per server thresholds are configured for that server.**

TEMPLATE alerting threshold for file-systems
--------------------------------------------

**Menu ALERT CENTER / Manage_template_alerting_threshold_filesystem (TEMPLATE alerting threshold for file-systems)**

.. image:: img/threshold_template2.png
   :alt: threshold_template2.png
   :align: center
   :width: 1200px
   :class: with-border

**This interface allows you to:**

- Add or delete any template threshold entry per file-system / frameID
- Modify values for each specific threshold parameter

**Note: file-system mount matching is case insensitive, and wildcards can be used to match multiple mount points at once.**

Shared file-systems
"""""""""""""""""""

.. hint:: Handling a shared file-system

   - In some cases, you may have a file-system (an NFS share, for instance) that is shared from a specific machine to many other servers.
   - By default, if this file-system reaches the threshold usage level, many alerts can be generated, one per server reporting Nmon metrics.
   - You can tag a file-system as shared by creating a record in the template threshold KVstore.
   - Doing so creates a unique UUID that identifies the file-system, and only one alert can be created per file-system UUID.
   - Non-shared file-systems have a unique UUID that corresponds to the value of frameID / host / mount, which results in one alert per entity.

**You can use the following UI driven configuration to handle this use case:**

- Open the UI "ALERT CENTER / Threshold configuration / TEMPLATE alerting threshold for file-systems"
- Create a new entry: in frameID enter "*", set the mount value and the threshold option, and set "this file-system is shared" to True

.. image:: img/shared-file-system.png
   :alt: shared-file-system.png
   :align: center
   :width: 1200px
   :class: with-border

===============================
Per server threshold management
===============================

**Per server thresholds have the highest precedence and will override any other settings.**

SERVER alerting threshold for cpu & memory
------------------------------------------

**Menu ALERT CENTER / Manage_alerting_threshold (SERVER alerting threshold for cpu & memory)**

.. image:: img/threshold_server.png
   :alt: threshold_server.png
   :align: center
   :width: 1200px
   :class: with-border

**This interface allows you to:**

- Add or delete any server specific threshold entry
- Modify values for each specific threshold parameter

**As a result of this configuration, the alerting searches will automatically use these thresholds, because per server settings have the highest precedence.**

SERVER alerting threshold for file-systems
------------------------------------------

**Menu ALERT CENTER / Manage_alerting_threshold_filesystem (SERVER alerting threshold for file-systems)**

.. image:: img/threshold_server2.png
   :alt: threshold_server2.png
   :align: center
   :width: 1200px
   :class: with-border

**This interface allows you to:**

- Add or delete any server specific threshold entry per server / file-system
- Modify values for each specific threshold parameter

**As a result of this configuration, the alerting searches will automatically use these thresholds, because per server settings have the highest precedence.**

**Note: file-system mount matching is case insensitive, and wildcards can be used to match multiple mount points at once.**

=====================
Exclusions management
=====================

**File-system exclusions can be configured at three levels:**

- Global exclusions: the file-systems will be excluded from any alerting
- Template exclusions: applied on a per frameID basis
- Server exclusions: applied on a per server basis

**File-system exclusions have no notion of precedence: any matching file-system is excluded from automatic alerting.**

**Note: file-system mount matching is case insensitive, and wildcards can be used to match multiple mount points at once.**

Global exclusions
-----------------

**Menu ALERT CENTER / Manage_file_systems_global_exclusion (GLOBAL alerting exclusion for file-systems)**

.. image:: img/global_exclusions.png
   :alt: global_exclusions.png
   :align: center
   :width: 1200px
   :class: with-border

Template exclusions
-------------------

**Menu ALERT CENTER / Manage_file_systems_template_exclusion (TEMPLATE alerting exclusion for file-systems)**

.. image:: img/template_exclusions.png
   :alt: template_exclusions.png
   :align: center
   :width: 1200px
   :class: with-border

Server exclusions
-----------------

**Menu ALERT CENTER / Manage_file_systems_exclusion (SERVER alerting exclusion for file-systems)**

.. image:: img/server_exclusions.png
   :alt: server_exclusions.png
   :align: center
   :width: 1200px
   :class: with-border
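
The way the three exclusion levels combine can be illustrated with a short Python sketch. It is not the application's actual search logic, only a model of the behaviour described in this section: mount matching is case insensitive, wildcards are allowed, and a match at any level excludes the file-system. The function names, the record layout and the sample values are hypothetical, and both the shell-style wildcard semantics (``fnmatch``) and the exact-match comparison on frameID and host are assumptions.

.. code-block:: python

   from fnmatch import fnmatchcase


   def mount_matches(pattern, mount):
       """Case-insensitive wildcard match of a mount point against a pattern."""
       # Assumption: shell-style wildcards, where "*" matches any characters.
       return fnmatchcase(mount.lower(), pattern.lower())


   def is_excluded(frame_id, host, mount, global_excl, template_excl, server_excl):
       """Return True if any exclusion level matches (no precedence between levels)."""
       # Global exclusions apply to every server.
       if any(mount_matches(p, mount) for p in global_excl):
           return True
       # Template exclusions apply on a per frameID basis (exact match assumed).
       if any(f == frame_id and mount_matches(p, mount) for f, p in template_excl):
           return True
       # Server exclusions apply on a per server basis (exact match assumed).
       return any(h == host and mount_matches(p, mount) for h, p in server_excl)


   # Hypothetical sample records, one list per exclusion level.
   GLOBAL_EXCLUSIONS = ["/proc*"]
   TEMPLATE_EXCLUSIONS = [("frame-a", "/mnt/*")]
   SERVER_EXCLUSIONS = [("server01", "/backup")]

   # "/PROC/sys" is matched by the global pattern despite the different case.
   print(is_excluded("frame-b", "server02", "/PROC/sys",
                     GLOBAL_EXCLUSIONS, TEMPLATE_EXCLUSIONS, SERVER_EXCLUSIONS))

Because the levels have no precedence, the order of the three checks only affects how early the function returns, not its result.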