Alerting thresholds and exceptions management

The Metricator for Nmon application implements a multi-layer system of alerting thresholds and exceptions, which lets you configure different types of alerting settings effectively:

threshold_management.png

This system relies on several KVstore collections and lookup tables that store the thresholds and exceptions.

The alerting searches apply an order of precedence across the different layers:

  • macro defaults have the lowest precedence

  • template threshold settings override macro defaults but have a lower precedence than per server thresholds

  • per server threshold settings have the highest precedence
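In practice, these precedence rules amount to a lookup that stops at the first layer defining a value. The following Python sketch is illustrative only: the alerting searches themselves are Splunk searches backed by macros and KVstore lookups, and the dictionaries, the `resolve_threshold` helper and the parameter names `cpu_pct_max` / `mem_pct_max` are hypothetical stand-ins. It also assumes template entries are keyed by an exact frameID.

```python
from typing import Dict, Optional

# Hypothetical stand-ins for the three layers (not the app's real data model).
Defaults = Dict[str, float]                  # parameter -> macro default value
Templates = Dict[str, Dict[str, float]]      # frameID -> {parameter: value}
Servers = Dict[str, Dict[str, float]]        # host -> {parameter: value}


def resolve_threshold(param: str, host: str, frame_id: str,
                      defaults: Defaults, templates: Templates,
                      servers: Servers) -> Optional[float]:
    """Return the effective threshold, checking layers from highest to lowest precedence."""
    # 1. Per server settings override everything else.
    server_cfg = servers.get(host, {})
    if param in server_cfg:
        return server_cfg[param]
    # 2. Template settings for the server's frameID override macro defaults.
    template_cfg = templates.get(frame_id, {})
    if param in template_cfg:
        return template_cfg[param]
    # 3. Fall back to the macro default, or None if no default is defined.
    return defaults.get(param)


# Example configuration (hypothetical values).
defaults = {"cpu_pct_max": 90.0, "mem_pct_max": 95.0}
templates = {"PROD": {"cpu_pct_max": 80.0}}
servers = {"web01": {"cpu_pct_max": 70.0}}

print(resolve_threshold("cpu_pct_max", "web01", "PROD", defaults, templates, servers))  # 70.0
print(resolve_threshold("cpu_pct_max", "web02", "PROD", defaults, templates, servers))  # 80.0
print(resolve_threshold("mem_pct_max", "web01", "PROD", defaults, templates, servers))  # 95.0
```

Here web01 gets its own CPU threshold, web02 inherits the PROD template, and the memory threshold falls through to the macro default because no layer above it defines one.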

KVstore and lookups

Below is the list of KVstore collections and lookup tables, with their purpose:

KVstore collection                                Lookup table                                   Purpose
kv_nmon_alerting_threshold_template               nmon_alerting_threshold_template               frameID template threshold management (cpu & memory)
kv_nmon_alerting_threshold                        nmon_alerting_threshold                        per server threshold management (cpu & memory)
kv_nmon_alerting_threshold_template_filesystem    nmon_alerting_threshold_template_filesystem    frameID template thresholds for file-systems
kv_nmon_alerting_threshold_filesystem             nmon_alerting_threshold_filesystem             per server thresholds for file-systems
kv_nmon_alerting_filesystem_global_exclusion      nmon_alerting_filesystem_global_exclusion      global file-system exclusions
kv_nmon_alerting_filesystem_template_exclusion    nmon_alerting_filesystem_template_exclusion    frameID template file-system exclusions
kv_nmon_alerting_filesystem_per_server_exclusion  nmon_alerting_filesystem_per_server_exclusion  per server file-system exclusions

Template threshold management

Template thresholds and exceptions are configuration settings that apply to every server matching the configured frameID.

TEMPLATE alerting threshold for cpu & memory

Menu ALERT CENTER / Manage_template_alerting_threshold (TEMPLATE alerting threshold for cpu & memory)

threshold_template.png

This user interface allows you to:

  • Add or delete any template threshold entry

  • Modify values for each specific threshold parameter

As a result, any server matching the frameID uses this threshold configuration, unless per server thresholds are configured for that server.

TEMPLATE alerting threshold for file-systems

Menu ALERT CENTER / Manage_template_alerting_threshold_filesystem (TEMPLATE alerting threshold for file-systems)

threshold_template2.png

This user interface allows you to:

  • Add or delete any template threshold entry per file-system / frameID

  • Modify values for each specific threshold parameter

Note: file-system mount matching is case-insensitive, and wildcards can be used to match multiple mount points at once.
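To illustrate the matching rule, here is a minimal Python sketch of case-insensitive wildcard matching on mount points. It is an approximation under stated assumptions, not the app's implementation: it uses the standard `fnmatch` module, where `*` matches any sequence of characters including `/`, and which also accepts `?` and `[...]` patterns that the app may not support. The `mount_matches` helper is hypothetical.

```python
from fnmatch import fnmatchcase


def mount_matches(pattern: str, mount: str) -> bool:
    """Case-insensitive wildcard match of a configured mount pattern against a mount point."""
    # fnmatchcase never normalises case on its own, so lower-case both sides
    # to get the same case-insensitive result on every platform.
    return fnmatchcase(mount.lower(), pattern.lower())


print(mount_matches("/DATA*", "/data/app01"))  # True: one wildcard covers several mounts
print(mount_matches("/var", "/VAR"))           # True: case is ignored
print(mount_matches("/opt/*", "/opt"))         # False: the "/opt/" prefix is required
```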

Shared file-systems

Hint

Handling a shared file-system

  • In some cases, you may have a file-system (NFS shares for instance) that is shared from a specific machine to many other servers.

  • By default, if this file-system reaches the usage threshold, many alerts can be generated: one per server reporting Nmon metrics for it.

  • You can tag a file-system as shared by creating a record in the file-system template threshold KVstore collection.

  • When you do so, a unique UUID is created to identify the file-system, and only one alert can be generated per file-system UUID.

  • Non-shared file-systems have a UUID that corresponds to the frameID / host / mount combination, which results in one alert per entity.
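The difference between shared and non-shared file-systems comes down to which identifier alerts are grouped by. The Python sketch below is hypothetical: the `shared_ids` mapping stands in for the shared records of the template KVstore collection, identifiers are generated with `uuid.uuid4()` purely for illustration, the `|` separator in non-shared keys is arbitrary, and shared mounts are matched exactly (ignoring case) rather than with wildcards.

```python
import uuid
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical: lower-cased mount -> identifier, created once when the mount is tagged as shared.
shared_ids: Dict[str, str] = {"/nfs/share": str(uuid.uuid4())}


def alert_key(frame_id: str, host: str, mount: str) -> str:
    """Return the identifier an alert is raised against."""
    # Shared file-systems: a single identifier, whichever host reports the mount.
    if mount.lower() in shared_ids:
        return shared_ids[mount.lower()]
    # Non-shared file-systems: one identifier per frameID / host / mount.
    return f"{frame_id}|{host}|{mount}"


def group_alerts(breaches: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group (frameID, host, mount) threshold breaches into one alert per identifier."""
    alerts: Dict[str, List[str]] = defaultdict(list)
    for frame_id, host, mount in breaches:
        alerts[alert_key(frame_id, host, mount)].append(host)
    return dict(alerts)


breaches = [("PROD", "web01", "/nfs/share"), ("PROD", "web02", "/nfs/share"),
            ("DEV", "db01", "/NFS/share"), ("PROD", "web01", "/var")]
alerts = group_alerts(breaches)
print(len(alerts))  # 2: one alert for the shared mount, one for web01's /var
```

Three servers breach the shared mount but produce a single alert, while the non-shared /var breach keeps its own per-entity alert.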

To handle this use case from the UI:

  • Open “ALERT CENTER / Threshold configuration / TEMPLATE alerting threshold for file-systems”

  • Create a new entry: enter “*” as the frameID, set the mount value and the threshold options, and set “this file-system is shared” to True

shared-file-system.png

Per server threshold management

Per server thresholds have the highest precedence and will override any other settings.

SERVER alerting threshold for cpu & memory

Menu ALERT CENTER / Manage_alerting_threshold (SERVER alerting threshold for cpu & memory)

threshold_server.png

This user interface allows you to:

  • Add or delete any server-specific threshold entry

  • Modify values for each specific threshold parameter

As a result, the alerting searches automatically use these thresholds, since per server settings have the highest precedence.

SERVER alerting threshold for file-systems

Menu ALERT CENTER / Manage_alerting_threshold_filesystem (SERVER alerting threshold for file-systems)

threshold_server2.png

This user interface allows you to:

  • Add or delete any server-specific threshold entry per server / file-system

  • Modify values for each specific threshold parameter

As a result, the alerting searches automatically use these thresholds, since per server settings have the highest precedence.

Note: file-system mount matching is case-insensitive, and wildcards can be used to match multiple mount points at once.

Exclusions management

File-system exclusions can be configured at three levels:

  • Global exclusions: the file-systems are excluded from alerting on every server

  • Template exclusions: applied on a per frameID basis

  • Server exclusions: applied on a per server basis

File-system exclusions have no notion of precedence: any file-system matching an exclusion at any level is excluded from automatic alerting.

Note: file-system mount matching is case-insensitive, and wildcards can be used to match multiple mount points at once.
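Because exclusions have no precedence, checking a file-system boils down to testing it against the union of every exclusion pattern that applies to its server. A minimal Python sketch follows; the argument layout and the `is_excluded` helper are hypothetical (the real exclusions live in the KVstore collections listed earlier), it assumes template exclusions are keyed by a frameID pattern that may itself contain wildcards, and it uses `fnmatch` for case-insensitive matching.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def _matches(pattern: str, value: str) -> bool:
    # Case-insensitive wildcard match (fnmatch semantics).
    return fnmatchcase(value.lower(), pattern.lower())


def is_excluded(frame_id: str, host: str, mount: str,
                global_excl: List[str],
                template_excl: Dict[str, List[str]],
                server_excl: Dict[str, List[str]]) -> bool:
    """True if any exclusion level matches the mount; no level overrides another."""
    patterns = list(global_excl)
    # Template exclusions apply for every frameID pattern matching this server.
    for frame_pattern, mounts in template_excl.items():
        if _matches(frame_pattern, frame_id):
            patterns.extend(mounts)
    # Server exclusions apply to this host only.
    patterns.extend(server_excl.get(host, []))
    return any(_matches(p, mount) for p in patterns)


# Hypothetical exclusion settings.
global_excl = ["/proc*"]
template_excl = {"PROD*": ["/backup*"]}
server_excl = {"web01": ["/tmp"]}

print(is_excluded("PROD-EU", "web01", "/BACKUP/daily", global_excl, template_excl, server_excl))  # True
print(is_excluded("DEV", "web02", "/tmp", global_excl, template_excl, server_excl))              # False
```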

Global exclusions

Menu ALERT CENTER / Manage_file_systems_global_exclusion (GLOBAL alerting exclusion for file-systems)

global_exclusions.png

Template exclusions

Menu ALERT CENTER / Manage_file_systems_template_exclusion (TEMPLATE alerting exclusion for file-systems)

template_exclusions.png

Server exclusions

Menu ALERT CENTER / Manage_file_systems_exclusion (SERVER alerting exclusion for file-systems)

server_exclusions.png