
Nmon Performance for Splunk VERSUS Splunk app for unix and TA-unix

As a Splunker for many years now, I have had the chance to develop and maintain the Nmon Performance application for Splunk, with the goal of delivering the best features and user experience in a strong and complete monitoring solution for Unix and Linux servers.

In this article, I will compare (as objectively as possible!) the Nmon application with the legacy Splunk application for Unix monitoring, the Splunk app for Unix.

Splunk app for *nix:

  • Official Splunk application for performance metric collection of Unix and Linux servers
  • URL:
  • Built by Splunk
  • Supported by Splunk for customers under license
  • Community support

System compatibility:

  • Any Unix-based system: AIX, Solaris, Linux, Mac OS X…

Splunk compatibility notes:

  • Splunk 5.x / 6.x for the technical addon
  • Splunk 6.x for search heads, indexers

Indexing volume (licence cost per server):

  • Approx. 200 MB per day / per host (including log files)
  • Approx. 100 MB per day / per host (only for performance and inventory metrics)

Data sources:

  • Various performance monitors through multiple input scripts
  • Changes to files in “/var/log” and “/etc”


Each data source can be activated or deactivated individually. The interval between measures can be increased to reduce the volume of data generated.
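To make that trade-off concrete, here is a minimal Python sketch that estimates the daily volume of a set of scripted inputs. It assumes each enabled input emits a roughly constant number of bytes per run; the input names, byte counts and intervals are hypothetical values for illustration, not measurements of the Splunk app for *nix.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class ScriptedInput:
    name: str           # hypothetical input name, e.g. "vmstat"
    bytes_per_run: int  # assumed average size of one run's output
    interval_s: int     # seconds between two runs
    enabled: bool = True


def daily_volume_mb(inputs: list) -> float:
    """Estimated MB per day / per host: disabled inputs cost nothing,
    and each input's volume scales with 1 / interval."""
    total_bytes = sum(
        i.bytes_per_run * (SECONDS_PER_DAY // i.interval_s)
        for i in inputs
        if i.enabled
    )
    return total_bytes / (1024 * 1024)


inputs = [
    ScriptedInput("vmstat", bytes_per_run=600, interval_s=60),
    ScriptedInput("top", bytes_per_run=20_000, interval_s=60),
]
before = daily_volume_mb(inputs)
inputs[1].interval_s = 300  # run the hypothetical "top" input every 5 minutes
after = daily_volume_mb(inputs)
print(f"{before:.2f} MB/day -> {after:.2f} MB/day")
```

Doubling an interval halves that input's share of the volume, which is why the largest, most frequent inputs are the first candidates for tuning.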



Nmon Performance:

  • Open source community application for Splunk that collects and analyses performance metrics
  • URL:
  • Community built
  • Not supported by Splunk
  • Community support for free users, and commercial support from Octamis for users under a support contract

System compatibility:

  • AIX
  • Solaris (sparc, x86)
  • Linux (any distribution, any processor architecture including x86 32 and 64 bits, PowerPC, ARM…)

Splunk compatibility notes:

  • Splunk 6.x for the technical addon
  • Splunk 6.x for Splunk nodes

Indexing volume (licence cost per server):

  • Approx. 20 MB per day / per host

Data sources:

  • A single data source produced by the nmon binary, containing a large and varied set of key system monitors
  • Along with the performance monitors, Nmon regularly produces and ingests the full server configuration
  • Nmon also has system-specific monitors; for example, it can generate performance metrics for IBM frames, Solaris zones, etc.


The volume and content of the data are managed centrally through a single configuration file (nmon.conf), which controls the interval between performance measures. Various other options, such as the nmon binary startup arguments, can also be customised through nmon.conf.
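Assuming nmon.conf uses shell-style KEY="value" assignments (check the file shipped with your add-on version; the keys interval and snapshot below are hypothetical examples), a minimal Python reader for such a file could look like this:

```python
import shlex


def parse_nmon_conf(text: str) -> dict:
    """Parse shell-style KEY=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex removes quotes and trailing comments the way a POSIX shell would
        tokens = shlex.split(value, comments=True)
        settings[key.strip()] = tokens[0] if tokens else ""
    return settings


sample = '''
# hypothetical nmon.conf excerpt
interval="60"
snapshot="120"
'''
conf = parse_nmon_conf(sample)
measures_per_day = 86_400 // int(conf["interval"])
print(conf, measures_per_day)
```

With a 60-second interval, each host produces 1440 measures per day for every monitor, so the interval is the main lever on volume.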



Data generation matrix:

Item | Splunk app for *nix | NMON Performance
Performance metrics | Yes | Yes
System inventory data generation | Yes | Yes
System logs ingestion | Yes | No
Security logs ingestion | Yes | No

Performance metrics matrix:

Item | Splunk app for *nix | NMON Performance
CPU usage metrics | Yes (limited features for AIX with no support for LPAR statistics; limited for PowerLinux and Solaris) | Yes (with support for AIX LPAR, PowerLinux and Solaris zones): sourcetype=nmon_data type=CPU* OR type=LPAR OR type=WLMCPU*
Memory usage metrics | Yes (but limited to vmstat reporting) | Yes (with additional advanced memory metrics per operating system): sourcetype=nmon_data type=MEM* OR type=WLMMEM*
Processes statistics | Yes (these statistics have a large impact on the volume of data generated per server): sourcetype=top OR sourcetype=ps | Yes (by default, reports processes consuming more than 0.1% of a CPU, which is configurable): sourcetype=nmon_data type=TOP OR type=UARG
File systems performance statistics | Yes (limited to iostat reporting) | Yes (extended to various disk statistics, including OS-specific disk statistics): sourcetype=nmon_data type=DISK*
File systems usage statistics | Yes | Yes (limited to % usage only, but inventory data reports volume and other statistics): sourcetype=nmon_data type=JFS*
Network statistics | Yes: sourcetype=interfaces OR sourcetype=protocol | Yes: sourcetype=nmon_data type=NET*


This list is not exhaustive; both applications report additional metrics.

In addition, the Nmon application implements the “nmon external” feature, which allows non-native metrics to be added to the nmon processing.
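As an illustration, here is a Python sketch of what an external collection script could emit, assuming it must produce lines in nmon's native layout: a header line (section, description, column names) followed by one line per snapshot tagged T0001, T0002 and so on. The UPTIME section and its columns are hypothetical; refer to the application documentation for how the add-on actually invokes external scripts.

```python
def _join_fields(fields):
    # nmon data is comma separated, so no field may contain a comma
    for field in fields:
        if "," in field:
            raise ValueError(f"field contains a comma: {field!r}")
    return ",".join(fields)


def external_header(section: str, description: str, columns: list) -> str:
    """Header line: SECTION,Description,col1,col2,..."""
    return _join_fields([section, description, *columns])


def external_snapshot(section: str, snapshot: int, values: list) -> str:
    """Data line: SECTION,Tnnnn,v1,v2,... (snapshot tags start at T0001)."""
    return _join_fields([section, f"T{snapshot:04d}", *(str(v) for v in values)])


print(external_header("UPTIME", "Server uptime and load", ["load1", "load5"]))
print(external_snapshot("UPTIME", 3, [0.42, 0.5]))
```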


Configuration data reporting matrix:

Item | Splunk app for *nix | NMON Performance
Reports main system configuration items (CPU, memory and disks configuration…) | Yes | Yes

Splunk deployment considerations:

Item | Splunk app for *nix | NMON Performance
Distributed deployment compatibility | Yes | Yes
Approx. volume per day / per server, performance and inventory data only (see the licensing and data volume section) | Approx. 100 MB / day | Approx. 20 MB / day
Large deployment at scale | Yes | Yes
Common Information Model compliance | Yes | Yes

Advanced built-in features:

Item | Splunk app for *nix | NMON Performance
Provides coherent user interfaces for system analysis and capacity planning management | No | Yes
Provides user interfaces to machine learning and other advanced analysis | No | Yes
Provides user interfaces for complex comparison | No | Yes


Operating systems considerations:

Item | Splunk app for *nix | NMON Performance
IBM AIX | Yes, but very limited (no management of IBM-specific metrics such as LPAR statistics) | Yes
ORACLE SOLARIS (x86 / sparc) | Yes, but very limited (no management of Solaris-specific metrics such as virtual zone statistics) | Yes
LINUX (all processors) | Yes | Yes
Relies on dependencies | Yes (requires the sysstat package) | Yes (see the notes below)
Privilege / permission restrictions: can it run as an unprivileged user? | Yes, but privilege restrictions make some metrics unavailable on some systems | Yes, with no restrictions

NMON Performance dependencies:

  • Linux: binaries for most distributions and architectures are embedded. If no suitable binary can be found, the add-on will try to use the one available in PATH. This behaviour can be controlled by configuration, to give priority to locally available nmon binaries.
  • Solaris: binaries are embedded for x86 and sparc.
  • AIX: nmon is now included by default as an official IBM package (topas-nmon); the add-on uses the binary available in PATH.
  • Interpreter: for various processing tasks, the add-on relies by default on Python 2.7.x. If Python is not available, it falls back to the Perl interpreter, in which case the Perl module Time::HiRes is required (no specific Perl version is required; untested with versions prior to Perl v5).
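The binary selection rules above can be sketched in a few lines of Python 3. This is an illustration of the described behaviour, not the add-on's own code: the function name, the prefer_local flag and the embedded binary paths are hypothetical, and the PATH lookup is injectable so the logic can be tested without nmon installed.

```python
import shutil
from typing import Callable, Optional


def select_nmon_binary(
    os_name: str,
    arch: str,
    embedded: dict,
    prefer_local: bool = False,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Pick the nmon binary to run, or return None if none is usable.

    embedded maps (os_name, arch) to a bundled binary path (hypothetical layout).
    """
    local = which("nmon")                   # nmon found in PATH, if any
    bundled = embedded.get((os_name, arch))
    if prefer_local and local:
        return local                        # configuration gives priority to local nmon
    if bundled:
        return bundled                      # default: embedded binary for this OS/arch
    return local                            # fallback, e.g. AIX topas-nmon in PATH


EMBEDDED = {
    ("linux", "x86_64"): "bin/linux/nmon_x86_64",   # hypothetical paths
    ("sunos", "sparc"): "bin/solaris/nmon_sparc",
}
print(select_nmon_binary("linux", "x86_64", EMBEDDED))
```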



Splunk app for *nix exposes the following main performance monitors:


Nmon Performance exposes a large number of performance monitors, as well as configuration data:

Some of these monitors are specific to particular systems and architectures, such as IBM frame monitoring (for AIX and PowerLinux) and Solaris zone statistics.

main_type | sourcetype | perf_item | Nbr_items
Application Internal Processing | sourcetype=nmon_clean | APPLICATION INTERNAL | 1
Application Internal Processing | sourcetype=nmon_collect | APPLICATION INTERNAL | 1
Application Internal Processing | sourcetype=nmon_processing | APPLICATION INTERNAL | 1
Configuration Data | sourcetype=nmon_config | CONFIGURATION DATA | 2
Performance Data | sourcetype=nmon_data | BLOCK STATISTICS | 2
Performance Data | sourcetype=nmon_data | CPU USAGE STATISTICS | 53
Performance Data | sourcetype=nmon_data | DISKS STATISTICS | 50
Performance Data | sourcetype=nmon_data | EXTERNAL COLLECTION | 7
Performance Data | sourcetype=nmon_data | FIBER CHANNEL STATISTICS | 13
Performance Data | sourcetype=nmon_data | FILESYSTEMS STATISTICS | 4
Performance Data | sourcetype=nmon_data | KERNEL STATISTICS | 73
Performance Data | sourcetype=nmon_data | MEMORY STATISTICS | 58
Performance Data | sourcetype=nmon_data | NETWORK NFS | 6
Performance Data | sourcetype=nmon_data | NETWORK TRAFFIC | 12
Performance Data | sourcetype=nmon_data | PAGING STATISTICS | 10
Performance Data | sourcetype=nmon_data | PROCESSES STATISTICS | 68

Note: the Nmon Performance app provides a data dictionary (stored in a lookup table), which is exploited in the Data dictionary view:





A basic Linux system will have about 15 types of performance metrics, and each type can contain dozens of performance monitors.


Data structure comparison:

The Splunk app for *nix ingests the output of various scripts, such as the “” script that generates CPU usage statistics:

This is essentially unstructured data, and all fields (except metadata) are extracted at search time.

On the other side, Nmon Performance generates CSV-structured data by default, with a schema common to all the data it generates.


CSV data is ingested and indexed as structured data, with its fields stored as indexed fields; this provides a high level of search performance with a very low licensing volume.
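To illustrate why a shared, header-driven schema helps, the Python sketch below turns CSV rows into named fields with no per-source parsing logic, then groups them by the type key. The column names in the sample are hypothetical and do not reproduce the exact nmon_data layout.

```python
import csv
import io


def rows_by_type(csv_text: str) -> dict:
    """Parse CSV with a header row and group the rows by their "type" column."""
    grouped = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped.setdefault(row["type"], []).append(row)
    return grouped


sample = """type,hostname,timestamp,metric,value
CPU_ALL,server1,2017-01-01 10:00:00,User%,12.5
MEM,server1,2017-01-01 10:00:00,memfree,2048
CPU_ALL,server1,2017-01-01 10:01:00,User%,14.0
"""
groups = rows_by_type(sample)
print({t: len(rows) for t, rows in groups.items()})
```

Unstructured script output, by contrast, needs a dedicated extraction for each format before any field can be used.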

SPL searches over nmon data are roughly 2x to 10x faster than over Splunk app for *nix data, so large-scale deployments will get much better performance with Nmon data. (This is an estimate that depends on the type of search, the volume of data and many other factors.)

Finally, the application also provides optional data generation modes (JSON extracted, JSON indexed and syslog-like).

Licensing and data volume:

The Splunk app for *nix contains several sourcetypes that are outside the scope of this comparison (security-related, configuration file changes…).

Nmon Performance focuses on pure performance metrics and inventory data.

The following sourcetypes will be considered as comparable to the Nmon data:

  • cpu: CPU state information
  • df: Information on available disk space on mounted volumes
  • hardware: Information on hardware specification
  • interfaces: Information on network interfaces on the system
  • iostat: Information on Input/Output operations
  • netstat: The state of the network (open/listening ports, connections, etc.) on a host
  • ps: Information on processes
  • time: Information about the time service
  • top: Output from the *nix top command
  • vmstat: Information on virtual memory


The nmon data is split into several main “sourcetypes”:

  • nmon_data: Performance metrics data organised by the key “type”, which corresponds to the nmon section metric item (CPU_ALL, LPAR…)
  • nmon_config: Configuration data extracted by nmon2csv converters, corresponds to AAA and BBB* sections of nmon raw data
  • nmon_collect: Output of the script responsible for launching nmon instances
  • nmon_processing: Output of the nmon2csv Python and Perl converters (conversion of nmon raw data into CSV data)
  • nmon_clean: Output of the script (interface to |) responsible for cleaning nmon raw data files

Based on a sample of 4 Linux servers, the following search computes the average hourly licensing volume of each index and extrapolates it to an estimated daily volume in MB per server:

index=_internal source=*license_usage.log* type=Usage (idx="os" AND s=cpu OR s=df OR s=hardware OR s=interfaces OR s=iostat OR s=netstat OR s=ps OR s=time OR s=top OR s=vmstat) OR (idx=nmon)
| where b>0
| bucket _time span=1h
| stats sum(b) as b by _time,idx
| eval volume_MB = round(b/1024/1024,2)
| stats avg(volume_MB) as volume_MB by idx
| eval nb_servers=4
| eval estimated_volume_MB_per_day=round(((volume_MB*24)/nb_servers),2)
| fields idx, estimated_volume_MB_per_day
| transpose
| rename "row 1" as nmon, "row 2" as os
| eval ratio=case(os>nmon, (os/nmon), nmon>os, (nmon/os) )
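The arithmetic of that search can be replayed outside Splunk as a sanity check. This Python sketch mirrors its pipeline step by step on (hour, idx, bytes) tuples standing in for license_usage.log events; the tuple layout and the sample figures in the usage are assumptions made for the example.

```python
from collections import defaultdict


def estimated_mb_per_day(records, nb_servers):
    """records: iterable of (hour, idx, b) tuples, b being licensed bytes."""
    hourly = defaultdict(int)
    for hour, idx, b in records:
        if b > 0:                                        # | where b>0
            hourly[(hour, idx)] += b                     # | stats sum(b) as b by _time,idx
    per_idx = defaultdict(list)
    for (_hour, idx), b in hourly.items():
        per_idx[idx].append(round(b / 1024 / 1024, 2))   # | eval volume_MB=...
    return {
        # | stats avg(volume_MB) by idx, then scale to 24h and divide by servers
        idx: round(sum(v) / len(v) * 24 / nb_servers, 2)
        for idx, v in per_idx.items()
    }


def volume_ratio(a, b):
    """Larger volume divided by the smaller one, like the final case() eval."""
    return max(a, b) / min(a, b)


MB = 1024 * 1024
est = estimated_mb_per_day(
    [(0, "os", 12 * MB), (1, "os", 12 * MB), (0, "nmon", MB), (1, "nmon", MB)],
    nb_servers=4,
)
print(est, volume_ratio(est["os"], est["nmon"]))
```

One difference to keep in mind: SPL's case() returns null when both volumes are equal, whereas this helper returns 1.0.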


In this sample, the os index (Splunk app for *nix) is approx. 12x the volume of the nmon index.

Depending on the configuration and the items activated, we can expect the Splunk app for *nix to generate between 5x and 10x more data.

This is mainly due to the following Splunk app for *nix items:

  • (sourcetype=top)
  • (sourcetype=ps)

Based on the same test environment, the following search breaks the estimate down by source:

index=_internal source=*license_usage.log* type=Usage (idx="os" AND s=cpu OR s=df OR s=hardware OR s=interfaces OR s=iostat OR s=netstat OR s=ps OR s=time OR s=top OR s=vmstat)
| where b>0
| bucket _time span=1h
| stats sum(b) as b by _time,idx,s
| eval volume_MB = round(b/1024/1024,2)
| stats avg(volume_MB) as volume_MB by idx,s
| eval nb_servers=4
| eval estimated_volume_MB_per_day=round(((volume_MB*24)/nb_servers),2)
| fields idx,s,estimated_volume_MB_per_day
| addcoltotals


In the same test environment, if we extend the Nmon processes coverage to unlimited mode (which captures the whole processes table), the volume ratio decreases:

This still shows a “price to pay” about 5.5x higher for the Splunk app for *nix than for Nmon Performance.

With Nmon Performance set to unlimited capture, and in a purely theoretical deployment scenario covering 1000 servers (an average large deployment), this would generate approx. 16 GB of data per day for Nmon Performance, and approx. 88 GB of data per day for the Splunk app for *nix.

With the default Nmon Performance processes capture, the same scenario would generate about 8 GB of data per day for Nmon, and still approx. 88 GB for the Splunk app for *nix.
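The scenario figures can be reproduced with simple arithmetic. In this Python sketch the per-server values (about 16 MB and 8 MB per day) are back-derived from the 1000-server totals above rather than separately measured, and decimal units (1 GB = 1000 MB) are assumed to match the round figures.

```python
def fleet_gb_per_day(mb_per_server_day: float, servers: int) -> float:
    """Daily volume for a fleet, in decimal GB (1 GB = 1000 MB)."""
    return mb_per_server_day * servers / 1000


SERVERS = 1000
nmon_unlimited = fleet_gb_per_day(16, SERVERS)  # ~16 MB/server/day, unlimited process capture
nix = nmon_unlimited * 5.5                      # "price to pay" ratio measured above
nmon_default = fleet_gb_per_day(8, SERVERS)     # ~8 MB/server/day, default process capture
print(nmon_unlimited, nix, nmon_default, nix / nmon_default)
```

With the default capture, the ratio comes out at 11x, in line with the roughly 12x observed on the 4-server sample.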

These values are purely theoretical but representative. In a real-life deployment, the Nmon Performance volume might be slightly higher on busy servers (and on large servers with large disk configurations and process tables), but this is also true for the Splunk app for *nix: the ratio will still be much higher, and the price to pay with Nmon much lower.

Additional notes:

The Nmon Performance application provides a “Total Cost Of Ownership” dashboard that can be used to analyse every cost related to the deployment of the application:

The TCO interface reports:

  • Licensing: the average total cost in MB per day / per server
  • Licensing: the average total cost in GB for the global Nmon ingestion
  • Scheduling costs: average number of scheduled searches / 5 min, run time statistics
  • Index storage details: buckets details, compression statistics and performance…
  • Indexing volume over time report
  • Per sourcetype details
  • Per report scheduled costs: run time, average duration, last execution…
  • Nmon processing performance statistics: run time, average size of data processed…



Home page of the Splunk app for *nix:

The home page exposes a “radial” chart, as well as recent Unix alerts (configurable).

The goal is to provide a global picture of the servers’ main metrics and resource utilisation; however, the home page is quite problematic.

Built in Advanced XML (a deprecated framework), the radial chart is of little technical interest, and in real production life it is very likely to be useless.

The Nmon performance app’s home page is slightly different:

The application home page provides:

  • A full application menu, structured by operating system or category, that provides fast access to interfaces, metrics, reports and more.
  • “APPLICATION INFORMATION”: An informational panel that provides the main internal key monitors of the app: earliest and latest events, number of servers managed, reported errors…
  • “SAFE CENTER”: This panel summarises the active alerts raised by the application on main system metrics (CPU, memory usage…).
  • Fast access to a rich set of interfaces: global interfaces such as the NMON Summary and the NMON ANALYSER, dedicated per-metric interfaces, and various advanced interfaces providing different features



The Splunk app for *nix provides a single interface for metric analysis:

In this interface, the user can select one or more hosts, choose the performance metric and get the following “chart”:

This interface does not really provide what is required to perform a real analysis of system load.

Another approach is available via the “hosts” menu:

This provides some inventory information as well as a big picture of the servers’ load; however, it does not really provide a human interface to performance metrics.


The Nmon application provides different human interfaces; in addition to the home page, it provides fast access to global interfaces:

NMON SUMMARY OVERVIEW: (per server interface)

NMON WALL OF PERFORMANCE: (multi-server interface)


NMON DARK MONITORING: (per server interface)


NMON ANALYSER: (per server interface)



Every interface provides at least the following (some have additional specific selections):

  • Time range selection
  • Time filtering options (filtering night statistics for example)
  • OS type filtering
  • FrameIDs filtering (can be used to group hosts in categories for easier selection)
  • Host filtering pattern (to restrict the list of hosts in the hosts selector)
  • Host selector (multi-select)
  • Aggregation as a standard function
  • Statistic mode selector
  • Table statistics
  • Chart statistics with dynamic selection
  • Type of chart
  • Missing data management (gaps, connect…)
  • Stacking mode selector
  • Legend placement
  • Span value selector



The Splunk app for *nix has embedded alerting features, which can be controlled through the setup page:

The application also provides a configurable panel on the home page that gives a quick overview of active and historical alerts:

Testing alerts:

When alerts have been configured and over-consumption occurs, this is visible on the application home page:

Users can click on the alert and get the following interface:

This interface is actually interesting, as it provides a quick view of some key system metrics for the host involved in the alert.

However, in a real deployment, alerting will occur frequently and this interface should be accessible without having to raise an alert for it.

Such an interface should be available for troubleshooting and analysis purposes outside the scope of system alerting.


By default, the Nmon Performance application embeds different alerts for system key metrics:

  • CPU usage
  • Real memory
  • Virtual memory
  • File system utilization
  • IBM Power frames pools usage

These alerts are slightly different in that they analyse the duration of a consumption peak, rather than simply generating an alert whenever a peak occurs.

In the default configuration, an alert is raised when a consumption peak lasts more than 5 minutes, which produces smarter alerts and gets rid of useless notifications. (A short CPU or memory peak rarely affects services, which is why alerting focuses on duration analysis.)
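The duration-based logic can be sketched in Python as follows. This is an illustration of the principle, not the application's SPL macros: one-minute sampling, the threshold of 90 and the CPU values are assumptions, and a run's duration is measured from its first to its last breaching sample.

```python
from datetime import datetime, timedelta


def sustained_breaches(samples, threshold, min_duration=timedelta(minutes=5)):
    """Yield (start, end) for each run of consecutive samples at or above
    threshold that lasts longer than min_duration.

    samples: time-ordered (timestamp, value) pairs.
    """
    start = last = None
    for ts, value in samples:
        if value >= threshold:
            if start is None:
                start = ts
            last = ts
            continue
        if start is not None and last - start > min_duration:
            yield start, last
        start = last = None
    if start is not None and last - start > min_duration:
        yield start, last


base = datetime(2017, 1, 1, 10, 0)
cpu = [95, 96, 97, 95, 99, 98, 97, 20, 95, 96, 10]   # one sample per minute
samples = [(base + timedelta(minutes=i), v) for i, v in enumerate(cpu)]
print(list(sustained_breaches(samples, threshold=90)))
```

Only the first run (10:00 to 10:06) is reported; the two-minute spike that follows is ignored.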

These alerts are clearly exposed in the application home page “Safe center” panel:

Active drilldown is available for each type of alert:

The application also provides a front end to the alerting feature with the “Safe Center” interface:


In this example, we used the system stress tool “stress-ng” to generate load on a Linux server; as shown above, a CPU usage alert was generated automatically.

Using the provided interfaces, it takes only a few seconds to get the system statistics and troubleshoot the situation on this server:

Using the dedicated TOP interface and its active drilldown, we can even get the PIDs of every command invocation responsible for this overhead (example with one of the command groups):

These can be correlated with the UARG data, which contains the full arguments of processes:
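Conceptually, this correlation is a join on the process identifier. The Python sketch below assumes hypothetical field names (host, PID, Command, pctCPU, FullCommand) and joins on (host, PID); since PIDs are recycled over time, a real correlation should also restrict the match to a time window.

```python
def correlate_top_uarg(top_rows, uarg_rows):
    """Attach the full command line from UARG rows to TOP rows.

    Rows are matched on (host, PID); unmatched TOP rows get an empty string.
    """
    args = {(r["host"], r["PID"]): r["FullCommand"] for r in uarg_rows}
    return [
        {**row, "FullCommand": args.get((row["host"], row["PID"]), "")}
        for row in top_rows
    ]


top = [{"host": "server1", "PID": "4242", "Command": "stress-ng", "pctCPU": "98.7"}]
uarg = [{"host": "server1", "PID": "4242", "FullCommand": "stress-ng --cpu 4 --timeout 600"}]
print(correlate_top_uarg(top, uarg))
```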

All these features come as standard with the application, with absolutely no configuration required.

However, the alert definitions use macros with arguments that can easily be customised to change the threshold values, the required duration…



Both solutions generate inventory data automatically:

  • The Splunk app for Unix generates inventory data but does not provide any dedicated interface other than the host management view described previously
  • Nmon Performance generates inventory data and also provides interfaces to manage it

Nmon raw configuration data viewer:

The Nmon inventory data consists of long multi-line events indexed in Splunk; it is of course searchable directly in Splunk, as well as via the RAW configuration viewer interface:
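As listed earlier, the nmon_config sourcetype corresponds to the AAA and BBB* sections of the nmon raw data. Assuming AAA lines are comma-separated key/value records (for example AAA,host,server1), a minimal Python extraction of that part could look like this; BBB* sections are not handled, and the sample lines are illustrative.

```python
def parse_aaa(lines):
    """Collect nmon AAA header lines (AAA,key,value[,more...]) into a dict."""
    info = {}
    for line in lines:
        parts = line.rstrip("\n").split(",")
        if len(parts) >= 3 and parts[0] == "AAA":
            # keep any extra comma-separated fields as part of the value
            info[parts[1]] = ",".join(parts[2:])
    return info


raw = [
    "AAA,progname,nmon\n",
    "AAA,host,server1\n",
    "AAA,OS,Linux,3.10.0\n",
    "CPU_ALL,T0001,12.5,3.1\n",   # performance lines are ignored here
]
print(parse_aaa(raw))
```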














Nmon inventory interfaces:

Several interfaces to the inventory data are available; below, for example, is the Linux-dedicated interface:

The nmon inventory data is generated on a scheduled basis and stored as a lookup table in a KVstore:

Finally, different reports are provided:


Host grouping is a feature provided by both applications to group servers in a context that makes them easier to exploit, such as grouping specific servers based on a business unit, application context, data center or any information that makes sense for you.

Splunk for *nix:

The application allows grouping servers per Categories / Groups:

This allows selecting hosts by these items in the available interfaces:

It is also available in raw data searches:


NMON Performance:

The application implements the concept of “frame identifier” under the field name “frameID”.

The frameID field is available for selection in most interfaces and can be used to define a logical container grouping servers. The mapping is stored in a KVstore-based lookup table, and the Splunk lookup feature is used to enrich the raw data with this information.
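The search-time enrichment principle can be sketched in Python: events are not modified at rest; a host-to-frameID mapping is applied when they are read, and the result can feed a host selector. The mapping values and the "unassigned" default are hypothetical.

```python
from collections import defaultdict


def enrich_with_frameid(events, frame_lookup, default="unassigned"):
    """Return copies of the events with a frameID field taken from a
    host -> frameID mapping; the original events are left untouched."""
    return [{**e, "frameID": frame_lookup.get(e["host"], default)} for e in events]


def hosts_by_frame(events):
    """Group distinct host names by frameID, e.g. to feed a host selector."""
    groups = defaultdict(set)
    for e in events:
        groups[e["frameID"]].add(e["host"])
    return {frame: sorted(hosts) for frame, hosts in groups.items()}


lookup = {"web01": "DC1-PROD", "web02": "DC1-PROD", "db01": "DC2-PROD"}
events = [{"host": "web01"}, {"host": "db01"}, {"host": "test99"}]
enriched = enrich_with_frameid(events, lookup)
print(hosts_by_frame(enriched))
```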

An interface is provided to manage the grouping feature directly within Splunk Web:

Once configured:

The frameID grouping operates at search time:

It is also available as an optional selector in most interfaces:


Advanced features: capacity planning, machine learning, metrics comparison

  • The Splunk app for *nix does not provide any interface other than its single metric interface
  • The Nmon Performance application provides several layers of advanced features, described below

NMON BASELINE – Detect anomalies and consumption drift

The “NMON Baseline” is a powerful and simple feature implemented in the application to provide advanced anomaly detection on resource utilisation.

The Nmon application uses multiple KVstore collections that are generated and/or updated weekly (every Sunday by default).

The associated reports calculate the usual lower, average and upper values of a given performance monitor (perc05, avg, perc95).

The calculation is made per day of the week (Monday, Tuesday…), by 5-minute time slices, by host, and over the last 3 months of data.
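A minimal Python sketch of that calculation follows, assuming samples arrive as (host, timestamp, value) tuples. It uses nearest-rank percentiles, so results may differ slightly from Splunk's perc05/perc95, and the caller supplies the start of the 3-month window; none of these names come from the application.

```python
import statistics
from collections import defaultdict
from datetime import datetime, timedelta


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile, pct being an integer between 0 and 100."""
    k = max(0, -(-pct * len(sorted_values) // 100) - 1)   # integer ceil, then 0-based
    return sorted_values[k]


def build_baseline(samples, since: datetime):
    """samples: iterable of (host, timestamp, value).

    Returns {(host, weekday, slot): (perc05, avg, perc95)} where weekday is
    0 for Monday and slot is the 5-minute slice of the day (0..287).
    Samples older than `since` (e.g. now minus 3 months) are ignored.
    """
    buckets = defaultdict(list)
    for host, ts, value in samples:
        if ts < since:
            continue
        slot = (ts.hour * 60 + ts.minute) // 5
        buckets[(host, ts.weekday(), slot)].append(value)
    return {
        key: (nearest_rank(sorted(v), 5), statistics.fmean(v), nearest_rank(sorted(v), 95))
        for key, v in buckets.items()
    }


first_monday = datetime(2017, 1, 2, 10, 2)
history = [("server1", first_monday + timedelta(weeks=w), float(w + 1)) for w in range(20)]
baseline = build_baseline(history, since=datetime(2017, 1, 1))
print(baseline[("server1", 0, 120)])   # Monday, 10:00-10:05 slice
```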

Baseline features are available for the following performance metrics:

  • CPU usage
  • Real and Virtual memory utilisation
  • Number of IOPS (disk I/O operations per second)
  • IBM Power frames specific metrics (LPAR usage, pool usage)

This gives the usual usage of a given performance monitor for each day of the week; the data is exploited in the “NMON Baseline” interfaces:

The baseline interfaces display the current metric usage against the baseline values.

For instance, a baseline represents the usual CPU utilisation over the last 3 months of data for the current day of the week.

The upper and lower values follow the design of the “predict” command: the current usage is displayed against the baseline, and the upper and lower values are also charted into the future:

From this advanced analysis, we can determine whether the current usage of the main metric is coherent with the usual level of utilisation for that day of the week and time of day.
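Once a (perc05, avg, perc95) band is known for the current time slice, the coherence check reduces to a comparison. In this Python sketch the tolerance parameter, which widens the band to reduce noise, is a hypothetical addition and not an application setting.

```python
def classify(value, band, tolerance=0.0):
    """Compare a current value with its baseline band (perc05, avg, perc95).

    tolerance widens the band, e.g. 0.1 accepts values up to 10% beyond it.
    """
    low, _avg, high = band
    if value > high * (1 + tolerance):
        return "above usual"
    if value < low * (1 - tolerance):
        return "below usual"
    return "usual"


band = (1.0, 10.5, 19.0)
print(classify(25.0, band), classify(10.0, band), classify(0.5, band))
```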

As the baseline interface can even chart into the future, we can easily follow this throughout the day and detect any anomalies.

The baseline interface also provides an alternative view without the upper and lower baselines:

Example of the CPU usage baseline of a Linux Production server:

Physical real memory usage:

Baseline features are simple to use through the baseline macros, which anyone can use to create their own dashboards:

Additional notes about the baseline calculation generation:

  • The generation of the baseline data can be customized to filter out specific periods, such as bank holidays, data center recovery exercises…
  • This customization is to be made by Splunk admins with the expertise of system experts and data center managers
  • This can easily be achieved by mapping the raw data with additional lookups, and filtering out the unwanted periods or values

NMON predictive interface – a front end to the predict command and its algorithms

The Nmon Performance application provides an interface “NMON Predictive” which acts as a front end to the Splunk predict command:

Using the predictive interface, users can estimate the future usage of key performance metrics.

The interface is dynamic and will list available choices depending on the type of Operating System:

It also implements the main algorithms available with the predict command, and allows tuning its various options:
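Among the predict command's algorithms, LL is a local level model (no trend, no seasonality). To illustrate the principle of projecting a level from past data, the Python sketch below uses simple exponential smoothing, a closely related but much simpler technique. It is not Splunk's implementation, and alpha and horizon are illustrative parameters.

```python
def ses_forecast(series, alpha=0.3, horizon=5):
    """Simple exponential smoothing: a flat forecast at the last smoothed level."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for x in series[1:]:
        # move the level towards each new observation by a fraction alpha
        level = alpha * x + (1 - alpha) * level
    return [level] * horizon


cpu_history = [22.0, 25.0, 24.0, 30.0, 28.0, 27.0]   # hypothetical CPU % values
print(ses_forecast(cpu_history, alpha=0.5, horizon=3))
```

A higher alpha follows recent values more closely; trend and seasonal variants such as those offered by predict extend the same idea with extra components.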

Example of result: estimating future CPU usage based on past data:


NMON comparison interface – compare periods and systems performance

The Nmon Performance application provides an interface dedicated to comparing periods of time:

This interface allows:

  • Selecting 2 different time range periods
  • Selecting from a list of key system metrics
  • Selecting hosts and various other parameters (time filtering, span…)
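The core of such a comparison can be sketched in Python: summarise the metric over each period, then express the change relative to the first one. The summary fields chosen here (mean, max, relative change of the mean) are an assumption for illustration, not the exact statistics shown by the interface.

```python
import statistics


def compare_periods(period_a, period_b):
    """Summarise two lists of metric values and the relative change of the mean."""
    mean_a, mean_b = statistics.fmean(period_a), statistics.fmean(period_b)
    return {
        "mean_a": mean_a,
        "mean_b": mean_b,
        "max_a": max(period_a),
        "max_b": max(period_b),
        # undefined (NaN) when the first period's mean is zero
        "mean_change_pct": (mean_b - mean_a) / mean_a * 100 if mean_a else float("nan"),
    }


print(compare_periods([10, 20, 30], [20, 30, 40]))
```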

NMON dashboards – additional dashboards

Bullet dashboards:

The bullet dashboard exposes TOP servers by CPU and memory, as well as TOP processes by CPU:

With advanced drilldown features:

TOP hosts bubble chart dashboard:

This dashboard exposes the top resource-consuming hosts by CPU / memory:

TOP processes bubble chart dashboard (by CPU / Memory):

This dashboard exposes main processes consuming CPU and memory in selected systems:

Solaris zones dashboard:

This is a Solaris specific dashboard that exposes main virtual zones statistics: (Solaris Workload Manager)


NMON data dictionary

The Nmon Performance application embeds a dictionary of every piece of data it generates and makes available.

This data dictionary is stored in a file-based lookup and exploited in the data dictionary interface:

The data dictionary exposes the definition of metrics, as well as the associated SPL searches, context, etc…


NMON add-on reporting dashboard

The Nmon Performance application embeds reporting about the deployment of the nmon technical add-ons:


The dashboard provides various deployment related items:

  • The TA deployed per Splunk instance including its type, version, last reporting date…
  • The deployment activity
  • The identification information of Splunk instances
  • Nmon processing interpreter versions (Python versus Perl)


NMON integrated navigation bootstrap scheme

The application implements a bootstrap integrated navigation scheme, which is available in most interfaces of the application.

The bootstrap integrated navigation scheme provides instant and easy access to raw data, additional information (such as the definition of performance metrics) and fast links to associated interfaces.

Example of integrated navigation panels:

In the above example, clicking on the “Explore All” link opens a bootstrap window that explains the structure of the data available in the application and gives direct access to the raw data.
These integrated navigation features are available as standard in all interfaces; the following example is taken from the memory statistics interface for Linux systems:


NMON eventtypes data structure:

Every search in the application is based on a Splunk eventtype definition.

This very simple feature makes the data easily accessible in Splunk, organised, and easily adaptable to your needs.

A simple search in Splunk will show its structure:


NMON HOWTOs interfaces: exploiting nmon data with semi-interactive UI and real SPL examples

The Nmon Performance application provides several “HOWTOs” interfaces that help users build their own queries against Nmon data in a semi-interactive way:

Most Splunk applications perform complex manipulation of the data they produce or exploit; unfortunately, very few (if any) provide the minimal level of information that allows users to exploit the application’s data.

The HOWTOs interfaces were developed with this idea in mind: they provide consistent and varied examples of Nmon data manipulation with the Splunk Search Processing Language.


NMON accelerated data models: best performance for the best user experience

The Nmon application makes extensive use of Splunk data models, which provide great features for data manipulation and, with data model acceleration, for performance.

While regular Splunk searches against Nmon data already perform very well, data model acceleration provides an additional layer of optimization that allows the application to deliver the best performance possible.

NMON data models are available directly in the data model menu and sub-menus:



In this article, we have provided a detailed analysis of the features and behaviour of both solutions.

Can we fully compare these two solutions?

The answer is simple: yes and no.

While both applications manage performance and inventory data for the main Unix and Linux systems, the two solutions don’t really share the same goal.

The Splunk app for *nix aims to collect and ingest into Splunk various aspects of *nix systems, including performance metrics and inventory data.
It also collects security-related events, system logs, configuration file changes…

Conversely, the Nmon Performance monitor application for Splunk has been developed with the Unix philosophy in mind: do one thing and do it well.

Nmon Performance focuses solely on ingesting performance metrics and inventory data, with the goal of providing the highest level of system performance analysis, with rich data and rich interfaces.

Are both solutions mutually exclusive?

The answer is no: it is entirely possible to deploy both applications and choose what is activated in the data generation.

The Splunk app for *nix add-on can generate data for items that are outside the scope of the Nmon Performance application; it is possible to deploy it and activate only the items for which Nmon Performance will never report (such as security-related items).

