{"id":10,"date":"2017-05-10T11:55:04","date_gmt":"2017-05-10T10:55:04","guid":{"rendered":"https:\/\/www.octamis.com\/octamis-blog\/?p=10"},"modified":"2017-05-14T12:50:49","modified_gmt":"2017-05-14T11:50:49","slug":"windows-performance-monitoring-tips-with-splunk","status":"publish","type":"post","link":"https:\/\/www.octamis.com\/octamis-blog\/windows-performance-monitoring-tips-with-splunk\/","title":{"rendered":"WINDOWS PERFORMANCE MONITORING TIPS WITH SPLUNK"},"content":{"rendered":"<p>At Octamis we love Splunk, and we love to share our knowledge and experience, so let&#8217;s study some tips on\u00a0Windows monitoring with Splunk !<\/p>\n<h2><span style=\"color: #339966;\">PREPARING YOUR SPLUNK<\/span><\/h2>\n<p><strong>Let&#8217;s proceed in order: first, we want to get Splunk ready to receive Windows performance data.<\/strong><\/p>\n<p>This is quite simple and\u00a0relies on deploying the Windows technical add-on built by Splunk:<\/p>\n<p><a href=\"https:\/\/splunkbase.splunk.com\/app\/742\/\">https:\/\/splunkbase.splunk.com\/app\/742\/<\/a><\/p>\n<p><b>Depending on your Splunk architecture, make sure you deploy the technical add-on everywhere it is required:<br \/>\n<\/b><\/p>\n<ul>\n<li>Indexers (clustered or standalone)<\/li>\n<li>Search heads<\/li>\n<li>Intermediate forwarders on the path to your indexers, if any<\/li>\n<\/ul>\n<p>The add-on\u00a0has no inputs activated by default, so at this point no modification is required.<\/p>\n<p><strong>Index creation:<\/strong><\/p>\n<p>The Windows technical add-on contains the embedded definition for a few indexes:<\/p>\n<ul>\n<li>perfmon: dedicated index for Windows performance data collected through perfmon<\/li>\n<li>windows: dedicated index for various log data and monitoring not related to perfmon data<\/li>\n<li>wineventlog: security and log data<\/li>\n<\/ul>\n<p>If you are running a standalone server then you have nothing else to do as the indexes have been created for you by the add-on.<\/p>\n<p>If \u00a0you 
are running Splunk with clustered indexers, be sure to declare those indexes properly before continuing the setup.<\/p>\n<h2><span style=\"color: #339966;\">DEPLOYING AND CONFIGURING THE ADD-ON<\/span><\/h2>\n<p><strong>For demonstration purposes, we\u00a0will assume that:<\/strong><\/p>\n<ul>\n<li>You already have servers running with the Splunk Universal Forwarder<\/li>\n<li>The servers are connected to your Splunk indexer(s) and properly configured for Splunk indexing<\/li>\n<li>The servers are connected to a Splunk deployment server (recommended) or you use your own deployment solution<\/li>\n<\/ul>\n<p><strong>Deploy the technical add-on as usual and continue the setup.<\/strong><\/p>\n<h2><span style=\"color: #339966;\">COLLECTING PERFORMANCE DATA<\/span><\/h2>\n<p>Let&#8217;s have a look at the default &#8220;inputs.conf&#8221; file provided within the technical add-on. Since we focus\u00a0on performance metrics, we are only interested for now in the &#8220;perfmon&#8221; stanzas.<\/p>\n<p><strong>For demonstration purposes, let&#8217;s have a look at CPU and memory metrics:<\/strong><\/p>\n<p><em>Splunk_TA_windows\/default\/inputs.conf<\/em><\/p>\n<pre>## CPU\r\n[perfmon:\/\/CPU]\r\ncounters = % Processor Time; % User Time; % Privileged Time; Interrupts\/sec; % DPC Time; % Interrupt Time; DPCs Queued\/sec; DPC Rate; % Idle Time; % C1 Time; % C2 Time; % C3 Time; C1 Transitions\/sec; C2 Transitions\/sec; C3 Transitions\/sec\r\ndisabled = 1\r\ninstances = *\r\ninterval = 10\r\nobject = Processor\r\nuseEnglishOnly=true\r\nindex = perfmon\r\n\r\n## Memory\r\n[perfmon:\/\/Memory]\r\ncounters = Page Faults\/sec; Available Bytes; Committed Bytes; Commit Limit; Write Copies\/sec; Transition Faults\/sec; Cache Faults\/sec; Demand Zero Faults\/sec; Pages\/sec; Pages Input\/sec; Page Reads\/sec; Pages Output\/sec; Pool Paged Bytes; Pool Nonpaged Bytes; Page Writes\/sec; Pool Paged Allocs; Pool Nonpaged Allocs; Free System Page Table Entries; Cache Bytes; Cache 
Bytes Peak; Pool Paged Resident Bytes; System Code Total Bytes; System Code Resident Bytes; System Driver Total Bytes; System Driver Resident Bytes; System Cache Resident Bytes; % Committed Bytes In Use; Available KBytes; Available MBytes; Transition Pages RePurposed\/sec; Free &amp; Zero Page List Bytes; Modified Page List Bytes; Standby Cache Reserve Bytes; Standby Cache Normal Priority Bytes; Standby Cache Core Bytes; Long-Term Average Standby Cache Lifetime (s)\r\ndisabled = 1\r\ninterval = 10\r\nobject = Memory\r\nuseEnglishOnly=true\r\nindex = perfmon\r\n<\/pre>\n<p><strong>Things you will (should) probably want to customise:<\/strong><\/p>\n<ul>\n<li>&#8220;interval&#8221; :\u00a0this is the time in seconds between 2 performance collections, and it directly influences the volume of data generated. A 10-second interval is probably more frequent than needed; 30 or 60 seconds are good values that save license, bandwidth and CPU footprint on the servers<\/li>\n<li>&#8220;mode = multikv&#8221; :\u00a0this option, introduced in Splunk 6 (see: <a href=\"https:\/\/www.splunk.com\/blog\/2013\/10\/28\/new-features-for-perfmon-in-splunk-6\">https:\/\/www.splunk.com\/blog\/2013\/10\/28\/new-features-for-perfmon-in-splunk-6<\/a>), groups the counters of each collection into a single event instead of one event per counter, which saves license, storage and bandwidth<\/li>\n<li>&#8220;disabled = 1&#8221;: inputs are deactivated by default, so you need to explicitly activate each input you want by setting &#8220;disabled = 0&#8221;<\/li>\n<\/ul>\n<p><strong>Let&#8217;s\u00a0go with the following configuration. As always, never modify a default file: create a local file and copy only the stanzas you are interested in:<\/strong><\/p>\n<p><em>Splunk_TA_windows\/local\/inputs.conf<\/em><\/p>\n<pre>## CPU\r\n[perfmon:\/\/CPU]\r\ncounters = % Processor Time; % User Time; % Privileged Time; Interrupts\/sec; % DPC Time; % Interrupt Time; DPCs Queued\/sec; DPC Rate; % Idle Time; % C1 Time; % C2 Time; % C3 Time; C1 Transitions\/sec; C2 Transitions\/sec; C3 Transitions\/sec\r\ndisabled = 0\r\ninstances = 
*\r\ninterval = 30\r\nobject = Processor\r\nuseEnglishOnly=true\r\nindex = perfmon\r\nmode = multikv\r\n\r\n## Memory\r\n[perfmon:\/\/Memory]\r\ncounters = Page Faults\/sec; Available Bytes; Committed Bytes; Commit Limit; Write Copies\/sec; Transition Faults\/sec; Cache Faults\/sec; Demand Zero Faults\/sec; Pages\/sec; Pages Input\/sec; Page Reads\/sec; Pages Output\/sec; Pool Paged Bytes; Pool Nonpaged Bytes; Page Writes\/sec; Pool Paged Allocs; Pool Nonpaged Allocs; Free System Page Table Entries; Cache Bytes; Cache Bytes Peak; Pool Paged Resident Bytes; System Code Total Bytes; System Code Resident Bytes; System Driver Total Bytes; System Driver Resident Bytes; System Cache Resident Bytes; % Committed Bytes In Use; Available KBytes; Available MBytes; Transition Pages RePurposed\/sec; Free &amp; Zero Page List Bytes; Modified Page List Bytes; Standby Cache Reserve Bytes; Standby Cache Normal Priority Bytes; Standby Cache Core Bytes; Long-Term Average Standby Cache Lifetime (s)\r\ndisabled = 0\r\ninterval = 30\r\nobject = Memory\r\nuseEnglishOnly=true\r\nindex = perfmon\r\nmode = multikv\r\n<\/pre>\n<p><strong>Deploy this configuration to your Windows servers, and if you use Splunk deployment server, ensure you check &#8220;restart splunkd&#8221;.<\/strong><\/p>\n<h2><span style=\"color: #339966;\">CHECKING DATA COMING IN<\/span><\/h2>\n<p><strong>Next step, let&#8217;s check for some data coming in:<\/strong><\/p>\n<pre>index=perfmon<\/pre>\n<p><a href=\"https:\/\/51.68.196.81\/octamis-blog\/wp-content\/uploads\/2017\/05\/img1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-17\" src=\"https:\/\/51.68.196.81\/octamis-blog\/wp-content\/uploads\/2017\/05\/img1.png\" alt=\"\" width=\"1901\" height=\"803\" srcset=\"https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img1.png 1901w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img1-300x127.png 300w, 
https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img1-768x324.png 768w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img1-1024x433.png 1024w\" sizes=\"(max-width: 1901px) 100vw, 1901px\" \/><\/a><\/p>\n<p><strong>Depending on the mode (multikv or not), the data will be available in:<\/strong><\/p>\n<p><em>CPU statistics:<\/em><\/p>\n<ul>\n<li>standard mode: index=perfmon sourcetype=&quot;Perfmon:CPU&quot;<\/li>\n<li>multikv mode: index=perfmon sourcetype=&quot;PerfmonMk:CPU&quot;<\/li>\n<\/ul>\n<p><em>Memory statistics:<\/em><\/p>\n<ul>\n<li>standard mode: index=perfmon sourcetype=&quot;Perfmon:Memory&quot;<\/li>\n<li>multikv mode: index=perfmon sourcetype=&quot;PerfmonMk:Memory&quot;<\/li>\n<\/ul>\n<p><strong>In this article, we will go with the multikv mode.<\/strong><\/p>\n<h2><span style=\"color: #339966;\">ANALYSING CPU STATISTICS<\/span><\/h2>\n<p><strong>Let&#8217;s get some CPU statistics:<\/strong><\/p>\n<p><em>Per host average CPU usage over time:\u00a0<\/em><\/p>\n<pre>index=perfmon sourcetype=\"PerfmonMk:CPU\" instance=_Total\r\n| timechart avg(%_Processor_Time) as cpu_usage by host<\/pre>\n<p><a href=\"https:\/\/51.68.196.81\/octamis-blog\/wp-content\/uploads\/2017\/05\/img2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-18\" src=\"https:\/\/51.68.196.81\/octamis-blog\/wp-content\/uploads\/2017\/05\/img2.png\" alt=\"\" width=\"1900\" height=\"660\" srcset=\"https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img2.png 1900w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img2-300x104.png 300w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img2-768x267.png 768w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img2-1024x356.png 1024w\" sizes=\"(max-width: 1900px) 100vw, 1900px\" \/><\/a><\/p>\n<p>Very simple.<\/p>\n<h2><span style=\"color: #339966;\">WHERE IS MY PERCENTAGE OF MEMORY UTILISATION 
?<\/span><\/h2>\n<p>If you are like me, when looking at memory statistics, the first (and potentially the only) metric you want to retrieve is the percentage of memory being used, or possibly the percentage free.<\/p>\n<p>So what&#8217;s the problem then? Well, as is, although we have dozens of metrics, the percentage of physical memory utilisation\u00a0is not available in the perfmon data.<\/p>\n<p><strong>What ???<\/strong><\/p>\n<p><a href=\"https:\/\/51.68.196.81\/octamis-blog\/wp-content\/uploads\/2017\/05\/img3.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-20\" src=\"https:\/\/51.68.196.81\/octamis-blog\/wp-content\/uploads\/2017\/05\/img3-212x300.jpg\" alt=\"\" width=\"212\" height=\"300\" srcset=\"https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img3-212x300.jpg 212w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img3.jpg 600w\" sizes=\"(max-width: 212px) 100vw, 212px\" \/><\/a><\/p>\n<p>Fortunately, we can calculate it! 
Using Splunk, we can correlate the inventory data, which contains the amount of physical memory installed, with the memory metrics available in\u00a0perfmon.<\/p>\n<p><strong>The following search reports the amount of physical memory in KB:<\/strong><\/p>\n<div>\n<pre>index=windows sourcetype=WinHostMon\r\n| stats latest(TotalPhysicalMemoryKB) as TotalPhysicalMemoryKB, latest(TotalVirtualMemoryKB) as TotalVirtualMemoryKB by host | sort 0 host\r\n<\/pre>\n<\/div>\n<p><strong>Notes:<\/strong><\/p>\n<p>This requires the input &#8220;OperatingSystem&#8221; to be activated in your deployment, using:<\/p>\n<pre>[WinHostMon:\/\/OperatingSystem]\r\ninterval = 600\r\ndisabled = 0\r\ntype = OperatingSystem\r\nindex = windows\r\n<\/pre>\n<p><strong>For the demonstration, let&#8217;s store this result in\u00a0a temporary CSV lookup file:<\/strong><\/p>\n<div>\n<pre>index=windows sourcetype=WinHostMon\r\n| stats latest(TotalPhysicalMemoryKB) as TotalPhysicalMemoryKB, latest(TotalVirtualMemoryKB) as TotalVirtualMemoryKB by host | sort 0 host\r\n| outputlookup windows_memory_inventory.csv<\/pre>\n<p><strong>Then, the memory statistics give us the amount of memory currently available in KB. Let&#8217;s map this to the inventory data and do a simple calculation:<\/strong><\/p>\n<\/div>\n<pre>index=perfmon sourcetype=\"PerfmonMk:Memory\"\r\n| eval available_memory_KB=coalesce('Available_KBytes', Value)\r\n| lookup windows_memory_inventory.csv host as host OUTPUTNEW TotalPhysicalMemoryKB\r\n| eval free_memory_pct=((available_memory_KB\/TotalPhysicalMemoryKB)*100), used_memory_pct=(100-free_memory_pct)\r\n| timechart avg(used_memory_pct) as used_memory_pct by host\r\n<\/pre>\n<p><a href=\"https:\/\/51.68.196.81\/octamis-blog\/wp-content\/uploads\/2017\/05\/img3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-22\" src=\"https:\/\/51.68.196.81\/octamis-blog\/wp-content\/uploads\/2017\/05\/img3.png\" alt=\"\" 
width=\"1900\" height=\"582\" srcset=\"https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img3.png 1900w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img3-300x92.png 300w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img3-768x235.png 768w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img3-1024x314.png 1024w\" sizes=\"(max-width: 1900px) 100vw, 1900px\" \/><\/a><\/p>\n<p>There you go!<\/p>\n<p><strong>For a more resilient solution:<\/strong><\/p>\n<ul>\n<li>create a KV store-based lookup table to store our Windows configuration inventory data<\/li>\n<li>schedule a report to update the lookup table on a regular basis (daily, for example)<\/li>\n<li>create an automatic lookup so that the lookup command does not have to be run manually<\/li>\n<\/ul>\n<h2><span style=\"color: #339966;\">WHAT ABOUT PROCESSES ?<\/span><\/h2>\n<p>Understanding a system&#8217;s CPU load requires knowing\u00a0which processes consume resources and when. Perfmon provides process-related data with the &#8220;[perfmon:\/\/Process]&#8221; stanza.<\/p>\n<p><strong>However, for some reason the perfmon data is not accurate on multi-core systems; a nice article gave me the answer I was looking for:<\/strong><\/p>\n<p><a href=\"https:\/\/robertlabrie.wordpress.com\/2016\/01\/06\/windows-cpu-monitoring-with-splunk\/\">Windows CPU monitoring with&nbsp;Splunk<\/a><\/p>\n<p><strong>Based on this great article, let&#8217;s add our WMI input to generate accurate per-process CPU statistics (caution: this goes in &#8220;wmi.conf&#8221;, not &#8220;inputs.conf&#8221;):<\/strong><\/p>\n<p><em>Splunk_TA_windows\/local\/wmi.conf<\/em><\/p>\n<pre>[WMI:process]\r\nindex = windows\r\ndisabled = 0\r\ninterval = 30\r\nwql = Select IDProcess,Name,PercentProcessorTime,TimeStamp_Sys100NS from Win32_PerfRawData_PerfProc_Process\r\n<\/pre>\n<p><strong>Once deployed, let&#8217;s use some magic searches and 
start analysing process activity:<\/strong><\/p>\n<pre>index=windows sourcetype=\"WMI:process\" Name!=_Total Name!=Idle\r\n| reverse | streamstats current=f last(PercentProcessorTime) as last_PercentProcessorTime last(Timestamp_Sys100NS) as last_Timestamp_Sys100NS by host, Name\r\n| eval cputime = 100 * (PercentProcessorTime - last_PercentProcessorTime) \/ (Timestamp_Sys100NS - last_Timestamp_Sys100NS)\r\n| search cputime &gt; 0\r\n| timechart limit=50 useother=f avg(cputime) by Name\r\n<\/pre>\n<p><a href=\"https:\/\/51.68.196.81\/octamis-blog\/wp-content\/uploads\/2017\/05\/img4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-21\" src=\"https:\/\/51.68.196.81\/octamis-blog\/wp-content\/uploads\/2017\/05\/img4.png\" alt=\"\" width=\"1899\" height=\"579\" srcset=\"https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img4.png 1899w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img4-300x91.png 300w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img4-768x234.png 768w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img4-1024x312.png 1024w\" sizes=\"(max-width: 1899px) 100vw, 1899px\" \/><\/a><\/p>\n<p><strong>Since a given program may run as several processes, which Windows reports with a numbered suffix (name#1, name#2, and so on), we can improve this search and aggregate on a per-command basis:<\/strong><\/p>\n<pre>index=windows sourcetype=\"WMI:process\" Name!=_Total Name!=Idle\r\n| reverse | streamstats current=f last(PercentProcessorTime) as last_PercentProcessorTime last(Timestamp_Sys100NS) as last_Timestamp_Sys100NS by host, Name\r\n| eval cputime = 100 * (PercentProcessorTime - last_PercentProcessorTime) \/ (Timestamp_Sys100NS - last_Timestamp_Sys100NS)\r\n| search cputime &gt; 0\r\n| stats avg(cputime) as cputime by _time,host,Name\r\n| rex field=Name \"(?&lt;Command&gt;[^#]*)#{0,}\"\r\n| stats sum(cputime) as cputime by _time,host,Command\r\n| timechart 
limit=50 useother=f avg(cputime) as cputime by Command\r\n<\/pre>\n<p><a href=\"https:\/\/51.68.196.81\/octamis-blog\/wp-content\/uploads\/2017\/05\/img5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-23\" src=\"https:\/\/51.68.196.81\/octamis-blog\/wp-content\/uploads\/2017\/05\/img5.png\" alt=\"\" width=\"1901\" height=\"642\" srcset=\"https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img5.png 1901w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img5-300x101.png 300w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img5-768x259.png 768w, https:\/\/www.octamis.com\/octamis-blog\/wp-content\/uploads\/2017\/05\/img5-1024x346.png 1024w\" sizes=\"(max-width: 1901px) 100vw, 1901px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Et voila !<\/p>\n<p><strong>You now have all the main pieces of work to start analysing Windows performance with accuracy, enjoy.<\/strong><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>At Octamis we love Splunk, and we love to share our knowledge and experience, so let&#8217;s study some tips on\u00a0Windows monitoring with Splunk ! PREPARING YOUR SPLUNK Let&#8217;s proceed in the order, we want first to get Splunk ready to receive Windows performance data. 
This is quite simple and\u00a0relies on deploying the Windows technical add-on [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,2,4],"tags":[13,11,10,18,21],"_links":{"self":[{"href":"https:\/\/www.octamis.com\/octamis-blog\/wp-json\/wp\/v2\/posts\/10"}],"collection":[{"href":"https:\/\/www.octamis.com\/octamis-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.octamis.com\/octamis-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.octamis.com\/octamis-blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.octamis.com\/octamis-blog\/wp-json\/wp\/v2\/comments?post=10"}],"version-history":[{"count":6,"href":"https:\/\/www.octamis.com\/octamis-blog\/wp-json\/wp\/v2\/posts\/10\/revisions"}],"predecessor-version":[{"id":24,"href":"https:\/\/www.octamis.com\/octamis-blog\/wp-json\/wp\/v2\/posts\/10\/revisions\/24"}],"wp:attachment":[{"href":"https:\/\/www.octamis.com\/octamis-blog\/wp-json\/wp\/v2\/media?parent=10"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.octamis.com\/octamis-blog\/wp-json\/wp\/v2\/categories?post=10"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.octamis.com\/octamis-blog\/wp-json\/wp\/v2\/tags?post=10"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}