Querying data in Splunk for automation workflows in Phantom

Summary

This article deals with querying Splunk from within Phantom to enable automation of security use-cases. It is often necessary to act on data held in Splunk, or to enrich case details in Phantom by querying Splunk for additional information.

Use-Cases

A number of Phantom use-cases or playbooks require data in Splunk to be queried. Examples include:

  • Querying the proxy logs to find other users that clicked on a malicious link. These users could undergo an automatic password reset, or a full malware scan of their machine could be initiated.
  • Querying the email logs to determine which other users were targeted by a Phishing campaign.
  • Querying firewall logs to determine whether a callout was made to a C&C server. This could indicate a compromised client machine that needs to be remediated.

For the example in this article, we will look at finding users in the organisation who clicked on a malicious link in a Phishing email.

Security Considerations

There are a number of security considerations we need to keep in mind when setting up the integration between Splunk and Phantom as well as when executing security playbooks.

Valid TLS certificates on Splunk Web and Phantom

It is recommended that a valid TLS certificate is installed on the Search Head or cluster that is to be queried by Phantom. If the authenticity of the data cannot be guaranteed, an attacker could direct the query to rogue infrastructure, returning malicious search results that will be operated on automatically.

This certificate also needs to be validated by the Phantom client, so any required CA certificates must also be present in the certificate store of the Phantom machine. To install these certificates on the Phantom instance, do the following (as root):

Copy your CA certificate (PEM format) to the system trust anchors directory:

cp /home/myuser/my_org_root_ca.pem /etc/pki/ca-trust/source/anchors/

Update the system CA trust store:

update-ca-trust

OpenSSL routines in Phantom will now be able to verify the certificate on Splunk.
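To confirm from the Phantom host that the Splunk certificate now validates, a short Python check such as the one below can be run. It is a sketch under assumptions: the bundle path is where update-ca-trust places the consolidated CA bundle on RHEL/CentOS-style systems, 8089 is Splunk's default management (REST) port, and the hostname you pass in is your own.

```python
import socket
import ssl

# Assumption: on RHEL/CentOS-based Phantom hosts, update-ca-trust writes the
# consolidated system bundle to this path. Adjust for your distribution.
SYSTEM_CA_BUNDLE = "/etc/pki/tls/certs/ca-bundle.crt"


def make_verifying_context(cafile=None):
    """Return a TLS context that rejects untrusted chains and mismatched hostnames."""
    ctx = ssl.create_default_context(cafile=cafile)
    # create_default_context() already enables both checks; setting them
    # explicitly documents that verification must stay on.
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    return ctx


def check_splunk_tls(host, port=8089, cafile=SYSTEM_CA_BUNDLE, timeout=5.0):
    """Connect to the Splunk management port and return the verified certificate subject.

    Raises ssl.SSLCertVerificationError if the chain or hostname cannot be verified.
    """
    ctx = make_verifying_context(cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["subject"]
```

For example, check_splunk_tls("splunk-es.example.org") (a hypothetical hostname) returns the certificate subject when verification succeeds, and raises ssl.SSLCertVerificationError if the CA certificate was not installed correctly.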

Setting up Splunk

On Splunk, we need a “service account” for Phantom to authenticate with. This account executes REST calls against the Search Head to start searches and retrieve results. An internal Splunk account will suffice for this purpose. It is recommended that each Splunk asset (endpoint) in Phantom be set up with a separate service account. This also makes it possible to split queries to Splunk between the security and operations teams.


Create the Phantom integration user in Splunk:
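The integration user can also be created through Splunk's REST API instead of Splunk Web. The sketch below only builds the request for the authentication/users endpoint (name, password and roles form fields) and does not send it. The host, credentials and role names used in the test are hypothetical; the role you assign should grant only the search capabilities Phantom needs.

```python
import base64
import urllib.parse
import urllib.request


def build_create_user_request(base_url, admin_user, admin_password,
                              name, password, roles):
    """Build (but do not send) a POST to Splunk's authentication/users endpoint.

    base_url is the Search Head management URL, e.g. "https://sh.example.org:8089".
    roles is a list; each entry becomes a repeated "roles" form field.
    """
    body = urllib.parse.urlencode(
        {"name": name, "password": password, "roles": roles},
        doseq=True,  # expand the roles list into roles=a&roles=b
    ).encode("ascii")
    creds = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/authentication/users",
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + creds,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

To send it, pass the request to urllib.request.urlopen() together with an SSL context that verifies the Search Head certificate, so the admin credentials are never sent to an unverified endpoint.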

Setting up Phantom

It is recommended that the Enterprise Security Search Head (or search head cluster, if one exists) be set up as an asset in Phantom. Add a new asset as in the following example:

Click Test Connectivity to perform a basic communications test and certificate check between Phantom and Splunk.

The Test Connectivity output above shows that the ES server version is detected.
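If Test Connectivity fails and you want to reproduce a version check by hand, a sketch like the following can help. It assumes (the article does not say how the app performs its check) that Splunk's /services/server/info endpoint on the management port is a reasonable stand-in; the URL and Authorization value are yours to supply.

```python
import json
import urllib.request


def fetch_server_info(base_url, auth_header, context):
    """GET /services/server/info as JSON from the Splunk management port.

    auth_header is a ready-made Authorization value (for example "Basic ...");
    context should be an ssl.SSLContext that verifies the server certificate.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/services/server/info?output_mode=json",
        headers={"Authorization": auth_header},
    )
    with urllib.request.urlopen(req, context=context, timeout=10) as resp:
        return resp.read().decode("utf-8")


def splunk_version(server_info_json):
    """Pull the version string out of a server/info JSON response body."""
    doc = json.loads(server_info_json)
    # The endpoint returns a single entry whose "content" holds the server metadata.
    return doc["entry"][0]["content"]["version"]
```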

Steps for building a Splunk query

Formatting a Splunk query

Before the query is sent to Splunk, the query string needs some massaging to ensure a successful search. Follow these guidelines to keep the query string robust and of good quality:

  1. Use tstats (with summariesonly=true) wherever possible. For the use-cases mentioned above, you will be looking for indicators in massive volumes of data, so querying accelerated datamodels is preferred. Also use the CIM fields so that your SPL queries can be reused at other customers.
  2. Rename CIM fields in SPL. Splunk field names that contain a period clash with the way Phantom addresses its data structures, especially for fields returned from a datamodel. Splunk uses the Datamodel.cimfield notation, and the period (.) between the two parts is the problem, because Phantom also uses periods to separate the components of a data path. Make sure to rename these fields in the SPL query. Example: rename Web.user as web_user
  3. Ensure ALL the fields that you need displayed OR captured as evidence in Phantom are part of the query AND are returned with the | fields command at the end of the Splunk search. This is discussed further below, as it matters when executing the query and parsing the results.
  4. GROUP BY the fields of interest so that each value is returned only once. For instance, if you want a list of users that clicked a bad URL, you are interested in each username only once, even if the user clicked it 5 times. If we reset that user's password in automation, we want to do it once, and definitely not 5 times.
  5. Reduce the data as much as possible. For example, look only for allowed or successful events when querying the victims of a Phishing campaign. All of this evidence is written to the PostgreSQL database on Phantom, and we do not want to clutter it with superfluous information.
  6. Ensure the format of the data you are querying MATCHES the data in Splunk. An example is requestURL artifacts in Phantom, which have the string format https://some.url.com/somepath/somefile. Depending on the log source, the data in Splunk might not be in the same format. For instance, Zscaler does not record the https:// part of the URL in its logs. If you search for these strings in the datamodel, zero results will be found unless the https:// component is stripped from the requestURL artifact before the query is constructed.
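Guidelines 2 and 3 can be applied mechanically. The helper below is a hypothetical sketch (phantom_safe_name and rename_and_fields are illustrative names, not part of Phantom or Splunk) that derives a period-free alias for each tstats by field and emits matching | rename and | fields clauses.

```python
def phantom_safe_name(cim_field):
    """Map a CIM field such as "Web.http_method" to a period-free name like "web_http_method"."""
    return cim_field.replace(".", "_").lower()


def rename_and_fields(by_fields, aggregate_aliases):
    """Build the trailing '| rename ... | fields ...' clauses for a tstats query.

    by_fields: CIM fields in the tstats "by" clause (these keep dotted names in the output).
    aggregate_aliases: names already assigned with "as" in the aggregation (e.g. "evt_count").
    """
    renames = ", ".join(f"{f} as {phantom_safe_name(f)}" for f in by_fields)
    output = [phantom_safe_name(f) for f in by_fields] + list(aggregate_aliases)
    return f"| rename {renames}\n| fields {', '.join(output)}"
```

For the query in this article, rename_and_fields(["Web.user"], ["web_url", "web_http_method", "web_status", "web_src", "evt_count"]) yields the same rename and fields clauses as the template that follows. Fields aggregated with values(...) as ... are already aliased, so only the by fields need renaming.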

Template

Following these guidelines, the query is constructed in a Format block. Note that the query does not include the tstats command: the search command (tstats in our example) is selected when you configure the run query action on a Splunk asset in Phantom, and the rest of the query is passed to run query as a parameter:

summariesonly=t count as evt_count, values(Web.url) as web_url, values(Web.status) as web_status, values(Web.http_method) as web_http_method, values(Web.src) as web_src from datamodel=Web where (Web.url="{0}") AND Web.action="allowed" earliest=-7d@d by Web.user
 | rename Web.user as web_user
 | fields web_user, web_url, web_http_method, web_status, web_src, evt_count

Template Parameters

The parameter is fed with the requestURLs of the artifacts in the container, selected by a filter block. The Template Parameters field needs to point to the artifact list of the filtered (in scope) artifacts:

filtered-data:filter_URLs:condition_1:artifact:*.cef.requestURL
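Conceptually, that datapath takes every artifact that passed condition_1 of the filter_URLs block and reads cef.requestURL from each. The plain-Python sketch below mimics that on hypothetical artifact dicts so the resulting list is easy to see; inside a playbook you would let Phantom resolve the datapath rather than walk the artifacts yourself.

```python
def collect_request_urls(filtered_artifacts):
    """Approximate the datapath artifact:*.cef.requestURL over already-filtered artifacts.

    Each artifact is assumed to be a dict with a "cef" dict; artifacts without a
    requestURL are skipped. Order is preserved and duplicates are kept.
    """
    urls = []
    for artifact in filtered_artifacts:
        url = artifact.get("cef", {}).get("requestURL")
        if url:
            urls.append(url)
    return urls
```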

Example

The problem with the above query is that it does not support searching Splunk for multiple requestURL artifacts in the container. Multiple requestURLs, e.g. https://www.google.com and https://www.microsoft.com, are returned as a list separated by commas. This is standard behavior of Phantom when dealing with lists of artifacts.

This results in a query with the values joined by commas, such as:

… from datamodel=Web where (Web.url="https://www.google.com, https://www.microsoft.com") AND Web.action="allowed"…

whereas the intention was to separate entries with a logical OR in SPL, such as:

… from datamodel=Web where (Web.url="https://www.google.com" OR Web.url="https://www.microsoft.com") AND Web.action="allowed" …

This can be achieved by adding the following custom code to the Format block:

Replace the code:

phantom.format(container=container, template=template, parameters=parameters, name="Format_Splunk_Query_for_campaign_victims", separator=separator)

With the following:

### custom code begin
# Close one Web.url term and open the next, so Web.url="{0}" expands to Web.url="a" OR Web.url="b"
separator = "\" OR Web.url=\""
phantom.format(container=container, template=template, parameters=parameters, name="Format_Splunk_Query_for_campaign_victims", separator=separator)
### custom code end
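To see why the separator matters, the sketch below approximates how a list parameter is joined and substituted into {0}. It is an illustration, not Phantom's own implementation, and it uses a separator without wildcards so the output matches the intended OR query shown in the Example section.

```python
# Where clause from the Template section, with {0} as the list placeholder.
WHERE_TEMPLATE = 'where (Web.url="{0}") AND Web.action="allowed"'

DEFAULT_SEPARATOR = ", "          # comma-joined list, as in the Example section
OR_SEPARATOR = '" OR Web.url="'   # closes one term and opens the next


def render_where(urls, separator):
    """Join the URL list with separator and substitute it for {0}."""
    return WHERE_TEMPLATE.format(separator.join(urls))
```

With the two example URLs, DEFAULT_SEPARATOR produces a single quoted string containing both URLs, while OR_SEPARATOR produces (Web.url="https://www.google.com" OR Web.url="https://www.microsoft.com"). A single URL renders the same either way, because the separator only appears between items.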

Executing the Query

Add a new Action Block following the Format block defined above. This action should use the Splunk asset configured above: select the Splunk app, then the run query action.

The following details were used:

The display field MUST match the | fields list returned by the Splunk query. This will ensure all field information returned by the query is available within Phantom.

You can pick your command from the command dropdown, which in our case is tstats.
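Because a mismatch between the display field and the | fields list is easy to miss, a small check like the one below can compare them before the playbook is saved. It assumes the display setting is a comma-separated list of field names and that the query ends with a simple | fields clause (exclusions such as | fields - x are not handled); the function names are hypothetical.

```python
import re

# Final "| fields a, b, c" clause with no later pipe.
SPL_FIELDS_CLAUSE = re.compile(r"\|\s*fields\s+([^|]+)$")


def fields_clause(spl):
    """Return the field names listed in the final '| fields' clause of an SPL string."""
    match = SPL_FIELDS_CLAUSE.search(spl.strip())
    if match is None:
        return []
    return [name for name in re.split(r"[\s,]+", match.group(1)) if name]


def missing_display_fields(spl, display):
    """List '| fields' entries that are absent from the comma-separated display setting."""
    shown = {d.strip() for d in display.split(",") if d.strip()}
    return [f for f in fields_clause(spl) if f not in shown]
```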

Customization

Unfortunately, there is a small data quality problem when searching Enterprise Security for requestURL artifacts stored in Phantom.

This is because the Data Models do not enforce strict formats on field values. The problem is with Web.url in the Web Data Model: depending on the technology used, this field may or may not include the URI scheme, for example:

http://
https://
ftp://

For instance, the Zscaler logs do not include the scheme in the source logs (or the resulting Data Model), as noted above, but the requestURL in Phantom might. For this reason, it is proposed to add a small bit of custom code to strip the scheme from the query text sent from Phantom to ES.

The intention is to remove the scheme with the regex \w+:// by adding custom code to the run query code block as follows. Replace:

parameters = []

# build parameters list for 'Splunk_find_campaign_victims' call

With:

parameters = []

### custom code begin
import re  # re may not be imported by the generated playbook code
formatted_data_1 = re.sub(r"\w+://", "", formatted_data_1)
### custom code end

# build parameters list for 'Splunk_find_campaign_victims' call

This ensures that URI schemes are removed from the query string sent to Splunk.
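The substitution can be tried outside Phantom first. The sketch below applies the same \w+:// pattern to a sample where clause (the URLs are illustrative). Note that it rewrites the whole query string, so any other text of the form word:// in the SPL would be stripped as well.

```python
import re

# Same pattern as the custom code: one or more word characters followed by "://".
SCHEME = re.compile(r"\w+://")


def strip_schemes(query):
    """Remove every URI scheme (http://, https://, ftp://, ...) from a query string."""
    return SCHEME.sub("", query)
```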

Parsing Query Results

Once the query executes successfully, you should see the output in the Investigation view under the timeline of the case.

Viewing Results in the Investigation dashboard

I’m using EventGen data for BlueCoat proxies for my development and this is what my results look like:

Viewing Data Structure returned by Splunk

Scroll further down to the Object Parameters to inspect the data structure returned by the query:

This data structure is the most useful part, as it shows how the action results returned by the query need to be addressed.

Adding Search Results to a case comment

We can then use the action results in our Playbook. For demonstration purposes, if we wish to create a case comment listing the users that were also affected by the campaign, i.e. the users that clicked a malicious URL, we can do so as follows:

Create a new Format Codeblock with Template:

%%
User affected by campaign {0}
%%

Template Parameters

 Splunk_find_campaign_victims:action_result.data.*.web_user

The final component above (web_user) is the Splunk field name used when the returned collection is iterated over; it should match a value in the | fields clause of the SPL query.
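The sketch below approximates, in plain Python, what the datapath and the %% block do together: collect web_user from every row of the action result data, then repeat the template line once per value. The action result shape (a list of results, each with a data list of row dicts keyed by the | fields names) is an assumption based on the data structure described above, and both function names are hypothetical.

```python
def affected_users(action_results, field="web_user"):
    """Approximate action_result.data.*.<field>: collect the field from every row
    in every action result's "data" list, skipping rows that lack it."""
    values = []
    for result in action_results:
        for row in result.get("data", []):
            if field in row:
                values.append(row[field])
    return values


def render_repeating_template(line_template, values):
    """Mimic a %% ... %% block: emit line_template once per value, one per line."""
    return "\n".join(line_template.format(v) for v in values)
```

Because the SPL query groups by Web.user, each user should appear only once in the data rows, so each affected user gets a single line in the comment.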

This will allow an analyst to quickly see how many (and which) other users were affected by the Phishing campaign. In my development instance these users are then assigned random passwords for a password reset.
