Splunk Security Schooling With Static Datasets For Budding Blue Teamers

Cian Heasley
Jan 21, 2020

If you find the Splunk core training a little light on security-specific data to play with, or need sample datasets to test potential detections and hone your hunting abilities, then this list is for you.

In making this list I’ve done my best to prioritize small file sizes where possible and ensure that everything is available to download for free.

While this article revolves around Splunk there is absolutely no reason why most of these datasets could not be of equal educational value to people using the ELK Stack.

For all of these repos (other than BotsV2 below which creates its own index) you’ll want to create an index to keep the samples separate from each other and more easily calibrate your searches.

For each dataset I’m providing a very basic example SPL search mapped to detection of specific MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) techniques to give a feel for the usefulness of each one.

A lot of these collections are somewhat Windows-centric. If you are aware of good, sizeable sample data from Linux or other sources, please hit me up on Twitter.

BotsV2

The first resource on this list has to be BotsV2, produced by Splunk itself. There are two versions of this dataset, one which is trimmed down and prioritized around adversary activity at 3.2GB and a larger, noisier version that is 16.4GB.

Ingested data licensing is a consistent concern for anyone using Splunk, especially for research or dev work. Because this data is distributed pre-indexed, there are no volume-based licensing limits to worry about.

Choose one of the sets, get it into Splunk and start running searches against it. I consider it an immense learning asset.

As an example, a simple search for Windows EventCodes 4720 (a user account was created) or 4732 (a member was added to a security-enabled local group) can help surface suspicious activity by potential adversaries (ATT&CK T1136).
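A minimal sketch of how such a search string could be assembled is shown here in Python. The index name `botsv2` and the helper name `build_event_code_search` are assumptions for illustration, and the search is deliberately bare, with no sourcetype filter or output formatting.

```python
def build_event_code_search(index, event_codes):
    """Build an SPL search string matching any of the given Windows EventCodes."""
    # int() guards against anything other than numeric event codes being injected.
    codes = " OR ".join(f"EventCode={int(code)}" for code in event_codes)
    return f"index={index} ({codes})"


# 4720 = user account created, 4732 = member added to a security-enabled local group
print(build_event_code_search("botsv2", [4720, 4732]))
# prints: index=botsv2 (EventCode=4720 OR EventCode=4732)
```

The printed string can be pasted into the Splunk search bar and refined from there.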

Mordor

Mordor is a collection of JSON-formatted, pre-recorded adversary simulation events categorized according to the MITRE ATT&CK Framework. Go and grab the GitHub repo now. It contains Windows datasets organized by ATT&CK tactic, as well as data from a simulation of APT3 techniques.

I’d suggest directory monitoring with *.tar.gz whitelisted when adding this dataset, to avoid ingesting README files. Splunk will automatically decompress the tars for you.
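Splunk's monitor input treats the whitelist as a regular expression matched against the file path, so the `*.tar.gz` idea is typically written as something like `\.tar\.gz$`. The sketch below uses Python's `re` module to show which files such a pattern would let through. The file paths are hypothetical and the pattern is an assumed translation of the glob, not a setting taken from the Mordor repo.

```python
import re

# Assumed regex form of the "*.tar.gz" whitelist for a Splunk monitor input.
WHITELIST = re.compile(r"\.tar\.gz$")


def would_be_ingested(path):
    """Return True if the path matches the whitelist pattern."""
    return WHITELIST.search(path) is not None


# Hypothetical paths, for illustration only.
candidates = [
    "datasets/credential_access/example.tar.gz",
    "datasets/credential_access/README.md",
    "datasets/example.tar.gz.bak",
]

for path in candidates:
    print(path, would_be_ingested(path))
```

Only the first path matches. The README and the `.bak` file are skipped because the pattern is anchored to the end of the path.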

A really simple detection for Mimikatz (ATT&CK S0002), based loosely on a Sigma rule, looks for the kind of access to LSASS that is associated with Mimikatz usage.
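The logic of an LSASS access check can be sketched in Python as a filter over parsed events. This is an illustration under stated assumptions and not the Sigma rule itself. The keys `EventID`, `TargetImage` and `GrantedAccess` follow Sysmon's ProcessAccess event (Event ID 10), and the Mordor JSON may name or nest them differently. The access masks are values commonly associated with Mimikatz and are illustrative, not exhaustive.

```python
# Access masks often reported for Mimikatz reading LSASS memory (illustrative).
SUSPICIOUS_ACCESS = {"0x1010", "0x1410"}


def is_suspicious_lsass_access(event):
    """Return True for a Sysmon ProcessAccess event that opens lsass.exe
    with one of the suspicious access masks."""
    target = str(event.get("TargetImage", "")).lower()
    access = str(event.get("GrantedAccess", "")).lower()
    return (
        str(event.get("EventID")) == "10"
        and target.endswith("\\lsass.exe")
        and access in SUSPICIOUS_ACCESS
    )


# Hypothetical events, for illustration only.
events = [
    {"EventID": 10, "TargetImage": "C:\\Windows\\System32\\lsass.exe", "GrantedAccess": "0x1010"},
    {"EventID": 10, "TargetImage": "C:\\Windows\\System32\\svchost.exe", "GrantedAccess": "0x1010"},
    {"EventID": 1, "Image": "C:\\Windows\\System32\\cmd.exe"},
]

hits = [e for e in events if is_suspicious_lsass_access(e)]
print(len(hits))
```

In SPL the same idea becomes a search on the event ID, the target image and the granted access values, using whatever field names your ingested data actually exposes.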

EVTX-ATTACK-SAMPLES

EVTX-ATTACK-SAMPLES is a collection of Windows events that are once again categorized very helpfully by ATT&CK tactic and technique.

Getting this data into Splunk is slightly more complex than for the other datasets in this article (due to Splunk file format processing constraints), but it is well worth it. You’ll want to grab the repo and also head over to get evtx2json, which we will be using to feed the data into Splunk.

Using evtx2json you can pass the content of each folder directly to your Splunk indexer or search head via the HTTP Event Collector (HEC). Your CLI command will look something like this:

python evtx2json.py --splunk --host your.splunk.server.address --port 8088 --token {your-HEC-token-here} process_files -f EVTX-ATTACK-SAMPLES/CommandandControl/*.evtx

As with the others you’ll want to have set up an index specifically for this data beforehand and make sure your HEC token is associated with that index.
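To give a feel for what travels over the wire, here is a minimal sketch of a single HEC request built with Python's standard library. It is not how evtx2json is implemented. The host and token placeholders, the index name `evtx_samples`, the `_json` sourcetype and the helper name are all assumptions for illustration. The endpoint path and the `Authorization: Splunk <token>` header are the standard HEC conventions.

```python
import json
import urllib.request


def build_hec_request(host, port, token, event, index, sourcetype="_json"):
    """Build (but do not send) a POST request for Splunk's HEC event endpoint."""
    url = f"https://{host}:{port}/services/collector/event"
    payload = {"event": event, "index": index, "sourcetype": sourcetype}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute your own server, token and index.
request = build_hec_request(
    host="your.splunk.server.address",
    port=8088,
    token="your-HEC-token-here",
    event={"EventID": 4720, "Computer": "example-host"},
    index="evtx_samples",
)
print(request.full_url)
# Sending would be: urllib.request.urlopen(request)
```

If the token is not allowed to write to the named index, HEC rejects the event, which is why the token and index need to be associated beforehand.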

An example Splunk search against this data looks for PowerShell (ATT&CK Technique T1086) commands with flags that could indicate base64 encoding as a method of obfuscation.
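The matching logic behind such a search can be sketched in Python. PowerShell accepts abbreviated parameter names, so `-EncodedCommand` commonly shows up as `-enc`, `-e` or `-ec`. The helper below is an illustrative assumption and not the search from the original article. It flags any parameter that is a prefix of `encodedcommand` or the `ec` alias.

```python
import re

FULL_FLAG = "encodedcommand"
# Every prefix of the full parameter name, plus the "ec" alias.
ENCODED_FLAGS = {FULL_FLAG[:n] for n in range(1, len(FULL_FLAG) + 1)} | {"ec"}

# Parameters are tokens that start with "-" at the beginning of a word.
PARAMETER = re.compile(r"(?<!\S)-(\w+)")


def has_encoded_command_flag(command_line):
    """Return True if the command line carries an -EncodedCommand style flag."""
    return any(
        flag.lower() in ENCODED_FLAGS for flag in PARAMETER.findall(command_line)
    )


print(has_encoded_command_flag("powershell.exe -NoP -W Hidden -enc SQBFAFgA"))
print(has_encoded_command_flag("powershell.exe -ExecutionPolicy Bypass -File a.ps1"))
```

The first command line is flagged and the second is not. A production detection would also want to confirm that the process is actually PowerShell and that a base64-looking argument follows the flag.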

BRAWL

MITRE has implemented their own adversary emulation system in the form of CALDERA. This dataset is the result of log collection from BRAWL, a CALDERA-based scenario run in the cloud. Log sources included in this repo are Windows Event Logs and Sysmon in a single zip file.

As this dataset is provided by MITRE themselves, the scenario was planned around ATT&CK tactics and techniques, and the data itself is classified and grouped by them. That is very helpful for Blue Teamers who work within this framework and want to test out SPL.

Back to PowerShell, this time looking for the “Bypass” Execution Policy flag, which can be used to circumvent local PowerShell Execution Policy restrictions.
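As a rough sketch of that check, the Python below pairs each parameter with the value that follows it and looks for an execution policy flag set to `Bypass`. The helper name and sample command lines are hypothetical. The accepted flag forms (`-ep` and abbreviations of `-ExecutionPolicy` such as `-exec`) reflect common usage and are not an exhaustive list.

```python
import re

# Capture "-Flag value" pairs, where the value is a bare word.
FLAG_VALUE = re.compile(r"(?<!\S)-(\w+)\s+(\w+)")


def uses_policy_bypass(command_line):
    """Return True if an execution policy flag is set to Bypass."""
    for flag, value in FLAG_VALUE.findall(command_line):
        flag = flag.lower()
        # "-ep" is an alias; otherwise accept prefixes of the full name (2+ chars).
        is_policy_flag = flag == "ep" or (
            len(flag) >= 2 and "executionpolicy".startswith(flag)
        )
        if is_policy_flag and value.lower() == "bypass":
            return True
    return False


print(uses_policy_bypass("powershell.exe -NoP -ep Bypass -File run.ps1"))
print(uses_policy_bypass("powershell.exe -ExecutionPolicy Restricted -File run.ps1"))
```

The first command line is flagged and the second is not, since its policy value is `Restricted`.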

In Conclusion

For Blue Teamers there are some great open source, community-driven projects and resources out there for training and testing purposes. It is just a matter of finding them.

I feel like we aren’t always as good at information sharing as our Red Team compatriots, and I hope that as an industry we can change that. I’d encourage everyone reading this article to support and contribute to endeavours such as Roberto Rodriguez’s Mordor and Florian Roth’s Sigma rulesets.

Cian Heasley

I work in infosec and live in Scotland. I am fascinated by computer security, privacy and the intersection of the internet, technology and human rights.