Analyzing Android Stalkerware With The Aid of Splunk (Part One)

Cian Heasley
Dec 16, 2019

This article is potentially of use to anyone doing mobile app analysis, mobile malware research, bug bounty programs involving mobile apps or research into the Android operating system itself.

Getting from Point A to Point B

I’m volunteering time to work on a few different projects that revolve around researching, documenting and detecting stalkerware, mobile malware-like apps that are marketed to and used by abusers to spy on those close to them.

Android stalkerware uses various permissions in order to spy

As this is not my full-time job, I gradually came to the realization that I needed to become better organized and maximise my productivity in the limited time I have to work on these projects.

At my day job I work with Splunk, so I started wondering how well I could integrate this platform into my volunteer project workflow of analyzing and identifying stalkerware apps and indicators of device compromise.

I should add here that although this piece involves the use of Splunk, I am sure someone could apply similar methodologies with Elasticsearch or similar platforms. For that matter, grep works wonders too.

Splunk requires data to ingest in order to produce worthwhile output, so we must start by thinking about what we want as an end product and which tools will give us the data needed to produce it.

Which tools for the job?

In this article we’ll cover static analysis of Android apk files. There are two tools that I would consider using here, Super Android Analyzer (SAA) and MARA Framework. Both output to text, which is exactly what we need, but SAA can output to JSON while MARA lacks key-value output, so we are going to go with SAA for simplicity’s sake.

SAA gives us a range of vulnerability detections that are categorized from critical through to warnings and also highlights areas in the code where it detects these vulnerabilities. The reports it provides include all of the basic information we would need to perform further analysis of an app.

So, first of all you will need to grab Super Android Analyzer from either its website or its GitHub. Another considerable advantage of SAA is that it works on macOS, multiple Linux flavours and Windows with minimal fuss.

The time-consuming part is going to be collecting the apks that you want to analyze. As I am dealing with apps that are distributed outside of the Google Play Store, this means going to each site that markets these apps and finding a download link. To keep detections fresh this process will need to be carried out every quarter or so. Luckily, for the purposes of this article I have dozens of such apps already downloaded, so we can move on.

Once you have the app apks in a directory together you can set SAA to run through all apks in the directory and generate the JSON results for each app with a command similar to this:

cian@blackmirror:apkz$ super-analyzer -v --test-all-json --downloads .

Then you just have to wait for the program to finish running. It’ll look something like this from the command line while it is working away.

Getting the data into Splunk

While Super Android Analyzer is doing its thing we can start to think about how we get the resulting output into Splunk. In my case the computer that I am running SAA on and the computer I am running my standalone Splunk instance (no separate indexer) on are not the same computer, so I’m going to need something like a Splunk Universal Forwarder (UF) to get it all together.

Your setup may vary, but based on my own setup, which I’ll briefly document here, you should be able to easily make any changes you need to. First off, grab the Universal Forwarder from the Splunk website; you’ll need to register for an account if you don’t already have one. There are instructions on installing the UF here. In my case it was as simple as running this on my Debian Linux CLI:

dpkg -i splunkforwarder-amd64.deb

From here we need to create an index on our Splunk search head (I imaginatively titled mine “android”) and configure a receiving port for forwarded data (I chose 31337). You’ll also need to create or edit a props.conf file; more general information on where this file lives and how to configure it is here.

My props.conf file for the SAA data looks like this:

[super-android]
INDEXED_EXTRACTIONS = json
TRUNCATE = 0
SHOULD_LINEMERGE = false
KV_MODE = json
NO_BINARY_CHECK = true

The important parts to pay attention to are “TRUNCATE”, which must be set to 0 to prevent Splunk from cutting off parts of the JSON we are sending to it due to length, and “[super-android]”, which defines our custom sourcetype. We will need to provide this sourcetype name in other configuration files.

That’s the search head end of things taken care of, so now we need to look at the UF. In my case the file we need to edit, inputs.conf, lives in /opt/splunkforwarder/etc/system/local on my Linux box, and we are going to add something like this to it:

[monitor:///home/cian/Android/apkz/results/.../*.json]
sourcetype=super-android
index=android
host_segment=6

In the first line of the stanza we are telling the UF to monitor the directory in which SAA stores its results and, using the Splunk-specific wildcard “...”, to recurse through its subdirectories and pick up any .json files it finds. SAA names each subdirectory after the app, so with host_segment we are telling the UF to use that portion of the path (the sixth segment) as the host meta-field in Splunk for ease of organization. We are also telling it to use the “super-android” sourcetype that we configured earlier and the “android” index.

In the same directory as inputs.conf is outputs.conf. This is where we tell the UF where to send the data that we defined in inputs.conf. Our outputs.conf will look something like this:
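To make the host_segment value concrete, here is a small Python sketch of how counting path segments picks out the app directory. The helper name `nth_segment`, the app directory `com.example.app` and the file name `results.json` are made up for illustration, and this only mimics the counting; it is not how Splunk implements it.

```python
from pathlib import PurePosixPath


def nth_segment(path: str, n: int) -> str:
    """Return the n-th (1-indexed) segment of an absolute POSIX path."""
    # parts[0] is the root "/", so the first real segment sits at index 1.
    parts = PurePosixPath(path).parts
    return parts[n]


# Hypothetical report path; the last two segments are placeholders.
report = "/home/cian/Android/apkz/results/com.example.app/results.json"
print(nth_segment(report, 6))
```

If your results directory sits at a different depth, the number given to host_segment has to change to match.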

[tcpout]
defaultGroup = saa
[tcpout:saa]
server = 192.168.23.3:31337

This defines the output and where it should be sent, in this case my Splunk instance. More general information on outputs.conf can be found over here.

That’s about it. The first results should be visible on your search head by now.

Now we have the data, what next?

Now that we have the data flowing into the search head we can start to extract the output that we want.

Perhaps we want to start with something quite simple: a list of the apps, the app version number, the apk installer’s sha256 hash and the total number of potential vulnerabilities Super Android Analyzer identified.

One aspect of the research I am doing right now is trying to detect stalkerware infections by monitoring the connections a device makes over the network, so we can immediately pinpoint “URL disclosure” as a Super Android Analyzer result that we are very interested in.

Above we can see a search that brings up, via a regex, all of the URLs found within each app’s source code by Super Android Analyzer. The SPL itself looks like this:
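Outside Splunk, the same summary can be pulled straight from a parsed report. This is a minimal Python sketch, assuming the field names `app_version` and `total_vulnerabilities` (neither appears in the searches in this article, so check them against your own reports). The function name `summarize` and the sample values are made up for illustration.

```python
import json


def summarize(report: dict) -> tuple:
    """Pick the four summary columns out of one parsed SAA report."""
    fingerprint = report.get("app_fingerprint", {})
    return (
        report.get("app_package"),
        report.get("app_version"),            # assumed field name
        fingerprint.get("sha256"),
        report.get("total_vulnerabilities"),  # assumed field name
    )


# Made-up sample standing in for the contents of one results file.
sample = json.loads("""
{
  "app_package": "com.example.app",
  "app_version": "1.0",
  "app_fingerprint": {"sha256": "0000"},
  "total_vulnerabilities": 3
}
""")
print(summarize(sample))
```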

index=android host=* "warnings{}.code"=*
| fields app_package, app_fingerprint.sha256, warnings{}.code
| rex field=warnings{}.code "(?<uri>(https?|ftp)://[a-zA-Z0-9.\-_]+/[a-zA-Z0-9+&@#/%=~_\-|!:,.;]*)"
| mvexpand uri
| dedup uri
| table app_package, app_fingerprint.sha256, uri
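The regex can be tried outside Splunk before it goes into a search. A rough Python equivalent follows. Python spells the named group `(?P<uri>...)` rather than `(?<uri>...)`, and the helper name `extract_uris`, the sample snippet and its URL are invented for illustration.

```python
import re

# Same pattern as the rex command above, in Python's named-group syntax.
URI_RE = re.compile(
    r"(?P<uri>(https?|ftp)://[a-zA-Z0-9.\-_]+/[a-zA-Z0-9+&@#/%=~_\-|!:,.;]*)"
)


def extract_uris(snippets):
    """Return the unique URIs found in a list of code snippets, in first-seen order."""
    seen = []
    for snippet in snippets:
        for match in URI_RE.finditer(snippet):
            uri = match.group("uri")
            if uri not in seen:
                seen.append(uri)
    return seen


# Invented snippet in the style of a decompiled source line.
snippets = ['String api = "https://api.example.com/v1/upload";']
print(extract_uris(snippets))
```

Note that the pattern requires a slash after the host, so a bare `http://example.com` with no path is not captured.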

We can then take the results of that search and move to finding a way to create detections around them, though that is outside the scope of this article.

If you are interested in bug bounties or app analysis in general, you may want a list of the apps you’ve analyzed that are potentially vulnerable to man-in-the-middle or similar attacks. Once again we can immediately bring this information up with a simple Splunk search.

The SPL for that looks like this:

index=android "criticals{}.name"="Accepting all SSL certificates" OR "criticals{}.name"="WebView ignores SSL errors"
| table app_package, app_fingerprint.sha256, "criticals{}.name"
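The same filter can be sketched in Python against a parsed report, for anyone working without Splunk. It assumes `criticals` is a list of objects that each carry a `name`, which is what the `criticals{}.name` field path implies. The function name `mitm_findings` and the sample report are made up.

```python
MITM_FINDINGS = {
    "Accepting all SSL certificates",
    "WebView ignores SSL errors",
}


def mitm_findings(report: dict) -> list:
    """Return the names of critical findings that suggest weak TLS handling."""
    names = [c.get("name") for c in report.get("criticals", [])]
    return [n for n in names if n in MITM_FINDINGS]


# Made-up report with one matching and one unrelated finding.
sample = {
    "app_package": "com.example.app",
    "criticals": [
        {"name": "WebView ignores SSL errors"},
        {"name": "Some other finding"},
    ],
}
print(sample["app_package"], mitm_findings(sample))
```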

Something to be aware of when writing SPL that works with JSON: depending on their structure, field names sometimes require single or double quotes. If you are getting errors or aren’t getting results, this may be why.

We can also easily take a look at which apps will allow us to back up their contents via adb for further analysis. The android:allowBackup attribute in the AndroidManifest file defines whether this is possible for a user who has enabled USB debugging.

Finally, some stalkerware apps use SMS as a secondary method of communicating with abusers when certain configuration change criteria are met, or can send SMS messages from the victim’s phone posing as the victim. Once again we can immediately see a list of the stalkerware we have analyzed that holds these permissions.

We’re really only scratching the surface here. Any area that you would want to research around the apps themselves can probably be gleaned, at least partially, from the SAA data that we have ingested with a minimum of fuss.
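For a single app, the attribute can also be read directly from a decoded manifest. This is a minimal Python sketch, assuming the manifest has already been decoded to plain XML (the copy inside an apk is binary XML) by a tool such as apktool. The function name `allows_backup` and the sample manifest are made up. A missing attribute is treated as true, which is the documented Android default.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"


def allows_backup(manifest_xml: str) -> bool:
    """Return True unless <application> sets android:allowBackup="false"."""
    root = ET.fromstring(manifest_xml)
    app = root.find("application")
    if app is None:
        return True  # no <application> element, fall back to the default
    value = app.get(f"{{{ANDROID_NS}}}allowBackup", "true")
    return value.strip().lower() != "false"


# Made-up manifest for illustration.
sample = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
  <application android:allowBackup="false" />
</manifest>
"""
print(allows_backup(sample))
```

A value given as a resource reference (for example `@bool/...`) is treated as true here, so it would need resolving separately.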

Wrapping it all up

The SPL above isn’t advanced by any means, but if we are looking to produce CSV files or simple reports or visualizations then it doesn’t really need to be.

There are undoubtedly simpler ways of doing this and a lot of room for improvement but even just for my own work I can already see that it is going to help me maximise the research that I can get done.

We’ve looked into static analysis here and how we can optimize it. I’ll write a further piece covering dynamic analysis when time permits.


Cian Heasley

I work in infosec and live in Scotland. I am fascinated by computer security, privacy and the intersection of the internet, technology and human rights.