
OSXCollector

A forensic evidence collection & analysis toolkit for OSX



OSXCollector Manual

OSXCollector is a forensic evidence collection & analysis toolkit for OSX.

Forensic Collection

The collection script runs on a potentially infected machine and outputs a JSON file that describes the target machine. OSXCollector gathers information from plists, SQLite databases and the local file system.

Forensic Analysis

Armed with the forensic collection, an analyst can answer questions like:

Yelp automates the analysis of most OSXCollector runs, converting OSXCollector output into an easily readable and actionable summary of just the suspicious stuff.

Performing Collection

osxcollector.py is a single Python file that runs without any dependencies on a standard OSX machine. This makes it really easy to run collection on any machine - no fussing with brew, pip, config files, or environment variables. Just copy the single file onto the machine and run it.

sudo osxcollector.py is all it takes.

$ sudo osxcollector.py
Wrote 35394 lines.
Output in osxcollect-2014_12_21-08_49_39.tar.gz

The JSON output of the collector, along with some helpful files like system logs, has been bundled into a .tar.gz for hand-off to an analyst.
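
For an analyst who prefers to stay in Python, the hand-off archive can be read without unpacking it first. The following is a minimal sketch, not part of OSXCollector: the helper name iter_records is made up for illustration, and it assumes the JSON output is the archive member whose name ends in .json, with one JSON record per line.

```python
import json
import tarfile


def iter_records(archive_path):
    """Yield one dict per JSON line found in a collector .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Assumption: the JSON output is the member named *.json;
            # bundled log files are skipped.
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            handle = archive.extractfile(member)
            for raw_line in handle:
                line = raw_line.decode("utf-8").strip()
                if line:
                    yield json.loads(line)
```

Calling list(iter_records("osxcollect-2014_12_21-08_49_39.tar.gz")) would then give a list of records ready for filtering.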

osxcollector.py also has a lot of useful options to change how collection works:

Details of Collection

The collector outputs a .tar.gz containing all the collected artifacts. The archive contains a JSON file with the majority of information. Additionally, a set of useful logs from the target system is included.

Common Keys

Every Record

Each line of the JSON file records one piece of information. There are some common keys that appear in every JSON record:

File Records

For records representing files there are a bunch of useful keys:

For records representing downloaded files:

SQLite Records

For records representing a row of a SQLite database:

For records that represent data associated with a specific user:

Timestamps

OSXCollector attempts to convert timestamps to human readable date/time strings in the format YYYY-mm-dd hh:MM:ss. It uses heuristics to automatically identify various timestamps:
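
To illustrate what a timestamp heuristic of this kind might look like, here is a rough Python 3 sketch. It is not OSXCollector's actual logic: the candidate encodings (Unix seconds and microseconds, Mac absolute time counted from 2001-01-01, and WebKit microseconds counted from 1601-01-01), the plausibility window, and the function name guess_timestamp are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta

# Assumed plausibility window for this sketch; not taken from OSXCollector.
MIN_YEAR, MAX_YEAR = 2004, 2030

# (epoch, units per second) for timestamp encodings commonly seen on a Mac.
CANDIDATES = [
    (datetime(1970, 1, 1), 1),        # Unix seconds
    (datetime(1970, 1, 1), 1000000),  # Unix microseconds
    (datetime(2001, 1, 1), 1),        # Mac absolute time, seconds
    (datetime(1601, 1, 1), 1000000),  # WebKit microseconds
]


def guess_timestamp(value):
    """Return 'YYYY-mm-dd HH:MM:SS' for the first plausible reading, else None."""
    for epoch, units in CANDIDATES:
        try:
            moment = epoch + timedelta(seconds=float(value) / units)
        except OverflowError:
            # The value is far too large for this encoding; try the next one.
            continue
        if MIN_YEAR <= moment.year <= MAX_YEAR:
            return moment.strftime("%Y-%m-%d %H:%M:%S")
    return None
```

The sketch tries each encoding in turn and accepts the first one that lands inside the plausibility window, which is why a value that makes no sense under any encoding yields None rather than a wrong date.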

Sections

version section

The current version of OSXCollector.

system_info section

Collects basic information about the system:

kext section

Collects the Kernel extensions from:

startup section

Collects information about the LaunchAgents, LaunchDaemons, ScriptingAdditions, StartupItems and other login items from:

More information about the Mac OS X startup can be found here: http://www.malicious-streams.com/article/Mac_OSX_Startup.pdf

applications section

Hashes installed applications and gathers install history from:

quarantines section

Quarantines are basically the info necessary to show the 'Are you sure you wanna run this?' when a user is trying to open a file downloaded from the Internet. For some more details, check out the Apple Support explanation of Quarantines: http://support.apple.com/kb/HT3662

This section also collects information from XProtect's hash-based malware check for quarantined files. The plist is at: /System/Library/CoreServices/CoreTypes.bundle/Contents/Resources/XProtect.plist

XProtect also adds minimum versions for Internet Plugins. That plist is at: /System/Library/CoreServices/CoreTypes.bundle/Contents/Resources/XProtect.meta.plist

downloads section

Hashes all users' downloaded files from:

chrome section

Collects the following information from the Google Chrome web browser:

This data is extracted from ~/Library/Application Support/Google/Chrome/Default

firefox section

Collects information from the different SQLite databases in a Firefox profile:

This information is extracted from ~/Library/Application Support/Firefox/Profiles

For more details about Firefox profile folder see http://kb.mozillazine.org/Profile_folder_-_Firefox

safari section

Collects information from the different plist and SQLite databases in a Safari profile:

accounts section

Collects information about users' accounts:

mail section

Hashes files in the mail app directories:

full_hash section

Hashes all the files on disk. All of 'em. This does not run by default. It must be triggered with:

$ sudo osxcollector.py -s full_hash

Basic Manual Analysis

Forensic analysis is a bit of art and a bit of science. Every analyst will see a bit of a different story when reading the output from OSXCollector. That's part of what makes analysis fun.

Generally, collection is performed on a target machine because something is hinky: anti-virus found a file it doesn't like, deep packet inspection observed a callout, endpoint monitoring noticed a new startup item. The details of this initial alert - a file path, a timestamp, a hash, a domain, an IP, etc. - are enough to get going.

Timestamps

Simply grepping a few minutes before and after a timestamp works great:

$ cat INCIDENT32.json | grep '2014-01-01 11:3[2-8]'

Browser History

It's in there. A tool like jq can be very helpful to do some fancy output:

$ cat INCIDENT32.json | grep '2014-01-01 11:3[2-8]' | jq 'select(has("url"))|.url'

A Single User

$ cat INCIDENT32.json | jq 'select(.osxcollector_username=="ivanlei")|.'
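
The grep and jq recipes above can also be approximated in Python when jq is not available. This is a sketch only: the helper names are invented for illustration, and unlike grep on the raw line, in_time_window only inspects top-level string values of each record.

```python
import json
import re


def load_records(lines):
    """Parse one JSON record per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def in_time_window(record, pattern):
    """True if any top-level string value matches the timestamp regex."""
    return any(
        isinstance(value, str) and re.search(pattern, value)
        for value in record.values()
    )


def urls_in_window(records, pattern):
    """Rough equivalent of: grep PATTERN | jq 'select(has("url"))|.url'"""
    return [r["url"] for r in records if "url" in r and in_time_window(r, pattern)]


def for_user(records, username):
    """Rough equivalent of: jq 'select(.osxcollector_username==USERNAME)'"""
    return [r for r in records if r.get("osxcollector_username") == username]
```

With the records loaded from INCIDENT32.json, urls_in_window(records, r"2014-01-01 11:3[2-8]") lists the URLs seen in that window.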

Automated Analysis

The osxcollector.output_filters package contains filters that process and transform the output of OSXCollector. The goal of filters is to make it easy to analyze OSXCollector output.

Each filter has a single purpose. They do one thing and they do it right.

Running Filters in a VirtualEnv

Unlike osxcollector.py, filters have dependencies that aren't already installed on a new Mac. The best way to ensure those dependencies can be found is to use virtualenv.

To set up a virtualenv for the first time use:

$ sudo pip install virtualenv
$ virtualenv --system-site-packages venv_osxcollector
$ source ./venv_osxcollector/bin/activate
$ pip install -r ./requirements-dev.txt

Filter Configuration

Many filters require configuration, like API keys or details on a blacklist. The configuration for filters is done in a YAML file. The file is named osxcollector.yaml. The filter will look for the config file in:

A sample config is included. Make a copy and then modify it for yourself:

$ cp osxcollector.yaml.example osxcollector.yaml
$ emacs osxcollector.yaml

Basic Filters

Using combinations of these basic filters, an analyst can figure out a lot of what happened without expensive tools, without threat feeds or fancy APIs.

FindDomainsFilter

osxcollector.output_filters.find_domains.FindDomainsFilter attempts to find domain names in OSXCollector output. The domains are added to the line with the key osxcollector_domains.
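
As a rough picture of what adding domains to a line means, the sketch below pulls hostnames out of any URLs found in a record and stores them under osxcollector_domains. It is not the filter's real implementation: the URL regex, the helper name add_domains, and the choice to store the result as a sorted list are assumptions made for this example.

```python
import re
from urllib.parse import urlparse

# Assumption: only values that contain http(s) URLs are considered.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def add_domains(record):
    """Attach hostnames found in the record's string values, if any."""
    found = set()
    for value in record.values():
        if not isinstance(value, str):
            continue
        for url in URL_RE.findall(value):
            host = urlparse(url).hostname  # hostname is lower-cased
            if host:
                found.add(host)
    if found:
        record["osxcollector_domains"] = sorted(found)
    return record
```

Restricting the search to URLs avoids mistaking file names such as evil_invoice.pdf for domains, at the cost of missing bare hostnames.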

FindDomainsFilter isn't too useful on its own but it's super powerful when chained with filters like FindBlacklistedFilter and/or osxcollector.output_filters.virustotal.lookup_domains.LookupDomainsFilter.

To run and see lines where domains have been added try:

$ python -m osxcollector.output_filters.find_domains -i RomeoCredible.json | \
    jq 'select(has("osxcollector_domains"))'

Usage:

$ python -m osxcollector.output_filters.find_domains -h
usage: find_domains.py [-h] [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

FindBlacklistedFilter

osxcollector.output_filters.find_blacklisted.FindBlacklistedFilter reads a set of blacklists from the osxcollector.yaml and marks any lines with values on the blacklist. The FindBlacklistedFilter is flexible and allows you to compare the OSXCollector output against multiple blacklists.

You really should create blacklists for domains, file hashes, file names, and any known hinky stuff.

Configuration Keys:

If you want to find blacklisted domains, you will have to use the find_domains filter to pull the domains out first. To see lines matching a specific blacklist named domains try:

$ python -m osxcollector.output_filters.find_domains -i RiddlerBelize.json | \
    python -m osxcollector.output_filters.find_blacklisted | \
    jq 'select(has("osxcollector_blacklist")) |
        select(.osxcollector_blacklist | keys[] | contains("domains"))'

Usage:

$ python -m osxcollector.output_filters.find_blacklisted -h
usage: find_blacklisted.py [-h] [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

RelatedFilesFilter

osxcollector.output_filters.related_files.RelatedFilesFilter takes an initial set of file paths, names, or terms. It breaks this input into individual file and directory names and then searches for these terms across the entire OSXCollector output. The filter is smart and ignores common terms like bin or Library as well as ignoring user names.

This filter is great for figuring out how evil_invoice.pdf landed up on a machine. It'll find browser history, quarantines, email messages, etc. related to a file.
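
The term-splitting idea described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the filter's code: the ignored-term list, the helper names, and the shape of the value stored under osxcollector_related are all invented for the example.

```python
import re

# Assumption: a tiny illustrative list of terms too common to be useful.
IGNORED_TERMS = {"bin", "library", "users", "applications"}


def file_terms(paths, usernames=()):
    """Break paths into file and directory names, dropping common terms."""
    ignored = IGNORED_TERMS | {name.lower() for name in usernames}
    terms = set()
    for path in paths:
        for part in re.split(r"[/\\]+", path):
            part = part.strip().lower()
            if part and part not in ignored:
                terms.add(part)
    return terms


def mark_related(record, terms):
    """Record which terms appear in any string value of the record."""
    hits = sorted(
        term for term in terms
        if any(isinstance(v, str) and term in v.lower() for v in record.values())
    )
    if hits:
        record.setdefault("osxcollector_related", {})["files"] = hits
    return record
```

For example, file_terms(["/Users/ivanlei/Library/evil_invoice.pdf"], usernames=["ivanlei"]) keeps only evil_invoice.pdf as a search term.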

To run and see related lines try:

$ python -m osxcollector.output_filters.related_files -i CanisAsp.json -f '/foo/bar/baz' -f 'dingle' | \
    jq 'select(has("osxcollector_related")) |
        select(.osxcollector_related | keys[] | contains("files"))'

Usage:

$ python -m osxcollector.output_filters.related_files -h
usage: related_files.py [-h] [-f FILE_TERMS] [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

RelatedFilesFilter:
  -f FILE_TERMS, --file-term FILE_TERMS
                        [OPTIONAL] Suspicious terms to use in pivoting through
                        file names. May be specified more than once.

ChromeHistoryFilter

osxcollector.output_filters.chrome.sort_history.SortHistoryFilter builds a really nice Chrome browser history sorted in descending time order. This output is comparable to looking at the history tab in the browser but actually contains more info. The core_transition and page_transition keys explain whether the user got to the page by clicking a link, through a redirect, a hidden iframe, etc.

To run and see Chrome browser history:

$ python -m osxcollector.output_filters.chrome.sort_history -i SirCray.json | \
    jq 'select(.osxcollector_browser_history=="chrome")'

This is great mixed with a grep in a certain time window, like maybe the 5 minutes before that hinky download happened.

$ python -m osxcollector.output_filters.chrome.sort_history -i SirCray.json | \
    jq -c 'select(.osxcollector_browser_history=="chrome")' | \
    egrep '2015-02-02 20:3[2-6]'

Usage:

$ python -m osxcollector.output_filters.chrome.sort_history -h
usage: sort_history.py [-h] [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

FirefoxHistoryFilter

osxcollector.output_filters.firefox.sort_history.SortHistoryFilter builds a really nice Firefox browser history sorted in descending time order. It's a lot like the ChromeHistoryFilter.

To run and see Firefox browser history:

$ python -m osxcollector.output_filters.firefox.sort_history -i CousingLobe.json | \
    jq 'select(.osxcollector_browser_history=="firefox")'

Usage:

$ python -m osxcollector.output_filters.firefox.sort_history -h
usage: sort_history.py [-h] [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

ChromeExtensionsFilter

osxcollector.output_filters.chrome.find_extensions.FindExtensionsFilter looks for extensions in the Chrome JSON files.

To run and see Chrome extensions:

$ python -m osxcollector.output_filters.chrome.find_extensions -i MotherlyWolf.json | \
    jq 'select(.osxcollector_section=="chrome" and
               .osxcollector_subsection=="extensions")'

Usage:

$ python -m osxcollector.output_filters.chrome.find_extensions -h
usage: find_extensions.py [-h] [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

FirefoxExtensionsFilter

osxcollector.output_filters.firefox.find_extensions.FindExtensionsFilter looks for extensions in the Firefox JSON files.

To run and see Firefox extensions:

$ python -m osxcollector.output_filters.firefox.find_extensions -i FlawlessPelican.json | \
    jq 'select(.osxcollector_section=="firefox" and
               .osxcollector_subsection=="extensions")'

Usage:

$ python -m osxcollector.output_filters.firefox.find_extensions -h
usage: find_extensions.py [-h] [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

Threat API Filters

By taking the output of OSXCollector and looking up further info with OpenDNS and VirusTotal APIs, Yelp enhances the output with useful info. Some of these APIs aren't free but they are useful.

Using these filters as examples, it would be possible to integrate with additional free or premium threat APIs. osxcollector.output_filters.base_filters.threat_feed.ThreatFeedFilter has most of the plumbing for hooking up to arbitrary APIs.

OpenDNS RelatedDomainsFilter

osxcollector.output_filters.opendns.related_domains.RelatedDomainsFilter takes an initial set of domains and IPs and then looks up domains related to them with the OpenDNS Umbrella API.

Often an initial alert contains a domain or IP your analysts don't know anything about. However, by gathering the 2nd generation related domains, familiar friends might appear. When you're lucky, those related domains land up being the download source for some downloads you might have overlooked.

The filter will ignore domains if they are in the blacklist named domain_whitelist. This helps to reduce churn and false positives.
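
The generation-by-generation pivot can be pictured as a breadth-first expansion. The sketch below is not the filter's implementation: lookup_related is a hypothetical callable standing in for the OpenDNS Umbrella API call, and the default of two generations is an assumption for the example.

```python
def related_generations(initial, lookup_related, generations=2, whitelist=()):
    """Map each discovered domain to the initial domains/IPs that led to it.

    lookup_related is a hypothetical callable: item -> iterable of domains.
    """
    ignored = set(whitelist)
    origins = {item: {item} for item in initial}
    frontier = set(initial)
    for _ in range(generations):
        next_frontier = set()
        for item in frontier:
            for related in lookup_related(item):
                if related in ignored:
                    continue  # whitelisted domains are never followed
                if related not in origins:
                    origins[related] = set()
                    next_frontier.add(related)
                origins[related] |= origins[item]
        frontier = next_frontier
    return {domain: sorted(sources) for domain, sources in origins.items()}
```

Keeping track of which initial domain or IP led to each related domain is what lets the output explain why a line was flagged.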

Run it and see what it found:

$ python -m osxcollector.output_filters.find_domains -i NotchCherry.json | \
    python -m osxcollector.output_filters.opendns.related_domains \
           -d dismalhedgehog.com -d fantasticrabbit.org \
           -i 128.128.128.28 | \
    jq 'select(has("osxcollector_related")) |
        select(.osxcollector_related | keys[] | contains("domains"))'

The results will look something like:

{
    'osxcollector_related': {
        'domains': {
            'domain_in_line.com': ['dismalhedgehog.com'],
            'another.com': ['128.128.128.28']
        }
    }
}

Usage:

$ python -m osxcollector.output_filters.opendns.related_domains -h
usage: related_domains.py [-h] [-d INITIAL_DOMAINS] [-i INITIAL_IPS]
                          [--related-domains-generations GENERATIONS]
                          [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

opendns.RelatedDomainsFilter:
  -d INITIAL_DOMAINS, --domain INITIAL_DOMAINS
                        [OPTIONAL] Suspicious domains to use in pivoting. May
                        be specified more than once.
  -i INITIAL_IPS, --ip INITIAL_IPS
                        [OPTIONAL] Suspicious IP to use in pivoting. May be
                        specified more than once.
  --related-domains-generations GENERATIONS
                        [OPTIONAL] How many generations of related domains to
                        lookup with OpenDNS

OpenDNS LookupDomainsFilter

osxcollector.output_filters.opendns.lookup_domains.LookupDomainsFilter looks up domain reputation and threat information with the OpenDNS Umbrella API. It adds information about suspicious domains to the output lines.

The filter uses a heuristic to determine what is suspicious. It can create false positives but usually a download from a domain marked as suspicious is a good lead.

Run it and see what was found:

$ python -m osxcollector.output_filters.find_domains -i GladElegant.json | \
    python -m osxcollector.output_filters.opendns.lookup_domains | \
    jq 'select(has("osxcollector_opendns"))'

Usage:

$ python -m osxcollector.output_filters.opendns.lookup_domains -h
usage: lookup_domains.py [-h] [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

VirusTotal LookupDomainsFilter

osxcollector.output_filters.virustotal.lookup_domains.LookupDomainsFilter looks up domain reputation and threat information with the VirusTotal API. It adds information about suspicious domains to the output lines. It's a lot like the OpenDNS filter of the same name.

The filter uses a heuristic to determine what is suspicious. It can create a lot of false positives but also provides good leads.

Run it and see what was found:

$ python -m osxcollector.output_filters.find_domains -i PippinNightstar.json | \
    python -m osxcollector.output_filters.virustotal.lookup_domains | \
    jq 'select(has("osxcollector_vtdomain"))'

Usage:

$ python -m osxcollector.output_filters.virustotal.lookup_domains -h
usage: lookup_domains.py [-h] [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

VirusTotal LookupHashesFilter

osxcollector.output_filters.virustotal.lookup_hashes.LookupHashesFilter looks up hashes with the VirusTotal API. This basically finds anything VirusTotal knows about, which is a huge time saver. There are pretty much no false positives here, but there's also no chance of detecting unknown stuff.

Run it and see what was found:

$ python -m osxcollector.output_filters.virustotal.lookup_hashes -i FungalBuritto.json | \
    jq 'select(has("osxcollector_vthash"))'

Usage:

$ python -m osxcollector.output_filters.virustotal.lookup_hashes -h
usage: lookup_hashes.py [-h] [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

VirusTotal LookupURLsFilter

osxcollector.output_filters.virustotal.lookup_urls.LookupURLsFilter looks up URLs with the VirusTotal API. Because this only looks up existing reports, it may find nothing for URLs that VirusTotal has not seen.

Run it and see what was found:

$ python -m osxcollector.output_filters.virustotal.lookup_urls -i WutheringLows.json | \
    jq 'select(has("osxcollector_vturl"))'

Usage:

$ python -m osxcollector.output_filters.virustotal.lookup_urls -h
usage: lookup_urls.py [-h] [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

Maximum resources per request

Both the VirusTotal LookupHashesFilter and LookupURLsFilter can save time by requesting reports for multiple resources (hashes or URLs) in a single API request. The maximum number of resources per request depends on whether you are using a Public or Private API key, so it is configurable in the virustotal section of the osxcollector.yaml file:

resources_per_req: 4
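
The effect of this setting can be pictured as slicing the list of hashes or URLs into groups before each request. The sketch below only shows the grouping; the function name batches is hypothetical and no API call is made.

```python
def batches(resources, resources_per_req=4):
    """Yield consecutive groups of at most resources_per_req items."""
    for start in range(0, len(resources), resources_per_req):
        yield resources[start:start + resources_per_req]
```

With nine hashes and resources_per_req set to 4, this produces three groups, so three requests would be needed instead of nine.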

ShadowServer LookupHashesFilter

osxcollector.output_filters.shadowserver.lookup_hashes.LookupHashesFilter looks up hashes with the ShadowServer bin-test API. This is sort of the opposite of a VirusTotal lookup and returns results when it sees the hashes of known good files. This helps raise confidence that a file is not malicious.

Run it and see what was found:

$ python -m osxcollector.output_filters.shadowserver.lookup_hashes -i ArkashKobiashi.json | \
    jq 'select(has("osxcollector_shadowserver"))'

Usage:

$ python -m osxcollector.output_filters.shadowserver.lookup_hashes -h
usage: lookup_hashes.py [-h] [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

AnalyzeFilter - The One Filter to Rule Them All

osxcollector.output_filters.analyze.AnalyzeFilter is Yelp's one filter to rule them all. It chains all the previous filters into one monster analysis. The results, enhanced with blacklist info, threat APIs, related files and domains, and even pretty browser history, are written to a new output file.

Then Very Readable Output Bot takes over and prints out an easy-to-digest, human-readable, nearly-English summary of what it found. It's basically equivalent to running:

$ python -m osxcollector.output_filters.chrome.find_extensions -i SlickApocalypse.json | \
    python -m osxcollector.output_filters.firefox.find_extensions | \
    python -m osxcollector.output_filters.find_domains | \
    python -m osxcollector.output_filters.shadowserver.lookup_hashes | \
    python -m osxcollector.output_filters.virustotal.lookup_hashes | \
    python -m osxcollector.output_filters.find_blacklisted | \
    python -m osxcollector.output_filters.related_files | \
    python -m osxcollector.output_filters.opendns.related_domains | \
    python -m osxcollector.output_filters.opendns.lookup_domains | \
    python -m osxcollector.output_filters.virustotal.lookup_domains | \
    python -m osxcollector.output_filters.chrome.sort_history | \
    python -m osxcollector.output_filters.firefox.sort_history | \
    tee analyze_SlickApocalypse.json | \
    jq 'select(false == has("osxcollector_shadowserver")) |
        select(has("osxcollector_vthash") or
               has("osxcollector_vtdomain") or
               has("osxcollector_opendns") or
               has("osxcollector_blacklist") or
               has("osxcollector_related"))'

and then letting a wise-cracking analyst explain the results to you. The Very Readable Output Bot even suggests hashes and domains to add to blacklists.
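
The final jq selection in that pipeline translates directly into Python. This sketch mirrors only the selection rule shown in the jq expression; the function name is_suspicious is made up for illustration.

```python
INTERESTING_KEYS = (
    "osxcollector_vthash",
    "osxcollector_vtdomain",
    "osxcollector_opendns",
    "osxcollector_blacklist",
    "osxcollector_related",
)


def is_suspicious(record):
    """Mirror the jq select: skip known-good files, keep enriched lines."""
    if "osxcollector_shadowserver" in record:
        return False  # ShadowServer recognized the hash as known good
    return any(key in record for key in INTERESTING_KEYS)
```

Applied to every record in analyze_SlickApocalypse.json, it keeps the same lines the jq expression would print.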

This thing is the real deal and our analysts don't even look at OSXCollector output until after they've run the AnalyzeFilter.

Run it as:

$ python -m osxcollector.output_filters.analyze -i FullMonty.json

Usage:

$ python -m osxcollector.output_filters.analyze -h
usage: analyze.py [-f FILE_TERMS] [-d INITIAL_DOMAINS] [-i INITIAL_IPS]
                  [--related-domains-generations GENERATIONS] [-h] [--readout]
                  [--no-opendns] [--no-virustotal] [--no-shadowserver] [-M]
                  [--show-signature-chain] [--show-browser-ext]
                  [--input-file INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        [OPTIONAL] Path to OSXCollector output to read.
                        Defaults to stdin otherwise.

RelatedFilesFilter:
  -f FILE_TERMS, --file-term FILE_TERMS
                        [OPTIONAL] Suspicious terms to use in pivoting through
                        file names. May be specified more than once.

opendns.RelatedDomainsFilter:
  -d INITIAL_DOMAINS, --domain INITIAL_DOMAINS
                        [OPTIONAL] Suspicious domains to use in pivoting. May
                        be specified more than once.
  -i INITIAL_IPS, --ip INITIAL_IPS
                        [OPTIONAL] Suspicious IP to use in pivoting. May be
                        specified more than once.
  --related-domains-generations GENERATIONS
                        [OPTIONAL] How many generations of related domains to
                        lookup with OpenDNS

AnalyzeFilter:
  --readout             [OPTIONAL] Skip the analysis and just output really
                        readable analysis
  --no-opendns          [OPTIONAL] Don't run OpenDNS filters
  --no-virustotal       [OPTIONAL] Don't run VirusTotal filters
  --no-shadowserver     [OPTIONAL] Don't run ShadowServer filters
  -M, --monochrome      [OPTIONAL] Output monochrome analysis
  --show-signature-chain
                        [OPTIONAL] Output unsigned startup items and kexts.
  --show-browser-ext    [OPTIONAL] Output the list of installed browser
                        extensions.

Contributing to OSXCollector

We encourage you to extend the functionality of OSXCollector to suit your needs.

Testing OSXCollector

A collection of tests for osxcollector is provided under the tests directory. In order to run these tests you must install tox:

$ sudo pip install tox

To run this suite of tests, cd into osxcollector and enter:

$ make test

Development Tips

The functionality of OSXCollector is stored in a single file: osxcollector.py. The collector should run on a naked install of OS X without any additional packages or dependencies.

Ensure that all of the OSXCollector tests pass before editing the source code. You can run the tests using: make test

After making changes to the source code, run make test again to verify that your changes did not break any of the tests.

License

This work is licensed under the GNU General Public License and is a derivation of https://github.com/jipegit/OSXAuditor

Resources

Want to learn more about OS X forensics?

A couple of other interesting tools: