Parsers

Many Velociraptor artifacts rely on specialized parsing of file formats. This page lists the plugins and functions that let the client parse various file formats.

Simple file formats may be parsed using regular expressions and other generic rules. However, some specialized file formats have dedicated parsers. These dedicated parsers are exposed as VQL plugins and functions so their results may be used in further queries.

grok

Function

Parse a string using a Grok expression.

This is most useful for parsing syslog style logs (e.g. IIS, Apache logs).

You can read more about Grok expressions at https://www.elastic.co/blog/do-you-grok-grok
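
Example

A minimal sketch, assuming the standard Grok base patterns IP, WORD and URIPATHPARAM are available in the built-in pattern set (the sample log line is made up). The named captures client, method and request should come back as fields of the result:

SELECT grok(grok="%{IP:client} %{WORD:method} %{URIPATHPARAM:request}",
            data="55.3.244.1 GET /index.html") AS Event
FROM scope()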

Arg Description Type
grok Grok pattern. string (required)
data String to parse. string (required)
patterns Additional patterns. Any
all_captures Extract all captures. bool

olevba

Plugin

Extracts VBA Macros from Office documents.

This plugin parses the provided files as OLE documents in order to recover VB macro code. A single document can have multiple code objects, and each such code object is emitted as a row.
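
Example

A minimal sketch, assuming a hypothetical document at /tmp/invoice.doc. Because each recovered code object is emitted as its own row, a document with several macros yields several rows:

SELECT *
FROM olevba(file="/tmp/invoice.doc")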

Arg Description Type
file A list of filenames to open as OLE files. list of string (required)
accessor The accessor to use. string
max_size Maximum size of file we load into memory. int64

parse_auditd

Plugin

Parse log files generated by auditd.

Arg Description Type
filename A list of log files to parse. list of string (required)
accessor The accessor to use. string
buffer_size Maximum size of line buffer. int

parse_binary

Function

Parse a binary file into a data structure using a profile.

This function extracts binary data from strings. It works by applying a profile to the binary string and generating an object from that. A profile is a JSON structure describing the binary format. For example, a profile might be:

[
  ["StructName", 10, [
     ["field1", 2, "unsigned int"],
     ["field2", 6, "unsigned long long"]
  ]]
]

The profile is compiled and overlaid on the data at the specified offset, then the object is emitted with its required fields.

You can read more about profiles here https://github.com/Velocidex/vtypes
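
Example

A minimal sketch that applies the profile shown above to a file. The path /tmp/sample.bin and the variable name Profile are hypothetical; the argument names are taken from the table below.

LET Profile = '''[
  ["StructName", 10, [
     ["field1", 2, "unsigned int"],
     ["field2", 6, "unsigned long long"]
  ]]
]'''

SELECT parse_binary(filename="/tmp/sample.bin",
                    profile=Profile,
                    struct="StructName",
                    offset=0) AS Parsed
FROM scope()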

Arg Description Type
filename Binary file to open. string (required)
accessor The accessor to use string
profile Profile to use (see https://github.com/Velocidex/vtypes). string
struct Name of the struct in the profile to instantiate. string (required)
offset Start parsing from this offset int64

parse_csv

Plugin

Parses events from a CSV file.

The first row of the CSV file is expected to contain column names. This parser specifically supports Velociraptor’s own CSV dialect, so it is well suited to post-processing existing CSV files.

The type of each value in each column is deduced from Velociraptor’s standard encoding scheme, so types are preserved when read back from the CSV file.

For example, downloading the results of a hunt in the GUI will produce a CSV file containing artifact rows collected from all clients. We can then use the parse_csv() plugin to further filter the CSV file, or to stack using group by.

Example

The following stacks the result from a Windows.Applications.Chrome.Extensions artifact:

SELECT count(items=User) AS TotalUsers, Name
FROM parse_csv(filename="All Windows.Applications.Chrome.Extensions.csv")
GROUP BY Name
ORDER BY TotalUsers

Arg Description Type
filename CSV files to open list of string (required)
accessor The accessor to use string
auto_headers If unset the first row is headers bool
separator Comma separator string

parse_ese

Plugin

Opens an ESE file and dumps a table.

Arg Description Type
file string (required)
accessor The accessor to use. string
table A table name to dump string (required)

parse_ese_catalog

Plugin

Opens an ESE file and dumps the schema.

Arg Description Type
file string (required)
accessor The accessor to use. string

parse_evtx

Plugin

Parses events from an EVTX file.

This plugin parses Windows events from Windows Event Log files (EVTX).

A Windows event typically contains two columns: EventData holds event-specific structured data, while System holds data common to all events, including the Event ID.

In most cases you should filter by one or more event IDs (using the System.EventID.Value field).

Example

SELECT System.TimeCreated.SystemTime as Timestamp,
       System.EventID.Value as EventID,
       EventData.ImagePath as ImagePath,
       EventData.ServiceName as ServiceName,
       EventData.ServiceType as Type,
       System.Security.UserID as UserSID,
       EventData as _EventData,
       System as _System
FROM parse_evtx(filename=systemLogFile) WHERE EventID = 7045

Arg Description Type
filename A list of event log files to parse. list of string (required)
accessor The accessor to use. string
messagedb A Message database from https://github.com/Velocidex/evtx-data. string

parse_float

Function

Convert a string to a float.

Arg Description Type
string A string to convert to float Any (required)

parse_json

Function

Parse a JSON string into an object.

Note that when VQL dereferences a field in a dict, it returns NULL if the field does not exist. Accessing a missing field is therefore not an error; the column is simply NULL.
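
Example

A short sketch of this behaviour, using a made-up JSON string. The name field exists, while shell does not and is expected to come back as NULL rather than raising an error:

LET Obj <= parse_json(data='{"name": "alice", "uid": 1000}')

SELECT Obj.name AS Name, Obj.shell AS Shell
FROM scope()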

Arg Description Type
data Json encoded string. string (required)

parse_json_array

Function

Parse a JSON string into an array.

This function is similar to parse_json() but works for a JSON list instead of an object.
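
Example

A minimal sketch with a made-up JSON list. The top-level value is a list rather than an object, so parse_json_array() is used instead of parse_json():

SELECT parse_json_array(data='[{"name": "alice"}, {"name": "bob"}]') AS Users
FROM scope()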

Arg Description Type
data Json encoded string. string (required)

parse_json_array

Plugin

Parses a JSON encoded array and emits each element as a row.

Arg Description Type
data Json encoded string. string (required)

parse_jsonl

Plugin

Parses a line oriented json file.

Arg Description Type
filename JSON file to open string (required)
accessor The accessor to use string

parse_lines

Plugin

Parse a file separated into lines.

Arg Description Type
filename A list of log files to parse. list of string (required)
accessor The accessor to use. string
buffer_size Maximum size of line buffer. int

parse_mft

Plugin

Scan the $MFT from an NTFS volume.

Arg Description Type
filename The $MFT file to parse. string (required)
accessor The accessor to use. string

parse_ntfs

Function

Parse an NTFS image file.

Arg Description Type
device The device file to open. This may be a full path - we will figure out the device automatically. string (required)
inode The MFT entry to parse in inode notation (5-144-1). string
mft The MFT entry to parse. int64
mft_offset The offset to the MFT entry to parse. int64

parse_ntfs_i30

Plugin

Scan the $I30 stream from an NTFS MFT entry.

Arg Description Type
device The device file to open. This may be a full path - we will figure out the device automatically. string (required)
inode The MFT entry to parse in inode notation (5-144-1). string
mft The MFT entry to parse. int64
mft_offset The offset to the MFT entry to parse. int64

parse_ntfs_ranges

Plugin

Show the run ranges for an NTFS stream.

Arg Description Type
device The device file to open. This may be a full path - we will figure out the device automatically. string (required)
inode The MFT entry to parse in inode notation (5-144-1). string
mft The MFT entry to parse. int64
mft_offset The offset to the MFT entry to parse. int64

parse_pe

Function

Parse a PE file.

Arg Description Type
file The PE file to open. string (required)
accessor The accessor to use. string

parse_records_with_regex

Plugin

Parses a file with a set of regular expressions and yields matches as records. The file is read into a large buffer, then each regular expression is applied to the buffer and all matches are emitted as rows.

The regular expressions are specified in the Go syntax. They are expected to contain named capture groups, which name the columns of the extracted matches.

For example, consider an HTML file with simple links. The regular expression might be:

regex='<a.+?href="(?P<Link>[^"]+?)"'

This produces rows with a single column, Link.
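
A minimal sketch of a query using that expression, assuming a hypothetical file /tmp/index.html (the argument names are those listed in the table for this plugin):

SELECT Link
FROM parse_records_with_regex(
   file="/tmp/index.html",
   regex='<a.+?href="(?P<Link>[^"]+?)"')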

The aim of this plugin is to split the file into records which can be further parsed. For example, if the file consists of multiple records, this plugin can be used to extract each record, while parse_string_with_regex() can be used to further split each record into elements. This works better than trying to write a more complex regex which tries to capture a lot of details in one pass.

Example

Here is an example of parsing the /var/lib/dpkg/status files. These files consist of records separated by empty lines:

Package: ubuntu-advantage-tools
Status: install ok installed
Priority: important
Section: misc
Installed-Size: 74
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: all
Version: 17
Conffiles:
 /etc/cron.daily/ubuntu-advantage-tools 36de53e7c2d968f951b11c64be101b91
 /etc/update-motd.d/80-esm 6ffbbf00021b4ea4255cff378c99c898
 /etc/update-motd.d/80-livepatch 1a3172ffaa815d12b58648f117ffb67e
Description: management tools for Ubuntu Advantage
 Ubuntu Advantage is the professional package of tooling, technology
 and expertise from Canonical, helping organizations around the world
 manage their Ubuntu deployments.
 .
 Subscribers to Ubuntu Advantage will find helpful tools for accessing
 services in this package.
Homepage: https://buy.ubuntu.com

The following query extracts the fields in two passes. The first pass uses parse_records_with_regex() to extract records in blocks, and the second uses parse_string_with_regex() to break each block into fields.

SELECT parse_string_with_regex(
   string=Record,
   regex=['Package:\\s(?P<Package>.+)',
     'Installed-Size:\\s(?P<InstalledSize>.+)',
     'Version:\\s(?P<Version>.+)',
     'Source:\\s(?P<Source>.+)',
     'Architecture:\\s(?P<Architecture>.+)']) as Record
   FROM parse_records_with_regex(
     file=linuxDpkgStatus,
     regex='(?sm)^(?P<Record>Package:.+?)\\n\\n')

Arg Description Type
file A list of files to parse. list of string (required)
regex A list of regex to apply to the file data. list of string (required)
accessor The accessor to use. string
buffer_size Maximum size of line buffer (default 64kb). int

parse_recyclebin

Plugin

Parses a $I file found in the $Recycle.Bin

Arg Description Type
filename Files to be parsed. list of string (required)
accessor The accessor to use. string

parse_string_with_regex

Function

Parse a string with a set of regex and extract fields. Returns a dict with fields populated from all regex capture variables.
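
Example

A minimal sketch with a made-up input string. Each regex contributes its named capture to the returned dict, so Fields is expected to carry both a User and a Uid field:

SELECT parse_string_with_regex(
   string="user=alice uid=1000",
   regex=['user=(?P<User>[a-z]+)',
     'uid=(?P<Uid>[0-9]+)']) AS Fields
FROM scope()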

Arg Description Type
string A string to parse. string (required)
regex The regex to apply. list of string (required)

parse_usn

Plugin

Parse the USN journal from a device.

Arg Description Type
device The device file to open. string (required)
start_offset The starting offset of the first USN record to parse. int64

parse_x509

Function

Parse a DER encoded x509 string into an object.

Arg Description Type
data X509 DER encoded string. string (required)

parse_xml

Function

Parse an XML document into a dict like object.

Arg Description Type
file XML file to open. string (required)
accessor The accessor to use string

parse_yaml

Function

Parse yaml into an object.

Arg Description Type
filename Yaml Filename string (required)
accessor File accessor string

plist

Function

Parses a plist file.

Arg Description Type
file The plist file to parse. string (required)
accessor The accessor to use. string

plist

Plugin

Parses a plist file.

Arg Description Type
file A list of files to parse. list of string (required)
accessor The accessor to use. string

prefetch

Plugin

Parses a prefetch file.

Arg Description Type
filename A list of prefetch files to parse. list of string (required)
accessor The accessor to use. string

regex_replace

Function

Search and replace within a string using a regular expression. Note that you can use $1 in the replacement string to refer to the first capture group.
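
Example

A minimal sketch with a made-up input string. The capture group matches the user name, and $1 carries it into the replacement:

SELECT regex_replace(source="user=alice",
                     re='user=([a-z]+)',
                     replace="account:$1") AS Rewritten
FROM scope()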

Arg Description Type
source The source string to replace. string (required)
replace The substitute string. string (required)
re A regex to apply string (required)

rot13

Function

Apply rot13 deobfuscation to the string.
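
Example

A minimal sketch with a made-up input. The string "Uryyb, jbeyq" is the ROT13 encoding of "Hello, world":

SELECT rot13(string="Uryyb, jbeyq") AS Decoded
FROM scope()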

Arg Description Type
string string

split_records

Plugin

Parses files by splitting lines into records.
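
Example

A minimal sketch that splits /etc/passwd on colons. That file has no header row, so column names are supplied explicitly; the names chosen here are illustrative only:

SELECT *
FROM split_records(
   filenames="/etc/passwd",
   regex=":",
   columns=["User", "Password", "Uid", "Gid", "Description", "Home", "Shell"])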

Arg Description Type
filenames Files to parse. list of string (required)
accessor The accessor to use string
regex The split regular expression (e.g. a comma) string (required)
columns If the first row is not the headers, this arg must provide a list of column names for each value. list of string
first_row_is_headers A bool indicating if we should get column names from the first row. bool
count Only split into this many columns if possible. int

sqlite

Plugin

Opens an SQLite file and runs a query against it.
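
Example

A minimal sketch, assuming a hypothetical database at /tmp/History that contains a table named urls with url and title columns:

SELECT *
FROM sqlite(file="/tmp/History",
            query="SELECT url, title FROM urls LIMIT 10")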

Arg Description Type
file string (required)
accessor The accessor to use. string
query string (required)
args Any

starl

Function

Compiles a Starlark code block and returns a module usable in VQL.

Starl allows Python-like code to be used with VQL. This helps when small functions with more complex logic are needed: a more powerful language can be used to write small functions that transform certain fields, and so on.

Example

In the following example we define a Starl code block and compile it into a module. VQL code can then reference any functions defined within it directly.

LET MyCode <= starl(code='''
load("math.star", "math")

def Foo(X):
  return math.sin(X)

''')

SELECT MyCode.Foo(X=32)
FROM scope()

Arg Description Type
code The body of the starlark code. string (required)
key If set use this key to cache the Starlark code block. string
globals Dictionary of values to feed into Starlark environment Any

xor

Function

Apply xor to the string and key.
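
Example

A minimal sketch with a made-up string and key. Since XOR is its own inverse, applying the same key twice is expected to return the original string:

LET Key <= "k3y"
LET Encoded <= xor(string="secret", key=Key)

SELECT xor(string=Encoded, key=Key) AS RoundTrip
FROM scope()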

Arg Description Type
string String to apply Xor string (required)
key Xor key. string (required)