Searching Filenames

One of the most common operations in DFIR is searching for files efficiently. When searching for a file, we may search by filename, file content, size or other properties.

Velociraptor has the glob() plugin to search for files using a glob expression. Glob expressions use wildcards to search the filesystem for matches, and these are the most common tool for searching by filename. As we will see below, the glob() plugin is the foundation for many other artifacts.

The glob() plugin searches the filesystem by glob expression. The following represent the syntax of the glob expressions:

  • A * is a wildcard match (e.g. *.exe matches all files ending with “.exe”)
  • Alternatives are expressed as comma separated strings in {}. For example, *.{exe,dll,sys}
  • Velociraptor supports recursive wildcards: A ** denotes recursive search, e.g. C:\Users\**\*.exe. NOTE: A ** must appear in its own path component to be considered a recursive search: C:\Users\mike** will be interpreted the same as C:\Users\mike*
  • A Recursive search ** can be followed by a number representing the depth of recursion search (default 30).

For example, the following quickly searches all users’ home directories for files with “.exe” extension.

SELECT * FROM glob(globs='C:\\Users\\**\\*.exe')

VQL strings can include a backslash escape sequence. Since Windows paths often use backslashes for path separator you will need to escape the backslashes. Alternatively paths can be written with a forward slash / or a raw VQL string can be used - for example this is a bit easier to write:

SELECT * FROM glob(globs='''C:\Users\**\*.exe''')

The glob() plugin is optimized to visit files on the filesystem as quickly as possible. Therefore if multiple glob expressions are provided, the glob() plugin will combine them into a single expression automatically to reduce filesystem access. It is always better to provide multiple glob expressions than to run the glob() plugin multiple times. For example the following will only make a single pass over the filesystem while searching for both exe and dll files.

SELECT * FROM glob(globs=['C:/Users/**/*.exe',
                          'C:/Users/**/*.dll'])

Velociraptor paths are separated by / or \ into path components. Internally, paths are considered as made up of a list of components. Sometimes path component (e.g. a file or directory) can also contain path separator characters in which case the component is quoted in the path.

To learn more about how paths are used in Velociraptor see Velociraptor Paths

The Glob Root

Glob expressions are meant to be simple to write and to understand. They are not as powerful as a regular expression, with only a few types of wildcard characters allowed (e.g. * or ?). However, what if we wanted to literally match a directory which also contained a wildcard character?

This problem is encountered quite often: Normally we know an exact directory path and simply want to search beneath this directory using glob. Consider a directory like C:\Users\Administrator\{123-45-65} - this is common as directories are often named as GUID - especially in the registry.

If we used the above in a glob expression, the glob() plugin will assume {123-45-65} is an alternative wild card. It will therefore only match a directory exactly named 123-45-65. We can therefore use the root parameter to tell glob() to only start searching from this exact directory name:

SELECT *
FROM glob(globs='**', root='''C:\Users\Administrator\{123-45-65}''')

Note that the root path is not a glob expression but represents exactly a single path forming the directory under which we start searching. Similarly the glob parameters now refer to wildcard matches under that root directory.

Glob results

The glob() plugin returns rows with several columns. As usual, the best way to see what a plugin returns is to click the Raw Response JSON button on the results table.

Glob output
Glob output

Some of the more important columns available are

  1. The OSPath is the complete path to the matching file, whereas the Name is just the filename.
  2. The Mtime, Atime, Ctime and Btime are timestamps of the file.
  3. The Data column is a free form dictionary containing key/value data about the file. This data depends on the accessor used.
  4. IsDir, IsLink and Mode indicate what kind of file matched. (Mode.String can present the mode in a more human readable way).
  5. Finally the glob() plugin reports which glob expression matched this particular file. This is handy when you provided a list of glob expressions to the plugin.

Filesystem accessors

Glob is a very useful concept to search hierarchical trees because wild cards are easy to use and powerful. Sometimes we might want to use a glob expression to look for other things that are not files, but also have a hierarchical structure. For example, the registry is organized in a similar way to a filesystem, so maybe we can use a glob expression to search the registry?

Velociraptor supports direct access to many different data sources with such hierarchical trees via accessors (Accessors are essentially filesystem access drivers). Some common accessors are

  • file - uses OS APIs to access files.
  • ntfs - uses raw NTFS parsing to access low level files
  • registry - uses OS APIs to access the windows registry

When no accessor is specified, Velociraptor uses an automatic accessor (called auto): On Windows, the auto accessor uses the file accessor to attempt to read the file using the OS APIs, but if the file is locked (or it received permission denied errors), Velociraptor automatically falls back to the ntfs accessor in order to read the file from raw disk clusters. This allows Velociraptor to transparently bypass any OS level restrictions on reading files (such as filesystem permissions or some filter drivers that block access to files based on other rules - sometimes found in local security software).

The registry accessor

This accessor uses the OS API to access the registry hives. The top level directory is a list of the common hives (e.g. HKEY_USERS). The accessor creates a registry abstraction to make it appear as a filesystem:

  • Top level consists of the major hives
  • Values appear as files, Keys appear as directories
  • The Default value in a key is named “@”
  • Since reading the registry value is very quick anyway, the registry accessor makes the Value’s content available inside the Data attribute.
  • Can escape components with / using quotes HKEY_LOCAL_MACHINE\Microsoft\Windows\"http://www.microsoft.com/"

Raw registry parsing

In the previous section we looked for a key in the HKEY_CURRENT_USER hive. Any artifacts looking in HKEY_USERS using the Windows API are limited to the set of users currently logged in! We need to parse the raw hive to reliably recover all users.

Each user’s setting is stored in C:\\Users\\<name>\\ntuser.dat which is a raw registry hive file format. We can parse this file using the raw_reg accessor.

When we need to parse a key or value using the raw registry we need to provide it with 3 pieces of information:

  1. The Registry hive file to parse (path)
  2. The Accessor to open that file (scheme)
  3. The Key or Value path within the registry file to open (fragment)

Since the accessor can only receive a single string (file path), we pass these three pieces of information using a URL notation.

Do not attempt to build the URL using string concatenation because this will fail to escape properly. Always use the url() VQL function to build the URL for use by the raw_reg accessor.

Raw Registry
Raw Registry

In the above example, we specify to the glob() plugin that we want to open the raw registry file at C:\\Users\\Mike\\ntuser.dat and glob for the pattern /* within it.

Note that the FullPath returned by the accessor is also in URL notation. This is done so that you can feed the FullPath directly to any plugin that uses filenames without conversion - since the raw registry accessor can read the urls it is producing.

If you need to extract the key path within the registry hive, you can use the url() function with the parse argument to parse the url again. The Fragment field represents the key path.

url(scheme='file', path='C:/Users/test/ntuser.dat', fragment='/**/Run/*')

file:///C:/Users/test/ntuser.dat#/**/Run/*

Example: Find autorun files from ntuser.dat

Let’s combine the above query to search all Run keys in all user’s ntuser.dat files.

SELECT * FROM foreach(
row={
   SELECT FullPath AS NTUserPath FROM glob(globs="C:/Users/*/ntuser.dat")
}, query={
   SELECT NTUserPath, url(parse=FullPath).Fragment AS Value, Mtime, Data.value
   FROM glob(
       globs=url(scheme="file", path=NTUserPath,
                 fragment="SOFTWARE/Microsoft/Windows/**/Run/*"),
       accessor="raw_reg")
})

We glob for ntuser.dat files in all user’s home directory, then foreach one of those, we search the raw registry hive for values under the Run or RunOnce key.

Raw Registry Run keys
Raw Registry Run keys

The “data” accessor

VQL contains many plugins that work on files. Sometimes we load data into memory as a string. It is handy to be able to use all the normal file plugins with literal string data - this is what the data accessor is for - when the data accessor is used, it creates an in-memory file with the content of the file being the string that is passed as the filename.

This allows us to use strings in plugins like parse_csv()