Troubleshooting and Debugging

Troubleshooting deployments

Sometimes things don’t work when you first try them. This page will go through the common issues people find when deploying Velociraptor clients and the steps needed to debug them.

Server fails to start

If the server fails to start, you can try to start it by hand to see any logs or issues. Typically the Linux service will report something unhelpful such as:

# service velociraptor_server status
● velociraptor_server.service - Velociraptor linux amd64
    Loaded: loaded (/etc/systemd/system/velociraptor_server.service; enabled; vendor preset: enabled)
    Active: activating (auto-restart) (Result: exit-code) since Fri 2021-12-31 15:32:58 AEST; 1min 1s ago
   Process: 3561364 ExecStart=/usr/local/bin/velociraptor --config /etc/velociraptor/server.config.yaml frontend (code=exited, status=1/FAILURE)
  Main PID: 3561364 (code=exited, status=1/FAILURE)

You can usually get more information from the system log files, typically /var/log/syslog, as shown in the example below. Alternatively, you can start the service by hand and see any errors on the console, as described next.
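For a systemd install like the one shown above, checking the journal or syslog might look something like this (the service name and log path are the ones from the status output; adjust them for your deployment):

journalctl -u velociraptor_server --no-pager | tail -n 50
grep velociraptor /var/log/syslog | tail -n 50

The first command shows the most recent journal entries for the service, while the second searches the traditional syslog file.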

First change to the Velociraptor user and then start the service as that user.

# sudo -u velociraptor bash
$ velociraptor frontend -v
Dec 31 15:47:18 devbox velociraptor[3572509]: velociraptor.bin: error: frontend: loading config file: failed to acquire target io.Writer: failed to create a new file /mnt/data/logs/Velociraptor_debug.log.202112270000: failed to open file /mnt/data/logs/Velociraptor_debug.log.202112270000: open /mnt/data/logs/Velociraptor_debug.log.202112270000: permission denied

In this case, Velociraptor cannot start because it cannot write to its logs directory. Other common errors include a full disk or various permission denied problems.

Because Velociraptor normally runs as a low privileged user, files in the file store must remain owned by the velociraptor user. Sometimes permissions change by accident (usually by running velociraptor as root while interacting with the file store - you should always change to the velociraptor user before interacting with the server).

It is worth checking file permissions (using ls -l) and, if necessary, recursively returning ownership to the velociraptor user (using chown -R velociraptor:velociraptor /path/to/filestore/).
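For example, with the file store under /mnt/data as in the error message above, the check and fix might look like this (adjust the path to your own file store location):

ls -l /mnt/data/logs
sudo chown -R velociraptor:velociraptor /mnt/data/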

Debugging client communications

If the client does not appear to connect to the server properly, the first step is to run it manually (using the velociraptor --config client.config.yaml client -v command):

Running the client manually

In the above example, I ran the client manually with the -v switch. The client starts up and immediately tries to connect to its URL (in this case https://test.velocidex-training.com/). However, this fails and the client waits for a short time before retrying the connection.

Testing connectivity with curl

A common problem here is network filtering making it impossible to reach the server. You can test this by simply running curl with the server’s URL.
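For example, using the URL from the client configuration above (substitute your own frontend URL); the -v flag shows the connection attempt and TLS handshake details:

curl -v https://test.velocidex-training.com/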

Once connectivity is enabled, you might encounter another problem:

Captive portal interception

The Unable to parse PEM message indicates that the client is trying to fetch the server.pem file but cannot validate it. This often happens with captive portal style proxies which interfere with the data transferred. It can also happen if your DNS settings point to a completely different server.

We can verify the server.pem manually using curl (note that in self-signed mode you may need to provide curl with the -k flag to ignore certificate errors):

Fetching the server certificate
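A typical invocation for a self-signed deployment might be (replace the URL with your own frontend):

curl -k https://test.velocidex-training.com/server.pem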

Note that the server.pem is always signed by the Velociraptor internal CA in all deployment modes (even with Let's Encrypt). You can view the certificate details using openssl:

curl https://test.velocidex-training.com/server.pem | openssl x509 -text

If your server certificate has expired, the client will refuse to connect. To reissue the server certificate, simply regenerate the server configuration file (after suitably backing up the previous config file):

velociraptor --config server.config.yaml config rotate_key > new_server.config.yaml

Depending on which user invoked the Velociraptor binary, you may need to alter the permissions of the new server configuration file.

For example:

chmod 600 new_server.config.yaml
chown velociraptor:velociraptor new_server.config.yaml

From here, you will need to move the updated server configuration into the appropriate location.
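For example, on a Linux install laid out like the systemd unit shown earlier (the path and service name are taken from that example; adjust them for your deployment):

mv new_server.config.yaml /etc/velociraptor/server.config.yaml
systemctl restart velociraptor_server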

The rotate_key step above uses the internal Velociraptor CA to reissue the server certificate (which is normally valid for 1 year), allowing us to rotate it.

Currently there is no way to update the CA certificate without redeploying new clients (the CA certificate is embedded in the client config file). When generating the config file initially, the CA certificate is created with a 10 year validity.

Debugging Velociraptor

Velociraptor is a powerful program with a lot of functionality. Sometimes it is important to find out what is happening inside Velociraptor and whether it is doing what is expected, either for debugging or simply for understanding its behavior.

To see the inner workings of Velociraptor we can collect profiles of various aspects of the program. These profiles exist regardless of whether Velociraptor is running as a client, a server, or an offline collector.

You can read more about profiling in Profiling the Beast.

Starting the debug server

When provided with the --debug flag, Velociraptor will start the debug server on port 6060 (use --debug_port to change it). By default the debug server only binds to localhost, so you will need to either tunnel the port or use a browser on the local machine to connect to it.
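For example, the following starts a client with the debug server enabled and then forwards the debug port to your workstation over an SSH tunnel (the hostname and tunnel are illustrative; any way of reaching localhost:6060 on that machine works):

velociraptor --config client.config.yaml client -v --debug --debug_port 6060
ssh -L 6060:127.0.0.1:6060 user@server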

Viewing the debug server

The debug server exposes a number of different profiles, and new ones are added over time, so below we only cover some of the most useful profiles you can view.

Notebook workers

Notebooks are a very useful feature of the server, allowing complex post-processing of collected data. Sometimes these queries are very large and take a long time to run. To limit the resources such queries can consume on the server, Velociraptor only creates a limited number of notebook workers (by default 5).

Inspecting the notebook workers

Currently running queries

This view shows the queries currently running in this process. For example, queries run as part of notebook evaluation, as currently installed event queries, or, in the case of the offline collector, as part of the artifacts currently being collected.

Inspecting the currently running queries

You can also see all recent queries (even ones that have already completed). This helps you understand exactly what the client is doing.

ETW Subsystem

This profile shows the current state of the ETW subsystem on Windows. We can see which providers Velociraptor is subscribed to, how many queries are currently watching each provider, and how many events have been received from it.

Inspecting the ETW subsystem

Go profiling information

For an even lower level view of the program's execution, we can view the Built in Go Profiles, which include detailed heap allocation and goroutine information, and can also capture a CPU profile for closer inspection.
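If the Go toolchain is available on the machine you are connecting from, the standard pprof tooling can read these profiles directly from the debug port. This assumes the debug server exposes the usual net/http/pprof endpoints on port 6060, as described above:

go tool pprof http://127.0.0.1:6060/debug/pprof/heap
go tool pprof -seconds 30 http://127.0.0.1:6060/debug/pprof/profile

The first command inspects the current heap allocations, while the second captures a 30 second CPU profile.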

This type of information is critical for developers to understand what the code is doing, and you should forward it in any bug reports or discussions to help the Velociraptor developer team.

Debugging the offline collector

The offline collector is a one-shot collector that runs, collects a number of preconfigured artifacts into a zip file, and terminates.

Sometimes the collector may take a long time or use too much memory. In this case you might want to gain visibility into what it is doing.

You can start the offline collector with the --debug flag added to its command line, in a similar way to the above.

Collector_velociraptor-v0.7.1-windows-amd64.exe -- --debug --debug_port 6061

Inspecting the ETW subsystem

Note that the extra -- is required to indicate that the following parameters are not part of the collector's normal command line (the offline collector expects to run with no parameters).

The above will start the debug server on port 6061. You can then download goroutine, heap allocation and other profiles from the debug server and forward these to the Velociraptor team to resolve any issues.
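For example, the goroutine and heap snapshots could be fetched with curl and attached to a bug report (again assuming the standard Go pprof endpoints are exposed on the debug port):

curl -o goroutines.txt "http://127.0.0.1:6061/debug/pprof/goroutine?debug=1"
curl -o heap.pprof http://127.0.0.1:6061/debug/pprof/heap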

Debugging a remote client

The previous section described how to bring up the debug server by providing the --debug command line flag, but existing clients are not normally running with this flag. Often we are trying to collect an artifact from a remote client and want to see what is actually happening in the client process itself.

We can do this by collecting the Generic.Client.Profile artifact from the client directly. This artifact has access to the same data exposed through the debug server, but does not require the --debug flag to be enabled in advance.
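Besides scheduling it through the GUI as shown below, the same collection could also be launched from a shell on the server using the query subcommand and the collect_client() VQL function. This is only a sketch; the client id here is illustrative, and it assumes shell access to the server and that running queries with the server config is acceptable in your deployment:

velociraptor --config server.config.yaml query 'SELECT collect_client(client_id="C.1234567890", artifacts="Generic.Client.Profile") FROM scope()'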

Collecting the client profile

By default the artifact collects the information developers most commonly need, but you can customize the artifact parameters to collect more detailed information if required.

You can share the result of the collection by exporting it to a zip file and sharing it with the development team on Discord or in GitHub issues.

Exporting the client profile

Enabling a trace

While collecting the profile at any time is useful, it is sometimes hard to catch the problem on the client at just the right moment. For example, if a particular query causes a memory leak or performance issues, by the time you can schedule the Generic.Client.Profile artifact, the client may have already restarted or be too busy to actually collect the artifact.

In this case it is useful to enable a trace during the collection of another artifact. This setting will cause the client to take profile snapshots at specified intervals during query execution and automatically upload them to the server.

Enabling periodic trace during artifact collection

This setting uploads a zip file containing critical profile information every 10 seconds during query execution. This is useful for seeing the memory and resource footprint as the query progresses, as well as the client's logs.

Viewing the trace log

Inspecting the client's logs

One of the first troubleshooting steps described above is to run the client with the -v flag to print client logs to the screen. This helps to identify transient network issues or identify when a client restarted.

Normally, however, the client does not write its logs to disk. This is done to prevent information leakage risks - the client's logs may contain sensitive information about collected artifacts.

It is possible to tell the client to log to an encrypted local storage file. This allows us to collect the file from any client later and decrypt it on the server while not creating an information leak risk.

To enable local client logging, you can create a new label group (e.g. logged) and then assign the Generic.Client.LocalLogs client monitoring artifact to this group. This allows you to begin logging on any client by simply labeling it into the group.

Configuring local client logs

Logs will be written continuously into the specified file on the endpoint. The file is encrypted and can only be decrypted on the server, but the client can continue appending log information to it, even after a reboot.

When we want to inspect the log file, we simply collect it from the endpoint using the Generic.Client.LocalLogsRetrieve artifact.

Retrieving the encrypted log file

The notebook tab will automatically decrypt the logs and display them in a table.

Decrypting the local log file

Debugging the server

It is also possible to collect a profile from the server without using the --debug flag, by collecting the Server.Monitor.Profile artifact. This is the server equivalent of the Generic.Client.Profile artifact.

Collecting metrics

When Velociraptor runs in production it is often necessary to build dashboards to monitor the server's characteristics, such as memory usage, requests per second, etc.

Velociraptor exports a lot of important metrics using the standard Prometheus library. This information may be scraped from the server's monitoring port (by default http://127.0.0.1:8003/metrics). You can change the port and bind address for the metrics server using the Monitoring.bind_port and Monitoring.bind_address settings.

You can either view the metrics manually using curl, or configure an external system such as Grafana or DataDog to scrape them.

curl http://127.0.0.1:8003/metrics | less
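For example, to spot-check a couple of the standard process metrics registered by the Prometheus Go client (the exact metric names available may vary by Velociraptor version):

curl -s http://127.0.0.1:8003/metrics | grep -E 'process_resident_memory_bytes|go_goroutines'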

We recommend that proper monitoring be implemented in production systems.