Monitor telemetry with Prometheus & Grafana
Challenge
Gaining operational and usage insight into a running Vault cluster helps you understand performance, respond proactively to incidents, and understand business workloads and use cases.
Operators and security practitioners need to be aware of conditions that can indicate potential performance implications to production users or security issues which require immediate attention.
Solution
Vault provides rich operational telemetry metrics that can be consumed by popular solutions for monitoring and alerting on key operational conditions.
One way to monitor Vault telemetry is to collect it with the Prometheus monitoring and alerting toolkit and visualize the metric data with the Grafana observability platform.
Vault returns telemetry metrics from the /sys/metrics endpoint; adding the format=prometheus parameter returns them in Prometheus format.
Scenario introduction
In this scenario, you will use Docker containers to deploy a Vault server, Prometheus monitoring, and a Grafana dashboard.
You will configure Vault to enable Prometheus metrics, and deploy the containers using the command line in a terminal session. You will also use the Grafana web interface to create a dashboard for visualizing metrics.
Begin the scenario by preparing your environment.
Prerequisites
To perform the steps in the hands-on scenario, you need:
- Vault 1.8 or later binary installed in your system path; the Community Edition can be used for this tutorial.
  - The Install Vault tutorial can guide you through installation.
Prepare host environment
Create a temporary directory and some subdirectories to contain all of the work you will do in this scenario, and assign its path to the environment variable LEARN_VAULT.
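A minimal sketch of this step follows. The location under /tmp and the subdirectory names other than vault-config are assumptions chosen for illustration; any writable path should work.

```shell
# Assumed project location for this scenario.
export LEARN_VAULT=/tmp/learn-vault-monitoring

# One subdirectory per component. Only vault-config is named in the text;
# the other names are illustrative assumptions.
mkdir -p "$LEARN_VAULT/vault-config" \
         "$LEARN_VAULT/vault-data" \
         "$LEARN_VAULT/prometheus-config" \
         "$LEARN_VAULT/grafana-config"
```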
Create a Docker network named learn-vault; this network will be used by all containers in the scenario.
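One way to create the network is sketched below. The subnet is an assumed example value; an explicit subnet is included because Docker only allows static container IP addresses on user-defined networks that have one.

```shell
# Assumed example subnet; pick one that does not collide with existing networks.
SUBNET=10.42.74.0/24

if command -v docker >/dev/null 2>&1; then
  docker network create --subnet "$SUBNET" learn-vault \
    || echo "could not create network (it may already exist)" >&2
else
  echo "docker not found; skipping network creation" >&2
fi
```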
With the environment preparation complete, you are ready to start the Vault container.
Vault container
You will start a minimally configured Vault server using the filesystem storage backend and some initial configuration contained in the vault-config directory.
Begin by pulling the latest Vault image version.
Vault configuration
Prometheus metrics are not enabled by default; setting the prometheus_retention_time to a non-zero value enables them.
The example configuration includes a telemetry stanza to set a 12 hour retention time for metrics stored in memory. It also specifies that Vault should not emit Prometheus metrics prefixed with host names, as this is not desirable in most use cases.
More telemetry configuration details are available in the telemetry parameters documentation.
TLS Note
Although the listener stanza disables TLS for this tutorial, Vault should always be used with TLS enabled in production to provide secure communication between clients and the Vault server. Enabling TLS requires a certificate file and a key file on each Vault server.
Create the Vault server configuration.
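The following is a sketch of a configuration matching the description above, not necessarily the exact file from the original tutorial. The file name server.hcl, the fallback project path, and the /vault/file storage path (the data directory the official Vault image prepares) are assumptions.

```shell
# Fall back to an assumed project path if LEARN_VAULT is not already set.
: "${LEARN_VAULT:=/tmp/learn-vault-monitoring}"
mkdir -p "$LEARN_VAULT/vault-config"

cat > "$LEARN_VAULT/vault-config/server.hcl" <<'EOF'
api_addr = "http://127.0.0.1:8200"

# TLS is disabled for this tutorial only.
listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = true
}

# Filesystem storage backend; the path is inside the container.
storage "file" {
  path = "/vault/file"
}

# A non-zero retention time enables Prometheus metrics;
# disable_hostname keeps host names out of metric names.
telemetry {
  prometheus_retention_time = "12h"
  disable_hostname          = true
}
EOF
```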
Start Vault container
The Vault container specifies the IPC_LOCK capability for memory locking, a static IP address, and some volume mounts in the project directory for configuration and data.
Start the Vault container running detached in the background.
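A hedged sketch of the start command is shown here. The container name learn-vault, the static address 10.42.74.100, and the /vault/file mount target are assumptions; the address must fall inside the subnet of the learn-vault network.

```shell
: "${LEARN_VAULT:=/tmp/learn-vault-monitoring}"   # assumed project path
VAULT_IP=10.42.74.100                             # assumed static address

if command -v docker >/dev/null 2>&1; then
  # "server" makes the image entrypoint run Vault with the config in /vault/config.
  docker run \
    --detach \
    --name learn-vault \
    --network learn-vault \
    --ip "$VAULT_IP" \
    --cap-add=IPC_LOCK \
    --publish 8200:8200 \
    --volume "$LEARN_VAULT/vault-config:/vault/config" \
    --volume "$LEARN_VAULT/vault-data:/vault/file" \
    hashicorp/vault server \
    || echo "failed to start the Vault container" >&2
else
  echo "docker not found; skipping" >&2
fi
```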
Check the Vault server logs to ensure that the container is ready.
When the Vault container is running, your output should resemble this example.
With the Vault container started and ready, proceed to preparing Vault for use.
Initialize, unseal & authenticate
The running Vault container publishes TCP port 8200 to the Docker host, so the Vault API address is http://127.0.0.1:8200.
Export the VAULT_ADDR environment variable value required for correctly addressing the Vault container.
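For example, using the address given above:

```shell
# The Vault CLI reads the server address from VAULT_ADDR.
export VAULT_ADDR=http://127.0.0.1:8200
```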
For the purpose of simplicity in this tutorial, initialize Vault with 1 key share and a key threshold of 1 and write the output to the file .vault-init in the project directory.
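A sketch of the initialization, assuming the project path below and that VAULT_ADDR points at the running container. Redirecting standard output to the file is why nothing is printed on success.

```shell
: "${LEARN_VAULT:=/tmp/learn-vault-monitoring}"   # assumed project path
mkdir -p "$LEARN_VAULT"
INIT_FILE="$LEARN_VAULT/.vault-init"

if command -v vault >/dev/null 2>&1; then
  # One key share with a threshold of one: acceptable for a tutorial only.
  vault operator init -key-shares=1 -key-threshold=1 > "$INIT_FILE" \
    || echo "init failed; is Vault running and VAULT_ADDR set?" >&2
else
  echo "vault CLI not found; skipping" >&2
fi
```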
Successful execution of this command should produce no output.
Unseal Vault with the Unseal Key 1 value from the .vault-init file.
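One possible way to do this without copying the key by hand is sketched below. The helper init_field is hypothetical (it is not part of Vault) and assumes the init output uses "Name: value" lines.

```shell
: "${LEARN_VAULT:=/tmp/learn-vault-monitoring}"   # assumed project path

# Hypothetical helper: print the value of a "Name: value" line
# from the saved init output.
init_field() {
  awk -F': ' -v name="$1" '$1 == name { print $2 }' "$LEARN_VAULT/.vault-init"
}

if command -v vault >/dev/null 2>&1 && [ -s "$LEARN_VAULT/.vault-init" ]; then
  vault operator unseal "$(init_field 'Unseal Key 1')" \
    || echo "unseal failed" >&2
else
  echo "vault CLI or .vault-init not available; skipping unseal" >&2
fi
```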
Successful output from unsealing Vault should resemble this example:
If your status output also shows Vault to be initialized and unsealed, you can log in with vault login by passing the Initial Root Token value from the .vault-init file.
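A sketch of the login step, assuming the same project path and init file layout. The function root_token is a hypothetical helper; the -no-print flag keeps the CLI from echoing token details.

```shell
: "${LEARN_VAULT:=/tmp/learn-vault-monitoring}"   # assumed project path

# Hypothetical helper: read the root token from the saved init output.
root_token() {
  awk -F': ' '$1 == "Initial Root Token" { print $2 }' "$LEARN_VAULT/.vault-init"
}

if command -v vault >/dev/null 2>&1 && [ -s "$LEARN_VAULT/.vault-init" ]; then
  vault login -no-print "$(root_token)" || echo "login failed" >&2
else
  echo "vault CLI or .vault-init not available; skipping login" >&2
fi
```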
This command should produce no output when successful. If you want to confirm that the login was successful, try a token lookup and confirm that your token policies contain root.
Note
You will use a root token in this scenario for simplicity. However, in actual production environments, root tokens should be closely guarded and used only for tightly controlled purposes. Review the documentation on root tokens for more details.
Successful output should contain the following.
Define Prometheus ACL Policy
The Vault /sys/metrics endpoint is authenticated. Prometheus requires a Vault token with sufficient capabilities to successfully consume metrics from the endpoint.
Define a prometheus-metrics ACL policy that grants read capabilities to the metrics endpoint.
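A minimal sketch of such a policy follows. The policy file name and its location in the project root are assumptions; it is deliberately kept out of vault-config so that Vault does not try to load it as server configuration.

```shell
: "${LEARN_VAULT:=/tmp/learn-vault-monitoring}"   # assumed project path
mkdir -p "$LEARN_VAULT"
POLICY_FILE="$LEARN_VAULT/prometheus-metrics-policy.hcl"   # assumed file name

# Grant read-only access to the metrics endpoint and nothing else.
cat > "$POLICY_FILE" <<'EOF'
path "sys/metrics" {
  capabilities = ["read"]
}
EOF

if command -v vault >/dev/null 2>&1; then
  vault policy write prometheus-metrics "$POLICY_FILE" \
    || echo "policy write failed; are you logged in?" >&2
else
  echo "vault CLI not found; policy file written but not uploaded" >&2
fi
```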
Create an example token with the prometheus-metrics policy attached that Prometheus will use for authentication to access the Vault telemetry metrics endpoint.
Write the token ID to the file prometheus-token in the Prometheus configuration directory.
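Creating the token and writing it to the file can be combined into one command, as in this sketch. The directory name prometheus-config is an assumption; -field=token prints only the token ID, which the redirect sends to the file.

```shell
: "${LEARN_VAULT:=/tmp/learn-vault-monitoring}"   # assumed project path
mkdir -p "$LEARN_VAULT/prometheus-config"
TOKEN_FILE="$LEARN_VAULT/prometheus-config/prometheus-token"

if command -v vault >/dev/null 2>&1; then
  vault token create -field=token -policy=prometheus-metrics > "$TOKEN_FILE" \
    || echo "token creation failed" >&2
else
  echo "vault CLI not found; skipping" >&2
fi
```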
This command is expected to produce no output.
Note
Production Vault installations typically use auth methods to issue tokens, but for the sake of simplicity this scenario issues the token directly from the token store.
The Vault server is now prepared to properly expose telemetry metrics for Prometheus consumption, and you have created the token that Prometheus will use to access the metrics.
Prometheus container
Before you can start the Prometheus container, you must first create the configuration file prometheus.yml.
The configuration is minimal, and specifies a scrape config job named vault with the Vault API endpoint as the metrics path, along with the path to the Vault token and the IP address plus port of the Vault server.
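A sketch of such a configuration is shown below. It assumes the Vault container has the static address 10.42.74.100 and that the token file will be mounted at /etc/prometheus/prometheus-token inside the Prometheus container; adjust both to your setup.

```shell
: "${LEARN_VAULT:=/tmp/learn-vault-monitoring}"   # assumed project path
mkdir -p "$LEARN_VAULT/prometheus-config"

cat > "$LEARN_VAULT/prometheus-config/prometheus.yml" <<'EOF'
scrape_configs:
  - job_name: vault
    # Vault's HTTP API serves the endpoint under the /v1 prefix.
    metrics_path: /v1/sys/metrics
    params:
      format: ['prometheus']
    scheme: http
    # Path to the Vault token as seen from inside the container (assumed).
    authorization:
      credentials_file: /etc/prometheus/prometheus-token
    static_configs:
      - targets: ['10.42.74.100:8200']
EOF
```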
Pull the Prometheus image.
Start the Prometheus container using volume mounts that point to the previously created configuration and Vault token file.
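The start command might look like the following sketch. The container name learn-prometheus, the address 10.42.74.110, and the published port 9090 are assumptions. Both mounted files should exist before you run it, because Docker creates a directory for a missing bind-mount source.

```shell
: "${LEARN_VAULT:=/tmp/learn-vault-monitoring}"   # assumed project path
PROMETHEUS_IP=10.42.74.110                        # assumed static address

if command -v docker >/dev/null 2>&1; then
  docker run \
    --detach \
    --name learn-prometheus \
    --network learn-vault \
    --ip "$PROMETHEUS_IP" \
    --publish 9090:9090 \
    --volume "$LEARN_VAULT/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml" \
    --volume "$LEARN_VAULT/prometheus-config/prometheus-token:/etc/prometheus/prometheus-token" \
    prom/prometheus \
    || echo "failed to start the Prometheus container" >&2
else
  echo "docker not found; skipping" >&2
fi
```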
Verify that Prometheus is ready to receive requests.
The log should contain an entry like this one.
Prometheus is ready; continue with Grafana container configuration and deployment.
Grafana container
Create a Grafana configuration that specifies the Prometheus container as the data source. This way, you can focus on metrics and dashboards instead of setting up the data source in the Grafana web UI.
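A sketch of a data source provisioning file follows. The file name datasource.yml and the Prometheus address 10.42.74.110:9090 are assumptions; the data source is named vault so that it matches the name selected later in the Grafana UI.

```shell
: "${LEARN_VAULT:=/tmp/learn-vault-monitoring}"   # assumed project path
mkdir -p "$LEARN_VAULT/grafana-config"

cat > "$LEARN_VAULT/grafana-config/datasource.yml" <<'EOF'
apiVersion: 1

datasources:
  - name: vault
    type: prometheus
    # "proxy" means the Grafana server, not the browser, queries Prometheus.
    access: proxy
    orgId: 1
    url: http://10.42.74.110:9090
    isDefault: true
    editable: true
EOF
```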
Pull the latest Grafana image.
Start the Grafana container.
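As a sketch, assuming the container name learn-grafana, the address 10.42.74.120, and a provisioning file at grafana-config/datasource.yml in the project directory:

```shell
: "${LEARN_VAULT:=/tmp/learn-vault-monitoring}"   # assumed project path
GRAFANA_IP=10.42.74.120                           # assumed static address

if command -v docker >/dev/null 2>&1; then
  # Grafana loads data sources from its provisioning directory at startup.
  docker run \
    --detach \
    --name learn-grafana \
    --network learn-vault \
    --ip "$GRAFANA_IP" \
    --publish 3000:3000 \
    --volume "$LEARN_VAULT/grafana-config/datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml" \
    grafana/grafana \
    || echo "failed to start the Grafana container" >&2
else
  echo "docker not found; skipping" >&2
fi
```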
Verify that the Grafana container is ready.
The log should contain an entry like this one.
You can also optionally check once more to verify that all containers are up and running.
The output should resemble this example:
With all containers ready, move on to accessing and configuring Grafana through the web UI.
Access and configure Grafana dashboard
Access the Grafana web interface to create a dashboard containing some example Vault metrics.
Open http://localhost:3000/ in a browser.
Enter admin for both the Email or username and password fields.
When prompted, change the admin password and confirm it.
Click the Dashboards icon in the navigation and select Manage.
Click New Dashboard.
Click Add an empty panel to add the first new panel for a metric from Vault.
Add a memory utilization graph
Let's add a graph to the dashboard for Vault memory utilization.
In the New dashboard/Edit Panel page, use the following steps to add the graph.
In the Data source drop-down, choose vault.
In the Metrics browser text input, notice that you can begin to type vault_ and Grafana will complete metric names for you from a listing. To specify the system memory usage of Vault, enter vault_runtime_sys_bytes here.
Under Panel options in the navigation, enter System memory utilization for Title.
Scroll down to Graph styles and select Bars.
Scroll to Standard options and use the drop-down to navigate to Data and select bytes(SI).
Click Apply.
Your dashboard should resemble this example screenshot.
You can add more panels with the Add panel button shown in the screenshot.
Add a request handling graph
Add another panel to measure request handling.
In the Data source drop-down, choose vault.
In the Metrics browser text input, notice that you can begin to type vault_ and Grafana will complete metric names for you from a listing. To specify the count of requests handled by Vault, enter vault_core_handle_request_count here.
Under Panel options in the navigation, enter Requests handled count for Title.
Click Apply.
Generate requests with token lookup
To generate work for the new request handling graph, go to your terminal session and perform 100 token lookup operations.
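A simple loop is one way to do this. The sketch assumes you are still logged in and that VAULT_ADDR is set in the terminal session.

```shell
LOOKUPS=100   # number of token lookup requests to send

if command -v vault >/dev/null 2>&1; then
  i=1
  while [ "$i" -le "$LOOKUPS" ]; do
    # Discard the output; only the request itself matters here.
    vault token lookup >/dev/null 2>&1
    i=$((i + 1))
  done
else
  echo "vault CLI not found; skipping" >&2
fi
```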
Return to the Grafana web UI and if necessary, click the refresh button.
Now observe your Grafana dashboard.
It should resemble this example screenshot, showing 100+ requests handled.
Feel free to experiment, and add more panels and metrics types to your dashboard.
Tip
A listing of popular metrics for monitoring Vault appears in the Telemetry Metrics Reference.
Cleanup
Stop and remove the Docker containers.
Remove the Docker network.
Remove the project directory.
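The three cleanup steps could be sketched as follows. The container names and the fallback project path are assumptions, so substitute the names you actually used.

```shell
: "${LEARN_VAULT:=/tmp/learn-vault-monitoring}"   # assumed project path

if command -v docker >/dev/null 2>&1; then
  # --force stops a running container before removing it.
  docker rm --force learn-grafana learn-prometheus learn-vault \
    || echo "some containers could not be removed" >&2
  docker network rm learn-vault \
    || echo "network could not be removed" >&2
else
  echo "docker not found; skipping container and network cleanup" >&2
fi

# Remove the project directory, including the saved init output and token.
rm -rf -- "$LEARN_VAULT"
```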
Summary
You learned how to configure a Vault server to enable Prometheus metrics with a specific retention time and hostname setting. You also learned how to enable Prometheus metrics scraping and Grafana metrics visualization with dashboard panels.