Securing your logs in Confluent Cloud with HashiCorp Vault


Challenge

Logging is an important part of managing service availability, security, and customer experience. It allows Site Reliability Engineers (SREs), developers, security teams, and infrastructure teams to gain insight into how their services are being consumed and to address any issues before they result in service outages or security incidents. Often, logs contain sensitive information that needs to be protected.

Consider a scenario where the applications team and the security operations team require access to the same set of logs. However, the applications team must not be able to see specific fields in the logs, so the security requirement is that those fields are masked or encrypted when the logs are presented back to the applications team. Field-level encryption of log data is difficult to achieve because it requires the ability to extract, transform, and load (ETL) the data before it is presented to the end user.

Now, you might be thinking, ETL? Do I need to build a data pipeline? What data formats do I need to use? What encryption libraries do I use? How do I protect the encryption keys? How do I scale the infrastructure to match increased demand in log ingestion and processing? Sounds complex, but it doesn't have to be. This tutorial walks you through how to build a secure data pipeline with Confluent Cloud and HashiCorp Vault.

Architecture

This section walks through an example architecture that can achieve the requirements covered earlier.

Of the various log aggregation and data streaming services available, this architecture uses Confluent Cloud, a cloud-native Apache Kafka® service, because it allows for easy provisioning of fully managed Kafka, providing ease of access, storage, and management of data streams. It also provides many data integration options.

Architecture Image 1

The following covers the components used in this architecture and how they come together. Please note that the configurations here are only for demonstration and are not to be used in a production environment.

Application

The application (app-a) is a simple JSON data generator that dumps logs to a specific volume. It is written in Python.

A Fluentd sidecar is configured to ingest the application logs and ship them to Confluent Cloud via a Fluentd Kafka plugin. The Fluentd plugin must have PKI certificates generated to be able to connect successfully to the Confluent Cloud platform; the generation of the certificates is taken care of by HashiCorp Vault.

Confluent Cloud

One of the use cases supported by Confluent is log analytics, and Confluent Cloud is a core component of this architecture. It accelerates the deployment because you do not have to stand up a Kafka cluster yourself. Confluent Cloud will be set up with two topics:

app-a-ingress: Kafka topic for ingesting and storing app-a logs.
app-a-egress-dev: Kafka topic for the storage of the encrypted logs. The topic name has -dev here to represent the topic for transformed logs for the developer team. A managed Confluent connector will be set up to push the encrypted log data to a logging system, Elasticsearch, which is used by the developer team.

Confluent Cloud supports many different types of connectors. This tutorial sets up two sink connectors: Elasticsearch and AWS S3. Check out the Confluent Hub for a comprehensive list of sinks.

HashiCorp Vault Enterprise

HashiCorp Vault Enterprise is an identity-based secrets and encryption management system. A secret is anything that you want to tightly control access to, such as API encryption keys, passwords, or certificates. Vault provides encryption services that are gated by authentication and authorization methods.

For encryption, this tutorial utilizes various encryption methods of Vault Enterprise including transit, masking, and format preserving encryption (FPE). For detailed information on the encryption methods, have a look at the How to Choose a Data Protection Method blog.

Transformer

The transformer (app-a-transformer-dev) is a service responsible for encrypting the JSON log data by calling the HashiCorp Vault APIs. It is both a Kafka consumer and a producer: it reads logs from one topic and writes the encrypted JSON logs to another. The transformer is written in Python and uses the hvac Python Vault API client.

Elasticsearch/Kibana

The ELK stack is widely used for log analysis and dashboards. Confluent Cloud will push the encrypted logs to Elasticsearch.

Prerequisites

You should have the following installed or available:

AWS CLI
Amazon EKSCTL CLI
Helm
Vault CLI
Kubernetes command-line interface (CLI)
HashiCorp Vault Enterprise: To test out all the encryption features covered in this blog, you need an Enterprise license key. You can sign up for a free trial. For more information on installing a Vault enterprise license see the Vault documentation here.
- Vault enterprise license key should be in a file named vault.hclic.
Confluent Cloud subscription: You can sign up for a free trial.
AWS account
- AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY for an IAM user that can create and destroy EC2 instances, VPCs, and EKS clusters.

Clone example repository

Clone the learn-vault-secure-logs-confluent repo.

$ git clone https://github.com/hashicorp-education/learn-vault-secure-logs-confluent

Move into working directory.
```
$ cd learn-vault-secure-logs-confluent
```

Set up Confluent Cloud

Once logged in to Confluent Cloud, you need to set up the following.

After you log in, click Environments on the initial page.
Click +Add cloud environment.
Name the environment confl.
Choose a Stream Governance package. For this tutorial you want the Essentials free tier package. Then choose Begin Configuration.
In the Enable Stream Governance Essentials screen, choose AWS as the cloud provider and a region that does not incur extra cost (for example, Ohio us-east-2), then choose Enable.
Add a cluster to the environment with the Create Cluster button.
On the Create Cluster page, choose the Basic type and then select Begin configuration.
Choose a cloud provider to deploy the cluster to. This tutorial uses AWS, Singapore (ap-southeast-1), with a single zone. Then choose Continue.
When the Enter payment card info page opens, look to the bottom left and choose Skip payment.
Choose Launch cluster.

In a short while, you will have a cluster up and running.

Configure topics

To configure the topics, select your cluster and choose Topics in the left navigation menu.
Click Create topic, set the name to app-a-egress-dev, and then click Create with defaults to use the default settings.
The topic Overview will appear for app-a-egress-dev.
Click the Topics link in the left navigation panel once more.
Click Create topic and set the name to app-a-ingress.
Then click Create with defaults to use the default settings.
The topic Overview will appear for app-a-ingress.

API keys

To publish data to or consume data from a topic, authentication is required. Confluent Cloud provides the ability to generate API keys with role-based access control (RBAC) permissions that control which topics can be consumed from or published to. This setup uses a Global Access API key. To set this up, go to the Confluent Cloud management console:

Under Cluster Overview, select the API Keys option in the left navigation menu.
Select the Create key button.
Select Global access and choose the Next button.
Download the API credentials. The API KEY, API SECRET, and BOOTSTRAP SERVER in this file will be used to configure Vault.

Bootstrap server details

You also need the bootstrap server details, which can be found on the cluster settings page.

Under Cluster Overview, choose Cluster settings in the left navigation menu.
Copy the Bootstrap server field. Keep a record of this information because it will be used for the application and transformer deployment configurations.
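
To see how the API key, API secret, and bootstrap server fit together, here is a minimal sketch of the settings a Kafka client typically needs to reach Confluent Cloud. It assumes a librdkafka-based client such as confluent-kafka for Python; the tutorial does not state which Kafka client the application or transformer uses, and the function name is hypothetical.

```python
def build_kafka_config(bootstrap_server: str, api_key: str, api_secret: str) -> dict:
    """Return librdkafka-style client settings for a Confluent Cloud cluster."""
    return {
        "bootstrap.servers": bootstrap_server,
        # Confluent Cloud API keys authenticate with SASL/PLAIN over TLS.
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": api_key,
        "sasl.password": api_secret,
    }


if __name__ == "__main__":
    # Placeholder values only; substitute the ones from your downloaded key file.
    config = build_kafka_config("<BOOTSTRAP_SERVER>", "<API_KEY>", "<API_SECRET>")
    for name in sorted(config):
        print(name)
```

In this tutorial the three values are stored in Vault rather than in code, so a client would read them from the KV secret before building such a dictionary.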

AWS EKS cluster

Set your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, replacing the placeholders below with the appropriate values.

$ export AWS_ACCESS_KEY_ID=<YOUR_AWS_ACCESS_KEY> && export AWS_SECRET_ACCESS_KEY=<YOUR_SECRET_ACCESS_KEY>

Run the eksctl command shown below to create a VPC and a managed AWS EKS cluster. Since this is a temporary environment and to keep costs down, spot instances are used.
```
$ eksctl create cluster --name cluster-1  --region ap-southeast-1 \
  --nodegroup-name nodes --spot --instance-types=t3.medium --nodes 3 \
  --nodes-min 1 --nodes-max 3 --with-oidc --managed
```
Note
This step can take a while (20+ minutes). The following message will be displayed when the EKS cluster is ready: `2022-11-18 11:08:52 [✔] EKS cluster "cluster-1" in "ap-southeast-1" region is ready`.

Create an IAM service account. This will map an AWS IAM role to a Kubernetes service account. The AWS IAM role will use a policy that allows EBS CSI Driver access.

$ eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster cluster-1 \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve \
  --role-only \
  --role-name AmazonEKS_EBS_CSI_DriverRole

The output should resemble this:

2023-05-16 08:05:46 [ℹ]  1 existing iamserviceaccount(s) (kube-system/aws-node) will be excluded
2023-05-16 08:05:46 [ℹ]  1 iamserviceaccount (kube-system/ebs-csi-controller-sa) was included (based on the include/exclude rules)
2023-05-16 08:05:46 [!]  serviceaccounts in Kubernetes will not be created or modified, since the option --role-only is used
2023-05-16 08:05:46 [ℹ]  1 task: { create IAM role for serviceaccount "kube-system/ebs-csi-controller-sa" }
2023-05-16 08:05:46 [ℹ]  building iamserviceaccount stack "eksctl-cluster-1-addon-iamserviceaccount-kube-system-ebs-csi-controller-sa"
2023-05-16 08:05:46 [ℹ]  deploying stack "eksctl-cluster-1-addon-iamserviceaccount-kube-system-ebs-csi-controller-sa"
2023-05-16 08:05:47 [ℹ]  waiting for CloudFormation stack "eksctl-cluster-1-addon-iamserviceaccount-kube-system-ebs-csi-controller-sa"
2023-05-16 08:06:18 [ℹ]  waiting for CloudFormation stack "eksctl-cluster-1-addon-iamserviceaccount-kube-system-ebs-csi-controller-sa"
2023-05-16 08:06:52 [ℹ]  waiting for CloudFormation stack "eksctl-cluster-1-addon-iamserviceaccount-kube-system-ebs-csi-controller-sa"

Retrieve and copy down your AWS Account number for use in the next step.
```
$ aws sts get-caller-identity --query "Account" --output text
```

Add the aws-ebs-csi-driver to the EKS cluster. Replace <AWS_ACCOUNT_NUMBER> with the account number for your AWS account.

$ eksctl create addon --name aws-ebs-csi-driver --cluster cluster-1 --service-account-role-arn arn:aws:iam::<AWS_ACCOUNT_NUMBER>:role/AmazonEKS_EBS_CSI_DriverRole --force

Output should resemble the following:

2023-05-16 08:11:20 [ℹ]  Kubernetes version "1.25" in use by cluster "cluster-1"
2023-05-16 08:11:21 [ℹ]  using provided ServiceAccountRoleARN "arn:aws:iam::166839932314:role/AmazonEKS_EBS_CSI_DriverRole"
2023-05-16 08:11:21 [ℹ]  creating addon

Vault server

Move your copy of the Vault Enterprise license file to the current directory. The file should be named vault.hclic.

Start with adding the HashiCorp repo to Helm.

$ helm repo add hashicorp https://helm.releases.hashicorp.com

Now copy the license key from vault.hclic into a Kubernetes secret.

$ secret=$(cat vault.hclic) && kubectl create secret generic vault-ent-license --from-literal="license=${secret}"

Install Vault on your cluster.
```
$ helm install hashicorp hashicorp/vault -f vault-config.yaml
```
This will deploy a Vault Enterprise instance in development mode with the root token set to root.

Verify that Vault is deployed and running:

$ kubectl get pods
NAME                                   READY   STATUS              RESTARTS   AGE
hashicorp-vault-0                                1/1     Running             0          9s
hashicorp-vault-agent-injector-985cd6494-ftpwf   1/1     Running             0          9s

Note

Problems here are likely due to issues with the enterprise license file. Check that the Kubernetes secret vault-ent-license was successfully created.

Configure Vault

There are a few things you need to configure on Vault, including the Transit and Transform secrets engines and the Kubernetes authentication method.

Now you will connect to the Vault container and confirm you can access it.

Open a new terminal window.

Expose Vault outside the Kubernetes cluster using port-forwarding:

$ kubectl port-forward hashicorp-vault-0 8200:8200
Forwarding from 127.0.0.1:8200 -> 8200
Forwarding from [::1]:8200 -> 8200
...

Back in the original terminal window, set the VAULT_ADDR and VAULT_TOKEN environment variables:
```
$ export VAULT_ADDR="http://localhost:8200" && export VAULT_TOKEN="root"
```

You can check the status of Vault:

$ vault status
Key             Value
---             -----
Seal Type       shamir
Initialized     true
Sealed          false
Total Shares    1
Threshold       1
Version         1.9.1+ent
Storage Type    inmem
Cluster Name    vault-cluster-a5b35278
Cluster ID      50bbe23a-1648-004f-a523-6fe8b8a9bb38
HA Enabled      false

You should be able to see the Vault UI by navigating in your browser to http://localhost:8200.

KV secrets engine

The application and transformers will require access to the Confluent Cloud API keys and the bootstrap server details you recorded in the API keys and bootstrap server details steps above.

As part of InfoSec best practices, avoid hardcoding credentials.

Mount the KV secrets engine.
```
$ vault secrets enable -version=2 kv
```

Store the Confluent Cloud API keys for the application and transformer. Before running the command, replace <BOOTSTRAP_SERVER> with the bootstrap server, <API_KEY> with the Confluent Cloud global API client ID, and <API_SECRET> with the Confluent Cloud global API client secret.

$ vault kv put kv/confluent-cloud client_id=<API_KEY> \
  client_secret=<API_SECRET> \
  connection_string=<BOOTSTRAP_SERVER> \
  convergent_context_id="YWJjMTIz"

The results will resemble this:

===== Secret Path =====
kv/data/confluent-cloud
======= Metadata =======
Key                Value
---                -----
created_time       2022-12-13T17:17:38.293549086Z
custom_metadata    <nil>
deletion_time      n/a
destroyed          false
version            1
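
The convergent_context_id value stored above is base64-encoded, which is the form the Transit secrets engine expects for a key-derivation context. The following is a small sketch of how such a value can be produced and checked; the helper name is hypothetical.

```python
import base64


def encode_context(raw: str) -> str:
    """Base64-encode a context string for use as convergent_context_id."""
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    print(encode_context("abc123"))                      # YWJjMTIz
    print(base64.b64decode("YWJjMTIz").decode("utf-8"))  # abc123
```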

Store the configuration that lists the JSON values to encrypt and the encryption method to apply to each. The transformer fetches this configuration.

$ vault kv put kv/app-a/config - << EOF
{
"keys_of_interest":[
{"key": "owner.email", "method": "aes"},
{"key": "owner.NRIC", "method": "transform", "transformation":"sg-nric-mask"},
{"key": "owner.telephone", "method": "transform", "transformation":"sg-phone-fpe"},
{"key": "choices.places_of_interest", "method": "aes-converge"}
],
"transform_mount":"transform",
"transform_role_name":"sg-transform",
"transit_mount":"transit",
"transit_key_name":"transit",
"convergent_key_name":"transit-convergent"
}
EOF
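
To illustrate how a consumer of this configuration might interpret keys_of_interest, the sketch below resolves each dotted key against a nested JSON record and replaces the value using a handler chosen by method. This is not the transformer's actual code: the handler is a stand-in that only tags the value where the real service would call Vault, and the sample record is invented.

```python
import json

# A subset of the configuration stored at kv/app-a/config.
CONFIG = json.loads("""
{"keys_of_interest": [
  {"key": "owner.email", "method": "aes"},
  {"key": "owner.NRIC", "method": "transform", "transformation": "sg-nric-mask"}
]}
""")


def protect(value, rule):
    """Stand-in for a Vault call: tag the value with the rule that applies."""
    label = rule.get("transformation", rule["method"])
    return f"<{label}:{value}>"


def apply_rules(record: dict, rules: list) -> dict:
    """Replace each value addressed by a dotted key path, in place."""
    for rule in rules:
        *parents, leaf = rule["key"].split(".")
        node = record
        for part in parents:
            node = node.get(part) if isinstance(node, dict) else None
        if isinstance(node, dict) and leaf in node:  # skip records lacking the field
            node[leaf] = protect(node[leaf], rule)
    return record


if __name__ == "__main__":
    sample = {"owner": {"email": "a@example.com", "NRIC": "S1234567A"}, "id": 1}
    print(json.dumps(apply_rules(sample, CONFIG["keys_of_interest"])))
```

Keeping the field list in Vault rather than in code means the set of protected fields can change without redeploying the service.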

PKI secrets engine

The PKI secrets engine needs to be set up to provide X.509 certificates for the application, specifically the Fluentd sidecar. The Kafka plugin requires the certificates to make the connection to Confluent Cloud.

Enable PKI secrets engine.
```
$ vault secrets enable pki
```

Configure the CA certificate and private key.

$ vault write pki/root/generate/internal \
    common_name=service.internal \
    ttl=8760h

Create a new PKI role.

$ vault write pki/roles/app \
    allowed_domains=service.internal \
    allow_subdomains=true \
    max_ttl=72h

Transit secrets engine

This section walks through the setup of the Vault Transit secrets engine. The requirements specify the need to encrypt the owner.email and choices.places_of_interest with the AES encryption method. Below are the Vault CLI commands to set up the secrets engine:

Enable the transit secrets engine.
```
$ vault secrets enable transit
```

Create a transit AES256 encryption key.

$ vault write -f transit/keys/transit type=aes256-gcm96

Create a convergent transit encryption key.
```
$ vault write -f transit/keys/transit-convergent \
    convergent_encryption=true derived=true type=aes256-gcm96
```
These commands mount the Transit secrets engine and configure two AES-256 encryption keys, which the transformer uses to encrypt the required fields in the logs.
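
The Transit encrypt endpoint expects the plaintext to be base64-encoded, and a key created with derived=true additionally requires a base64-encoded context. The sketch below only builds the request bodies a client would send to transit/encrypt/transit and transit/encrypt/transit-convergent. It makes no network call, and the helper names and sample values are hypothetical.

```python
import base64


def b64(text: str) -> str:
    """Base64-encode a UTF-8 string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def encrypt_payload(plaintext: str, context_b64=None) -> dict:
    """Build the JSON body for a Transit encrypt request."""
    payload = {"plaintext": b64(plaintext)}
    if context_b64 is not None:
        # Required for derived (convergent) keys; must already be base64.
        payload["context"] = context_b64
    return payload


if __name__ == "__main__":
    # Body for the plain AES key.
    print(encrypt_payload("a@example.com"))
    # Body for the convergent key, using the context stored in the KV secret.
    print(encrypt_payload("Marina Bay", context_b64="YWJjMTIz"))
```

With the convergent key, sending the same plaintext with the same context yields the same ciphertext, which allows equality matching on the encrypted field.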

Transform secrets engine

The Transform secrets engine is a Vault Enterprise feature that allows for more advanced encryption capabilities.

To configure the Transform secrets engine, first mount the Transform secrets engine:
```
$ vault secrets enable transform
```

NRIC transform configuration

Singaporean security requirements dictate that NRIC (National Registration Identity Card) details must be masked. This template configuration specifies the regex pattern for the NRIC, while the transformation configuration specifies the type of transform (masking or format preserving encryption) to be done.

Create a template for the NRIC pattern.

$ vault write transform/template/sg-nric \
    type=regex \
    pattern='[A-Z]{1}(\d{7})[A-Z]{1}' \
    alphabet=builtin/numeric

Create a transformation for NRIC.

$ vault write transform/transformation/sg-nric-mask \
    type=masking \
    masking_character='*' \
    template=sg-nric \
    tweak_source=internal \
    allowed_roles=sg-transform
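
To make the effect of the template and masking character concrete, here is a local emulation of the masking in plain Python. Vault performs the real transformation; this sketch only shows which characters the capture group in the sg-nric pattern selects. The sample NRIC is made up.

```python
import re

# Same pattern as the sg-nric template: the capture group selects the 7 digits.
NRIC_PATTERN = re.compile(r"[A-Z]{1}(\d{7})[A-Z]{1}")


def mask_nric(value: str, masking_character: str = "*") -> str:
    """Mask the captured digits of an NRIC, leaving the letters intact."""
    match = NRIC_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError("value does not match the sg-nric template")
    start, end = match.span(1)
    return value[:start] + masking_character * (end - start) + value[end:]


if __name__ == "__main__":
    print(mask_nric("S1234567A"))  # S*******A
```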

Telephone transform configuration

Security requirements also dictate that phone numbers must be encrypted with format preserving encryption (FPE).

Create a template for the phone number pattern.

$ vault write transform/template/sg-phone \
    type=regex \
    pattern='[+](\d{2})-(\d{4})-(\d{4})' \
    alphabet=builtin/numeric

Create a transformation for the phone number.

$ vault write transform/transformation/sg-phone-fpe \
    type=fpe \
    template=sg-phone \
    tweak_source=internal \
    allowed_roles=sg-transform
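
Format-preserving encryption keeps the shape of the input: only the characters in the template's capture groups are replaced, and the result still matches the template. The sketch below is a local check, not FPE itself. It shows which digits of a sample number the sg-phone pattern selects and tests whether a candidate encoded value has the same format. The sample numbers are made up.

```python
import re

# Same pattern as the sg-phone template: three capture groups of digits.
PHONE_PATTERN = re.compile(r"[+](\d{2})-(\d{4})-(\d{4})")


def encodable_digits(value: str) -> str:
    """Return the digits that the template's capture groups select."""
    match = PHONE_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError("value does not match the sg-phone template")
    return "".join(match.groups())


def same_format(original: str, encoded: str) -> bool:
    """True if both values match the template and have equal length."""
    return (
        PHONE_PATTERN.fullmatch(original) is not None
        and PHONE_PATTERN.fullmatch(encoded) is not None
        and len(original) == len(encoded)
    )


if __name__ == "__main__":
    print(encodable_digits("+65-6123-4567"))
    print(same_format("+65-6123-4567", "+12-9876-5432"))
```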

A transform role is configured to allow access to the two transformations (sg-nric-mask and sg-phone-fpe) created earlier.
```
$ vault write transform/role/sg-transform \
    transformations=sg-nric-mask,sg-phone-fpe
```

Kubernetes auth method

Since the application and the transformer will be deployed on Kubernetes and require access to HashiCorp Vault, the Kubernetes authentication method is an effective way to enable this. To configure:

Set up an authentication service account on the Kubernetes cluster.

$ kubectl apply --filename kubernetes/vault-auth-service-account.yaml
serviceaccount/vault-auth created
clusterrolebinding.rbac.authorization.k8s.io/role-tokenreview-binding created

Create a secret used by Kubernetes authentication.

$ kubectl apply --filename kubernetes/vault-auth-secret.yaml
secret/vault-auth-secret created

Enable the Kubernetes auth method.

$ vault auth enable kubernetes
Success! Enabled kubernetes auth method at: kubernetes/

You need a few details from the Kubernetes cluster to complete the Vault configuration.

VAULT_HELM_SECRET_NAME=$(kubectl get secrets --output=json | jq -r '.items[].metadata | select(.name|startswith("vault-auth-")).name')
TOKEN_REVIEW_JWT=$(kubectl get secret $VAULT_HELM_SECRET_NAME --output='go-template={{ .data.token }}' | base64 --decode)
KUBE_CA_CERT=$(kubectl config view --raw --minify --flatten --output='jsonpath={.clusters[].cluster.certificate-authority-data}' | base64 --decode)
KUBE_HOST=$(kubectl get services --field-selector metadata.name=kubernetes -o jsonpath='{.items[].spec.clusterIP}')

Review the values.

$ echo $VAULT_HELM_SECRET_NAME && echo $TOKEN_REVIEW_JWT && echo $KUBE_CA_CERT && echo $KUBE_HOST

A blank line in the output indicates a problem. The output should resemble the following:

vault-auth-secret
eyJhbGciOiJSUzI1NiIsImtpZCI6Imw5MnpHMURxZG5mNDZJVlFvWjQ0M01ENHZPLW1hWk5Rd284OE11OW8tZFkifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6InZhdWx0LWF1dGgtc2VjcmV0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InZhdWx0LWF1dGgiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI1YzMwNDhlNC0xMjIzLTRmMjUtOGMyYi0zZGIzNDVmZWI2ZmIiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6ZGVmYXVsdDp2YXVsdC1hdXRoIn0.dJ1dgM7T7oUsu5bb0g5KwpZncItfaE48p_8bqlSSxqjin0nWsuj9KXjtp-RRw7iMJdtOKg3HYSoedr0daefo-3ohbM_ECdZ7IL7YrA4bbfSsk3X7pcaK0hMGOapWM0MYvI863GDWv0S7bU2zeeL1bO6cYpc0YwziJllOAQz52X3hOgXaS4PP_hYbCeZZ3pdPJwCsQBcXtsgjVNg5VdI4WJDSyqWPqiKpuNlLgtYD7ur-KODHZ7gViI83Iy7_0z2Y0be_VVJL_RuVJmU3sFmkagkYOrOm5CXp_gIKmEFDaCxbThJPJIAL6ESKuE-9gHcxwuYAt9SrlX64nz5N-idfZQ
-----BEGIN CERTIFICATE-----
MIIC/jCCAeagAwIBAgIBADANBgkqhkiG9w0BAQsFADAVMRMwEQYDVQQDEwprdWJl
cm5ldGVzMB4XDTIzMDUxNjA3NTkyOFoXDTMzMDUxMzA3NTkyOFowFTETMBEGA1UE
AxMKa3ViZXJuZXRlczCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAMDV
WiGSbRlCFtzYCRBeX3cssXJgLj6Aox9vKyy/cAfXxz0N0E0wHiMls6/T0alzKxT6
NDp/rMcE7zSwVLFljVhSLIJCGxjUHcaackQj6bFeXwoR7mmv7EshwDHYXcAXA2X3
n9AmEArQeXF5DpoD5xx0NSCoZ90eOjPeidHw6J3xR3aBMpdZsvj6fS62M/CWxEQq
9nd5NspYaFGrysczsC4zF1MNuj3S1LVWuEQMgP/rWFRV/L9qsMaiKlNTsEbkXC9p
E9ICpFNokqI/YwbLnhLyCp1jFsq3xFWoE65OE3ve/YtdPm1vV8rbU5vEFII6pG8+
3vZlQtpuzOJJ9TJ95okCAwEAAaNZMFcwDgYDVR0PAQH/BAQDAgKkMA8GA1UdEwEB
/wQFMAMBAf8wHQYDVR0OBBYEFGrdINWtfp/c0gsrezX1KCYHiHRCMBUGA1UdEQQO
MAyCCmt1YmVybmV0ZXMwDQYJKoZIhvcNAQELBQADggEBAG8X3ksv4pseskKESBEM
Y4wfAqr/M2pny4RCZlnfZZ8EHU75rDzl9PJzsqvgDKvRLrEdHKBJ5i4a3TEaVahc
vQyCEvVneV7OqzuotUA3qCXwO9+J7VGnfZIjTT18t2xjT3lt9O7MANS6sYtM3VbQ
B7mNEMYFnQfsSSTqMM1AtVNOoFz81yJsDUOHmD3D0e9R4N1KCt643EHPxPEVNytG
8aAWkZOITqxaHjEhh5Tlt8+KUDeevr53jef91S9+1jKdG8w+6eoGxWFW2iDJb0u7
kZImtvTGeZulYvQmuaSGU9uoL6CyJUtcH4cWY8GBuOqlS9PBbOdFrR2puB3dFpGK
e1E=
-----END CERTIFICATE-----
10.100.0.1
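
When reviewing the token, it can help to look at its claims rather than the raw string. The following sketch decodes the payload segment of a JWT without verifying the signature, so it is only suitable for inspection. The script name jwt_claims.py used in the usage note is hypothetical.

```python
import base64
import json
import sys


def jwt_claims(token: str) -> dict:
    """Decode (without verifying) the payload segment of a JWT."""
    payload = token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(json.dumps(jwt_claims(sys.argv[1]), indent=2))
```

For example, running `python jwt_claims.py "$TOKEN_REVIEW_JWT"` should show a sub claim that names the vault-auth service account in the default namespace.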

Configure the Kubernetes auth method.

$ vault write auth/kubernetes/config \
    kubernetes_host="https://$KUBE_HOST" \
    token_reviewer_jwt=$TOKEN_REVIEW_JWT \
    kubernetes_ca_cert="$KUBE_CA_CERT" \
    disable_iss_validation=true

Kubernetes auth method roles

These roles will be used by the application and transformers to authenticate to Vault.

Create the application role.

$ vault write auth/kubernetes/role/app \
    bound_service_account_names=app \
    bound_service_account_namespaces=default \
    policies=app-a-policy \
    ttl=24h

Create the transformer role.

$ vault write auth/kubernetes/role/transform \
    bound_service_account_names=transform \
    bound_service_account_namespaces=default \
    policies=transformer-policy \
    ttl=24h
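
A workload authenticates against one of these roles by posting its service account token to the auth method's login endpoint. In this tutorial the Vault Agent sidecar does that on the pod's behalf, so the sketch below is only meant to show the shape of the request. The helper name is hypothetical, and the token path is the standard location where Kubernetes mounts a pod's service account token.

```python
from pathlib import Path

# Standard in-pod location of the service account token.
SA_TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def login_request(role: str, jwt: str, mount: str = "kubernetes"):
    """Return the API path and JSON body for a Kubernetes auth login."""
    return f"/v1/auth/{mount}/login", {"role": role, "jwt": jwt}


if __name__ == "__main__":
    if SA_TOKEN_PATH.exists():
        jwt = SA_TOKEN_PATH.read_text().strip()
    else:
        jwt = "<SERVICE_ACCOUNT_JWT>"  # placeholder when run outside a pod
    path, body = login_request("transform", jwt)
    print(path, sorted(body))
```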

Configure Vault policies

The application will require access to the secrets configured earlier in the KV secrets engine section. To allow this, Vault policies need to be configured:

$ vault policy write app-a-policy - <<EOF
path "kv/data/confluent-cloud" {
capabilities = ["read"]
}
path "pki/issue/app" {
capabilities = ["update"]
}
EOF

The transformer requires access to the Transit and Transform secrets engines for encryption, as well as to the KV secrets it reads its configuration from.

$ vault policy write transformer-policy - <<EOF
path "transit/encrypt/transit-convergent" {
  capabilities = ["update"]
}
path "transit/encrypt/transit" {
  capabilities = ["update"]
}
path "kv/data/confluent-cloud" {
  capabilities = ["read"]
}
path "kv/data/app-a/config" {
  capabilities = ["read"]
}
path "transform/encode/sg-transform" {
  capabilities = ["update"]
}
EOF

Transformer

The transformer retrieves the configuration stored in Vault in the KV secrets engine steps, specifically at the kv/app-a/config and kv/confluent-cloud paths. Here is a rundown of the configuration:

Configuration	Parameters	Description
client_id	string	Confluent Cloud global API client ID set up in API keys
client_secret	string	Confluent Cloud global API client secret set up in API keys
connection_string	string	Confluent Cloud bootstrap server found in Bootstrap server details
keys_of_interest	key	The JSON key path (in dot notation)
-	method	Encryption method to use: aes, aes-converge, or transform (if using transform, the transformation name must also be specified)
-	transformation	Name of the transformation (masking, FPE, or tokenization); the transformations used here were created in NRIC transform configuration and Telephone transform configuration
transform_mount	string	Transform secrets engine path, configured in Transform secrets engine; the default is transform
transform_role_name	string	Transform role that has permission to use the transformations configured in NRIC transform configuration and Telephone transform configuration
transit_mount	string	Transit secrets engine path, configured in Transit secrets engine
transit_key_name	string	Name of the Transit encryption key
convergent_key_name	string	Name of the Transit encryption key created with convergent_encryption and derived set to true. Convergent encryption requires a context, which must be provided. With this key, encrypting the same plaintext with the same context yields the same ciphertext.
convergent_context_id	string (base64-encoded)	Context used for convergent encryption

To build and deploy the transformer, run this command (from learn-vault-secure-logs-confluent git repo directory):

$ kubectl apply -f deploy/transform-deploy.yml
deployment.apps/transform created
service/transform created
serviceaccount/transform created

The annotations in the deployment configure a Vault Agent sidecar (listening on port 8200) that authenticates using the Kubernetes authentication method. Because agent-cache-enable and agent-cache-use-auto-auth-token are set to true, the transformer can request secrets through the Vault Agent at http://localhost:8200, and the agent attaches its own auto-auth token to those requests.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: transform
spec:
  selector:
    matchLabels:
      app: transform
  template:
    metadata:
      labels:
        app: transform
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "transform"
        vault.hashicorp.com/agent-cache-enable: "true"
        vault.hashicorp.com/agent-cache-use-auto-auth-token: "true"
    spec:
      serviceAccountName: transform
      containers:
      - name: transform
        env:
        - name: KAFKA_GROUP
          value: 'app-a-group'
        - name: INGRESS_TOPIC
          value: 'app-a-ingress'
        - name: EGRESS_TOPIC
          value: 'app-a-egress-dev'
        - name: SECRETS_PATH
          value: 'kv/data/confluent-cloud'
        - name: CONFIGS_PATH
          value: 'kv/data/app-a/config'
        - name: VAULT_ADDR
          value: 'http://localhost:8200'
        - name: VAULT_TOKEN
          value: ''
        - name: LOGLEVEL
          value: 'DEBUG'
        image: hashieducation/vault-confluentcloud-demo-transform:latest
        imagePullPolicy: Always
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 8080
---
kind: Service
apiVersion: v1
metadata:
  name: transform
spec:
  selector:
    app: transform
  type: ClusterIP
  ports:
    - name: tcp
      port: 8080
      targetPort: 8080
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: transform
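
With the annotations above, the container can read secrets from the Vault Agent on localhost without handling a token itself. The sketch below shows one way to do that with only the Python standard library. The tutorial's transformer uses hvac, so this is an illustration of the request and response shape rather than its actual code, and the function names are hypothetical.

```python
import json
import urllib.request


def kv_v2_url(vault_addr: str, path: str) -> str:
    """Build the HTTP API URL for a KV path such as kv/data/confluent-cloud."""
    return f"{vault_addr.rstrip('/')}/v1/{path.lstrip('/')}"


def extract_secret(response: dict) -> dict:
    """KV version 2 nests the stored key/value pairs under data.data."""
    return response["data"]["data"]


def read_secret(vault_addr: str, path: str) -> dict:
    """Fetch a KV v2 secret through the Vault Agent (no token header set)."""
    with urllib.request.urlopen(kv_v2_url(vault_addr, path), timeout=5) as resp:
        return extract_secret(json.load(resp))


if __name__ == "__main__":
    # Values match the VAULT_ADDR and SECRETS_PATH environment variables above.
    print(kv_v2_url("http://localhost:8200", "kv/data/confluent-cloud"))
```

Calling read_secret only works inside the pod, where the agent is listening. Elsewhere the request would need an X-Vault-Token header.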

Once the transformer is deployed, it subscribes to the Confluent Cloud app-a-ingress topic and monitors for incoming logs. Logs are processed and then published to the app-a-egress-dev topic.

Elasticsearch and Kibana

The encrypted logs will be sent to Elasticsearch and viewed in Kibana. This section covers a setup with ECK (Elastic Cloud on Kubernetes) per quickstart instructions.

Some modifications were made to the deployment, including exposing Elasticsearch to the internet with a LoadBalancer.

To install, run the following:

Install the ECK custom resource definitions.

$ kubectl create -f https://download.elastic.co/downloads/eck/1.9.1/crds.yaml
customresourcedefinition.apiextensions.k8s.io/agents.agent.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/apmservers.apm.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/beats.beat.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/elasticmapsservers.maps.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/elasticsearches.elasticsearch.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/enterprisesearches.enterprisesearch.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/kibanas.kibana.k8s.elastic.co created

Apply the operator.

$ kubectl apply -f https://download.elastic.co/downloads/eck/1.9.1/operator.yaml
namespace/elastic-system created
serviceaccount/elastic-operator created
secret/elastic-webhook-server-cert created
configmap/elastic-operator created
clusterrole.rbac.authorization.k8s.io/elastic-operator created
clusterrole.rbac.authorization.k8s.io/elastic-operator-view created
clusterrole.rbac.authorization.k8s.io/elastic-operator-edit created
clusterrolebinding.rbac.authorization.k8s.io/elastic-operator created
service/elastic-webhook-server created
statefulset.apps/elastic-operator created
validatingwebhookconfiguration.admissionregistration.k8s.io/elastic-webhook.k8s.elastic.co created

Deploy Elasticsearch and Kibana pods.

$ kubectl apply -f deploy/elk-deploy.yml
elasticsearch.elasticsearch.k8s.elastic.co/quickstart created
kibana.kibana.k8s.elastic.co/quickstart created

Once Elasticsearch is deployed and running, you need to capture a few details for the Confluent Cloud connector in the next section, such as the credentials for Elasticsearch. The default username is elastic. To get the password:
```
$ PASSWORD=$(kubectl get secret quickstart-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
```
Note down the password:
```
$ echo $PASSWORD
```

You also need to note down the load balancer details (EXTERNAL-IP):

$ kubectl get svc
NAME                       TYPE           CLUSTER-IP       EXTERNAL-IP                                                                   PORT(S)             AGE
kubernetes                 ClusterIP      10.100.0.1       <none>
quickstart-es-default      ClusterIP      None             <none>                                                                        9200/TCP            13h
quickstart-es-http         LoadBalancer   10.100.134.61    a5db09d337eca490f82cf7a6ea17adf8-668057098.ap-southeast-1.elb.amazonaws.com   9200:31983/TCP      13h
quickstart-es-transport    ClusterIP      None             <none>                                                                        9300/TCP            13h
quickstart-kb-http         ClusterIP      10.100.157.8     <none>                                                                        5601/TCP            13h
transform                  ClusterIP      10.100.253.55    <none>                                                                        8080/TCP            50d
vault                      ClusterIP      10.100.255.143   <none>                                                                        8200/TCP,8201/TCP   9d
vault-agent-injector-svc   ClusterIP      10.100.49.129    <none>                                                                        443/TCP             9d
vault-internal             ClusterIP      None             <none>                                                                        8200/TCP,8201/TCP   9d

Confluent Cloud connectors

Confluent Cloud connectors provide fully managed connectivity to multiple data sources and sinks. In this case, you will set up two connectors:

Elasticsearch Service Sink connector
Amazon S3 Sink connector

Elasticsearch service sink connector

This connector will subscribe to the app-a-egress-dev topic (containing the encrypted JSON logs) and publish all messages to an instance of Elasticsearch, to be viewed in Kibana.

In the Confluent Cloud portal, select your cluster created in Set up Confluent Cloud steps. To set up the connector:

Select Connectors left navigation menu.
In the filters, search for Elasticsearch and select Elasticsearch Service Sink.
Choose the topic app-a-engress-dev and select Next.
On the Add Elasticsearch Service Sink connector page, in 2. Kafka credentials, choose Use an existing API key and enter the API key and secret that you downloaded earlier.
In 3. Authentication, enter the load balancer address you noted down earlier in the Connection URI field and append :9200 to it. Set Connection user to elastic and Connection password to the $PASSWORD value you wrote down earlier.
In 4. Configuration, set Input Kafka record value format to JSON.
Open Show advanced configurations.
Set both Key ignore and Schema ignore to true.
Set both Data stream type and Data stream dataset to logs.
Leave everything else at the default settings and choose Continue.
In 5. Sizing, set Tasks to 1, then choose Continue.
In 6. Review and launch, set the Connector name to ElasticsearchSink.
Review the settings below against the Connector configuration and, if they match, select Continue.

Setting	Value
topics	app-a-engress-dev
Kafka Cluster Authentication mode	KAFKA_API_KEY
Kafka API Key	Same key created in step API keys
Kafka API Secret	Same secret created in step API keys
Connection URI	<<loadbalancer_address>>:9200
Connection user	elastic
Connection password	elastic password retrieved in step Elasticsearch and Kibana
Enable SSL security	true
Input messages	JSON
Key ignore	true
Schema ignore	true
Data Stream Type	logs
Data Stream Dataset	logs
Number of tasks for this connector	1
Name	ElasticsearchSink
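
If you prefer to keep the connector settings in a file rather than click through the portal, the table above can be expressed as a JSON document. The following is a minimal sketch, not an official template: the property names (`connection.url`, `schema.ignore`, `data.stream.type`, and so on) are assumptions based on the connector's documented configuration keys and should be checked against the current Confluent Cloud connector reference. The function name and its arguments are hypothetical. The SSL setting from the table is left out because this tutorial does not give its property name.

```python
import json


def build_es_sink_config(api_key: str, api_secret: str,
                         loadbalancer_address: str, elastic_password: str) -> dict:
    """Assemble settings equivalent to the table above.

    All property names are assumptions; verify them before use.
    """
    return {
        "name": "ElasticsearchSink",
        "connector.class": "ElasticsearchSink",
        "topics": "app-a-engress-dev",
        "kafka.auth.mode": "KAFKA_API_KEY",
        "kafka.api.key": api_key,
        "kafka.api.secret": api_secret,
        # The table appends port 9200 to the load balancer address.
        "connection.url": f"{loadbalancer_address}:9200",
        "connection.username": "elastic",
        "connection.password": elastic_password,
        "input.data.format": "JSON",
        "key.ignore": "true",
        "schema.ignore": "true",
        "data.stream.type": "logs",
        "data.stream.dataset": "logs",
        "tasks.max": "1",
    }


# Placeholder values only; never commit real credentials.
config = build_es_sink_config("API_KEY", "API_SECRET", "lb.example.com", "PASSWORD")
print(json.dumps(config, indent=2))
```

Keeping the settings in one place like this makes it easier to compare them with the review screen before you launch the connector.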

If there are no errors with the configuration, after a few minutes of provisioning you should now have an operational connector:

Connector

Check connector status

On the page that appears, make sure the connector has a status of Running.

Application and Fluentd

The application deployment consists of two components:

The application (app-a) is a JSON data generator built with the Mimesis data generator. It appends the generated JSON records to /fluentd/log/user.log.
The Fluentd sidecar has fluent-plugin-kafka installed. It tracks changes to /fluentd/log/user.log and uploads the JSON records to the app-a-ingress topic in Confluent Cloud.

The Fluentd sidecar requires a few configurations to work, including a few secrets:

X.509 certificates for fluent-plugin-kafka. The plugin requires them to connect to the Confluent Cloud cluster broker.
Confluent Cloud API credentials, which fluent-plugin-kafka uses to authenticate as a producer and push the logs to the app-a-ingress topic.

Vault provides these secrets, and the configuration is passed in as part of the deployment file.

The deployment file below uses Vault Agent Sidecar Annotations to retrieve the required secrets and render the Fluentd configuration file.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "app"
        vault.hashicorp.com/agent-cache-enable: "true"
        vault.hashicorp.com/agent-cache-use-auto-auth-token: "true"
        vault.hashicorp.com/agent-inject-secret-ca.pem: ""
        vault.hashicorp.com/secret-volume-path-ca.pem: "/fluentd/cert"
        vault.hashicorp.com/agent-inject-template-ca.pem: |
          {{- with secret "pki/issue/app" "common_name=app-a.service.internal" -}}
          {{ .Data.issuing_ca }}
          {{- end }}
        vault.hashicorp.com/agent-inject-secret-key.pem: ""
        vault.hashicorp.com/secret-volume-path-key.pem: "/fluentd/cert"
        vault.hashicorp.com/agent-inject-template-key.pem: |
          {{- with secret "pki/issue/app" "common_name=app-a.service.internal" -}}
          {{ .Data.private_key }}
          {{- end }}
        vault.hashicorp.com/agent-inject-secret-cert.pem: ""
        vault.hashicorp.com/secret-volume-path-cert.pem: "/fluentd/cert"
        vault.hashicorp.com/agent-inject-template-cert.pem: |
          {{- with secret "pki/issue/app" "common_name=app-a.service.internal" -}}
          {{ .Data.certificate }}
          {{- end }}
        vault.hashicorp.com/agent-inject-secret-fluent.conf: ""
        vault.hashicorp.com/secret-volume-path-fluent.conf: "/fluentd/etc"
        vault.hashicorp.com/agent-inject-template-fluent.conf: |
          <system>
            log_level debug
          </system>
          # TCP input
          <source>
            @type forward
            port 24224
          </source>
          <source>
            @type tail
            path /fluentd/log/user.log
            pos_file /fluentd/log/user.pos
            @log_level debug
            tag user.log
            <parse>
              @type json
            </parse>
          </source>
          <match user.log>
              @type kafka2
              # list of seed brokers
              brokers {{- with secret "kv/data/confluent-cloud" }} {{ .Data.data.connection_string }}{{- end }}
              use_event_time true
              # buffer settings
              <buffer ingress>
                @type file
                path /fluentd/td/log
                flush_interval 1s
              </buffer>
              # data type settings
              <format>
                @type json
              </format>
              # topic settings
              topic_key app-a-ingress
              default_topic app-a-ingress
              # producer settings
              required_acks -1
              compression_codec gzip
              ssl_ca_cert '/fluentd/cert/ca.pem'
              ssl_client_cert '/fluentd/cert/cert.pem'
              ssl_client_cert_key '/fluentd/cert/key.pem'
              sasl_over_ssl true
              ssl_ca_certs_from_system true
              username {{- with secret "kv/data/confluent-cloud" }} {{ .Data.data.client_id }}{{- end }}
              password {{- with secret "kv/data/confluent-cloud" }} {{ .Data.data.client_secret }}{{- end }}
          </match>

    spec:
      serviceAccountName: app
      containers:
      - name: app
        env:
        - name: NUM_OF_RUNS
          value: '10'
        - name: PATH_TO_LOG
          value: '/fluentd/log/user.log'
        image: hashieducation/vault-confluentcloud-demo-app:latest
        imagePullPolicy: Always
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 8080
        volumeMounts:
        - name:  app-log
          mountPath:  /fluentd/log
      - name: fluentd
        image: hashieducation/vault-confluentcloud-demo-fluentd:latest
        imagePullPolicy: Always
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 24224
        volumeMounts:
        - name:  app-log
          mountPath:  /fluentd/log
      volumes:
      - name: app-log
        emptyDir: {}
---
kind: Service
apiVersion: v1
metadata:
  name:  app
spec:
  selector:
    app:  app
  type:  ClusterIP
  ports:
  - name:  tcp
    port:  8080
    targetPort:  8080
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name:  app
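
The Fluentd template reads the Confluent Cloud settings with `.Data.data.<field>` because the path `kv/data/confluent-cloud` points at a KV version 2 mount, where a read response nests the stored key-value pairs one level below the outer `data` object, next to a `metadata` object. The sketch below illustrates that shape with a hand-written sample response. The helper name and all values are hypothetical placeholders; only the three field names come from the template above.

```python
def kv2_field(response: dict, field: str) -> str:
    """Return one stored field from a KV version 2 read response.

    The outer "data" wraps the secret; the inner "data" holds the
    key-value pairs that were written to the path.
    """
    return response["data"]["data"][field]


# Hand-written sample shaped like a KV v2 read response (placeholder values).
sample_response = {
    "data": {
        "data": {
            "connection_string": "broker.example.internal:9092",
            "client_id": "EXAMPLE_API_KEY",
            "client_secret": "EXAMPLE_API_SECRET",
        },
        "metadata": {"version": 1},
    }
}

print(kv2_field(sample_response, "connection_string"))
```

The PKI templates, by contrast, read `.Data.certificate`, `.Data.private_key`, and `.Data.issuing_ca` directly, because the PKI issue response is not wrapped in this extra layer.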

Deploy the application:
```
$ kubectl apply -f ./deploy/app-deploy.yml
```
Once the application is deployed, it begins generating fake JSON data and appending it to the /fluentd/log/user.log file.
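
The Fluentd `tail` source with a `json` parser expects one complete JSON object per line. The following is a minimal sketch of how an application could write records in that layout. It is not the demo application's actual code: the helper names are hypothetical, the record values are placeholders (only the field names appear in this tutorial), and a temporary directory stands in for /fluentd/log so that the sketch runs anywhere.

```python
import json
import os
import tempfile


def append_record(path: str, record: dict) -> None:
    """Append one JSON object on its own line, as a tail-based reader expects."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")


def read_records(path: str) -> list:
    """Parse the file back line by line, the way a JSON line parser would."""
    with open(path, encoding="utf-8") as log_file:
        return [json.loads(line) for line in log_file if line.strip()]


# Placeholder record; values and value types are illustrative only.
sample = {
    "owner": {
        "email": "user@example.com",
        "telephone": "000-000-0000",
        "NRIC": "PLACEHOLDER",
    },
    "choices": {"places_of_interest": "PLACEHOLDER"},
}

with tempfile.TemporaryDirectory() as log_dir:
    log_path = os.path.join(log_dir, "user.log")
    for _ in range(3):
        append_record(log_path, sample)
    records = read_records(log_path)

print(len(records))
```

Opening the file in append mode for each record keeps every write at the end of the file, which is what lets the sidecar pick up new lines from its last recorded position.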

View logs in Confluent Cloud

You can view the messages being published to the Confluent Cloud topics.

To view them in the Confluent Cloud portal, select the name of the topic you wish to view.
In the app-a-ingress topic, choose the Messages tab. You should see a live stream of JSON logs being pushed by the app-a Fluentd sidecar.
Click on a message and look at the details.
In the app-a-egress-dev topic, you should see a live stream of encrypted JSON logs being pushed by the Transformer.
Click on a message and look at the details.

The highlighted fields were encrypted successfully.

The owner.telephone field was put through a format-preserving encryption transform, and the owner.NRIC field was masked.

The owner.email and choices.places_of_interest fields were encrypted with the Vault Transit secrets engine. The secrets engine prefixes the ciphertext with vault:v1, indicating that it was encrypted by Vault using version 1 of the encryption key. This is important because the Transit secrets engine can also rotate keys; Vault needs to know which key version encrypted the data in order to decrypt it.
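
To make the prefix concrete, the sketch below pulls the key version out of a Transit-style ciphertext string, for example to report which key versions are still in use across stored logs. The function name is hypothetical and the sample ciphertext is a placeholder, not real Vault output.

```python
def transit_key_version(ciphertext: str) -> int:
    """Return the key version encoded in a "vault:v<N>:<payload>" string."""
    # A string with fewer than three parts fails to unpack with ValueError.
    prefix, version, payload = ciphertext.split(":", 2)
    if prefix != "vault" or not version.startswith("v") or not payload:
        raise ValueError(f"not a Transit ciphertext: {ciphertext!r}")
    # int() raises ValueError if the version part is not a number.
    return int(version[1:])


# Placeholder ciphertext for illustration only.
print(transit_key_version("vault:v1:PLACEHOLDERPAYLOAD"))
```

After a key rotation, newly encrypted values carry a higher version number in the same position, while older values keep the version they were encrypted with.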

Architecture considerations

Below are some important considerations related to this architecture:

Vault runs in development mode here, with TLS disabled on the Vault API, and must not be used this way in production. Configure a TLS listener in Vault for production use.
The Transformer batches encryption requests to HashiCorp Vault using batch_input, which significantly improves encryption performance.
HashiCorp Vault Enterprise can be horizontally scaled by adding more nodes, allowing for scaling of encryption/decryption operations.
Confluent Cloud API keys should be configured to provide least privilege access to resources such as topics. Please see Confluent Cloud API best practices for more details.
Confluent Cloud offers several networking options, including private networking.
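
As an illustration of the batch_input consideration above, the sketch below builds the request body for a single Transit encrypt call that carries several values at once, instead of one HTTP request per field. It only constructs the payload and does not contact Vault. The function name is hypothetical, the sample values are placeholders, and the Transformer's real implementation may differ.

```python
import base64
from typing import Dict, List


def build_batch_encrypt_payload(plaintexts: List[str]) -> Dict[str, list]:
    """Build a Transit encrypt request body carrying many values at once.

    Transit expects each plaintext to be base64-encoded.
    """
    return {
        "batch_input": [
            {"plaintext": base64.b64encode(text.encode("utf-8")).decode("ascii")}
            for text in plaintexts
        ]
    }


# Placeholder values standing in for fields such as owner.email.
payload = build_batch_encrypt_payload(["user@example.com", "Example Place"])
print(payload)
```

The body is sent to the Transit encrypt endpoint for the chosen key, and the response returns a batch_results list in the same order as the inputs, so each ciphertext can be matched back to its field by position.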

Clean up

Delete the cluster.

$ eksctl delete cluster --name cluster-1 --region=ap-southeast-1
2022-11-17 14:15:59 [ℹ]  deleting EKS cluster "cluster-1"
2022-11-17 14:16:01 [ℹ]  will drain 0 unmanaged nodegroup(s) in cluster "cluster-1"
2022-11-17 14:16:01 [ℹ]  starting parallel draining, max in-flight of 1
2022-11-17 14:16:02 [ℹ]  deleted 0 Fargate profile(s)
2022-11-17 14:16:04 [✔]  kubeconfig has been updated
2022-11-17 14:16:04 [ℹ]  cleaning up AWS load balancers created by Kubernetes objects of Kind Service or Ingress
2022-11-17 14:16:08 [ℹ]  2 sequential tasks: { delete nodegroup "ng-33a2dd27", delete cluster control plane "cluster-1" [async] }
2022-11-17 14:16:08 [ℹ]  will delete stack "eksctl-cluster-1-nodegroup-ng-33a2dd27"
2022-11-17 14:16:08 [ℹ]  waiting for stack "eksctl-cluster-1-nodegroup-ng-33a2dd27" to get deleted
2022-11-17 14:16:08 [ℹ]  waiting for CloudFormation stack "eksctl-cluster-1-nodegroup-ng-33a2dd27"
2022-11-17 14:16:39 [ℹ]  waiting for CloudFormation stack "eksctl-cluster-1-nodegroup-ng-33a2dd27"
2022-11-17 14:17:27 [ℹ]  waiting for CloudFormation stack "eksctl-cluster-1-nodegroup-ng-33a2dd27"
2022-11-17 14:17:59 [ℹ]  waiting for CloudFormation stack "eksctl-cluster-1-nodegroup-ng-33a2dd27"
2022-11-17 14:18:31 [ℹ]  waiting for CloudFormation stack "eksctl-cluster-1-nodegroup-ng-33a2dd27"
2022-11-17 14:20:11 [ℹ]  waiting for CloudFormation stack "eksctl-cluster-1-nodegroup-ng-33a2dd27"
2022-11-17 14:21:50 [ℹ]  waiting for CloudFormation stack "eksctl-cluster-1-nodegroup-ng-33a2dd27"
2022-11-17 14:23:33 [ℹ]  waiting for CloudFormation stack "eksctl-cluster-1-nodegroup-ng-33a2dd27"
2022-11-17 14:24:37 [ℹ]  waiting for CloudFormation stack "eksctl-cluster-1-nodegroup-ng-33a2dd27"
2022-11-17 14:25:58 [ℹ]  waiting for CloudFormation stack "eksctl-cluster-1-nodegroup-ng-33a2dd27"
2022-11-17 14:25:59 [ℹ]  will delete stack "eksctl-cluster-1-cluster"
2022-11-17 14:26:00 [✔]  all cluster resources were deleted

Unset all the environment variables.

$ unset AWS_ACCESS_KEY \
AWS_REGION \
AWS_SECRET_ACCESS_KEY \
VAULT_ADDR \
VAULT_TOKEN \
VAULT_HELM_SECRET_NAME \
TOKEN_REVIEW_JWT \
KUBE_CA_CERT \
KUBE_HOST

In your AWS account, check CloudFormation for stacks with "cluster-1" in the name. If the deletion succeeded, no such stacks remain.
If the stack deletion ran into issues, you can manually delete the load balancer, internet gateway, and VPC associated with "cluster-1".
If you had to delete anything manually, return to CloudFormation and rerun the stack deletion. After a few minutes, the stacks should be deleted.

Help and reference

HashiCorp Vault Enterprise and Confluent Cloud can work together to address various data protection requirements. This use case is not limited to just logs, but any data that is managed within Kafka/Confluent Cloud. Vault Enterprise can be deployed across any cloud and on premises, allowing it to stay near your data, minimizing latency and improving performance.

To learn more about Confluent Cloud and HashiCorp Vault, here are a few useful resources:

Monitor telemetry with Prometheus & Grafana

Configure Vault as CM in Kubernetes with Helm