Setup KPOps

In this part, you will set up KPOps. This includes:

  • optionally creating a local Kubernetes cluster
  • running Apache Kafka and Confluent's Schema Registry
  • installing KPOps

Prerequisites

Setup Kubernetes with k3d

If you don't have access to an existing Kubernetes cluster, this section will guide you through creating a local cluster. We recommend the lightweight Kubernetes distribution k3s for this. k3d is a wrapper around k3s in Docker that lets you get started fast.

  1. You can install k3d with its installation script:

    wget -q -O - https://raw.githubusercontent.com/k3d-io/k3d/v5.4.6/install.sh | bash
    

    For other ways of installing k3d, you can have a look at their installation guide.

  2. The Kafka deployment later in this guide needs a modified Docker image for Kafka Connect, which has to be built and pushed to a Docker registry. If you do not have access to an existing Docker registry, you can use k3d's Docker registry:

    k3d registry create kpops-registry.localhost --port 12345
    
  3. Now you can create a new cluster called kpops that uses the previously created Docker registry:

    k3d cluster create kpops --k3s-arg "--no-deploy=traefik@server:*" --registry-use k3d-kpops-registry.localhost:12345
    

Note

Creating a new k3d cluster automatically configures kubectl to connect to the local cluster by modifying your ~/.kube/config. In case you manually set the KUBECONFIG variable or don't want k3d to modify your config, k3d offers many other options.

You can check the cluster status with kubectl get pods -n kube-system. If all returned elements have a STATUS of Running or Completed, then the cluster is up and running.
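That check can also be scripted. Below is a minimal sketch; the `check_pods` helper is illustrative, not part of kubectl, and assumes awk is available. It reads `kubectl get pods --no-headers` output on stdin and fails if any pod is in a state other than Running or Completed.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin,
# prints any pod whose STATUS (third column) is neither Running nor Completed,
# and exits non-zero if one is found.
check_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print; bad = 1 } END { exit bad }'
}

# Usage against the local cluster:
# kubectl get pods -n kube-system --no-headers | check_pods && echo "cluster is up"
```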

Deploy Kafka

Kafka is an open-source data streaming platform. More information about Kafka can be found in the documentation. To deploy Kafka, this guide uses Confluent's Helm chart.

  1. To allow connectivity to other systems, Kafka Connect needs to be extended with drivers. You can install a JDBC driver for Kafka Connect by creating a new Docker image:

    1. Create a Dockerfile with the following content:

      FROM confluentinc/cp-kafka-connect:7.1.3
      
      RUN confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:10.6.0
      
    2. Build and push the modified image to your private Docker registry:

      docker build . --tag localhost:12345/kafka-connect-jdbc:7.1.3 && \
      docker push localhost:12345/kafka-connect-jdbc:7.1.3
      

    Detailed instructions on building, tagging and pushing a docker image can be found in Docker docs.

  2. Add Confluent's Helm chart repository and update the index:

    helm repo add confluentinc https://confluentinc.github.io/cp-helm-charts/ && \
    helm repo update
    
  3. Install Kafka, Zookeeper, Confluent's Schema Registry, Kafka REST Proxy, and Kafka Connect. A single Helm chart installs all five components. Below you can find an example kafka.yaml file that configures the deployment accordingly; pass it to Helm via the --values flag. Deploy the services:

    helm upgrade \
        --install \
        --version 0.6.1 \
        --values ./kafka.yaml \
        --namespace kpops \
        --create-namespace \
        --wait \
        k8kafka confluentinc/cp-helm-charts
    
Kafka Helm chart values (kafka.yaml)

An example value configuration for Confluent's Helm chart. This configuration deploys a single Kafka broker, a Schema Registry, Zookeeper, Kafka REST Proxy, and Kafka Connect with minimal resources.

cp-zookeeper:
  enabled: true
  servers: 1
  imageTag: 7.1.3
  heapOptions: "-Xms124M -Xmx124M"
  overrideGroupId: k8kafka
  fullnameOverride: "k8kafka-cp-zookeeper"
  resources:
    requests:
      cpu: 50m
      memory: 0.2G
    limits:
      cpu: 250m
      memory: 0.2G
  prometheus:
    jmx:
      enabled: false

cp-kafka:
  enabled: true
  brokers: 1
  imageTag: 7.1.3
  podManagementPolicy: Parallel
  configurationOverrides:
    "auto.create.topics.enable": false
    "offsets.topic.replication.factor": 1
    "transaction.state.log.replication.factor": 1
    "transaction.state.log.min.isr": 1
    "confluent.metrics.reporter.topic.replicas": 1
  resources:
    requests:
      cpu: 50m
      memory: 0.5G
    limits:
      cpu: 250m
      memory: 0.5G
  prometheus:
    jmx:
      enabled: false
  persistence:
    enabled: false

cp-schema-registry:
  enabled: true
  imageTag: 7.1.3
  fullnameOverride: "k8kafka-cp-schema-registry"
  overrideGroupId: k8kafka
  kafka:
    bootstrapServers: "PLAINTEXT://k8kafka-cp-kafka-headless:9092"
  resources:
    requests:
      cpu: 50m
      memory: 0.25G
    limits:
      cpu: 250m
      memory: 0.25G
  prometheus:
    jmx:
      enabled: false

cp-kafka-connect:
  enabled: true
  replicaCount: 1
  image: k3d-kpops-registry.localhost:12345/kafka-connect-jdbc
  imageTag: 7.1.3
  fullnameOverride: "k8kafka-cp-kafka-connect"
  overrideGroupId: k8kafka
  kafka:
    bootstrapServers: "PLAINTEXT://k8kafka-cp-kafka-headless:9092"
  heapOptions: "-Xms256M -Xmx256M"
  resources:
    requests:
      cpu: 500m
      memory: 0.25G
    limits:
      cpu: 500m
      memory: 0.25G
  configurationOverrides:
    "consumer.max.poll.records": "10"
    "consumer.max.poll.interval.ms": "900000"
    "config.storage.replication.factor": "1"
    "offset.storage.replication.factor": "1"
    "status.storage.replication.factor": "1"
  cp-schema-registry:
    url: http://k8kafka-cp-schema-registry:8081
  prometheus:
    jmx:
      enabled: false

cp-kafka-rest:
  enabled: true
  imageTag: 7.1.3
  fullnameOverride: "k8kafka-cp-rest"
  heapOptions: "-Xms256M -Xmx256M"
  resources:
    requests:
      cpu: 50m
      memory: 0.25G
    limits:
      cpu: 250m
      memory: 0.5G
  prometheus:
    jmx:
      enabled: false

cp-ksql-server:
  enabled: false
cp-control-center:
  enabled: false

Deploy Streams Explorer

Streams Explorer allows examining Apache Kafka data pipelines in a Kubernetes cluster, including the inspection of schemas and monitoring of metrics. First, add the Helm repository:

helm repo add streams-explorer https://bakdata.github.io/streams-explorer && \
helm repo update

Below you can find an example streams-explorer.yaml file that configures the deployment accordingly; pass it to Helm via the --values flag. Now, deploy the service:

helm upgrade \
    --install \
    --version 0.2.3 \
    --values ./streams-explorer.yaml \
    --namespace kpops \
    streams-explorer streams-explorer/streams-explorer
Streams Explorer Helm chart values (streams-explorer.yaml)

An example value configuration for the Streams Explorer Helm chart.

imageTag: "v2.1.2"
config:
   K8S__deployment__cluster: true
   SCHEMAREGISTRY__url: http://k8kafka-cp-schema-registry.kpops.svc.cluster.local:8081
   KAFKACONNECT__url: http://k8kafka-cp-kafka-connect.kpops.svc.cluster.local:8083
resources:
   requests:
       cpu: 200m
       memory: 300Mi
   limits:
       cpu: 200m
       memory: 300Mi
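The two URLs in the config section follow Kubernetes' in-cluster DNS scheme, `<service>.<namespace>.svc.cluster.local`; the service names come from the fullnameOverride values in kafka.yaml. A sketch of how they are assembled (the `svc_url` helper is illustrative):

```shell
# Build an in-cluster service URL: http://<service>.<namespace>.svc.cluster.local:<port>
NAMESPACE="kpops"
svc_url() { echo "http://$1.${NAMESPACE}.svc.cluster.local:$2"; }

svc_url k8kafka-cp-schema-registry 8081   # Schema Registry REST endpoint
svc_url k8kafka-cp-kafka-connect 8083     # Kafka Connect REST endpoint
```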

Check the status of your deployments

Now we will check whether all pods in our namespace are running. You can list them with this command:

kubectl --namespace kpops get pods

You should see output similar to the following in your terminal:

NAME                                          READY   STATUS    RESTARTS   AGE
k8kafka-cp-kafka-connect-8fc7d544f-8pjnt      1/1     Running   0          15m
k8kafka-cp-zookeeper-0                        1/1     Running   0          15m
k8kafka-cp-kafka-0                            1/1     Running   0          15m
k8kafka-cp-schema-registry-588f8c65db-jdwbq   1/1     Running   0          15m
k8kafka-cp-rest-6bbfd7b645-nwkf8              1/1     Running   0          15m
streams-explorer-54db878c67-s8wbz             1/1     Running   0          15m

Pay attention to the STATUS column. The pods should have a status of Running.

Install KPOps

KPOps comes as a PyPI package. You can install it with pip:

pip install kpops