Kafka Confluent K8s Part I: Single Cluster

Sridharan r.g
3 min read · Jun 11, 2024

Kafka is an open source, distributed publish-subscribe messaging system for handling high-volume, high-throughput, and real-time streaming data. You can use Kafka to build streaming data pipelines that move data reliably across different systems and applications for processing and analysis.

This guide is intended for platform administrators, cloud architects, and operations professionals interested in deploying Kafka clusters on Kubernetes.

You can also use the CFK operator to deploy other components of the Confluent Platform, such as the web-based Confluent Control Center, Schema Registry, or ksqlDB.

In this post, you will:

  • Plan and deploy GKE infrastructure for Apache Kafka
  • Deploy and configure the CFK operator
  • Configure Apache Kafka using the CFK operator to ensure availability, security, observability, and performance

CFK provides the following benefits:

  • Automated rolling updates for configuration changes.
  • Automated rolling upgrades with no impact on Kafka availability.
  • If a failure occurs, CFK restores a Kafka Pod with the same Kafka broker ID, configuration, and persistent storage volumes.
  • Automated rack awareness to spread replicas of a partition across different racks (or zones), improving the availability of Kafka brokers and limiting the risk of data loss.
Install the CFK operator using Helm:

helm repo add confluentinc https://packages.confluent.io/helm

helm repo update

kubectl create ns kafka

helm install confluent-operator confluentinc/confluent-for-kubernetes -n kafka

helm ls -n kafka
The Kafka cluster you will deploy has the following characteristics:

  • Three Kafka broker replicas, with a minimum of two in-sync replicas required for cluster consistency.
  • Three ZooKeeper replicas, forming a cluster.
  • Two Kafka listeners: one without authentication, and one secured with TLS using a certificate generated by CFK.
  • Tolerations, nodeAffinities, and topologySpreadConstraints configured for each workload, ensuring proper distribution across nodes and zones using their respective node pools.
  • Communication inside the cluster secured by self-signed certificates using a Certificate Authority that you provide.

Generate a CA pair with OpenSSL, then store it as a Kubernetes TLS secret for CFK to use.
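CFK does not generate the CA for you, so any OpenSSL invocation that yields a key/certificate pair works; a minimal sketch that produces the ca.pem and ca-key.pem files used in the secret below (the CN value is illustrative):

```shell
# Generate a 2048-bit CA private key (file name matches the secret command below)
openssl genrsa -out ca-key.pem 2048

# Create a self-signed CA certificate valid for one year; the CN is an arbitrary example
openssl req -new -x509 -key ca-key.pem -out ca.pem -days 365 \
  -subj "/CN=kafka-ca"
```

For production, you would typically use a longer-lived key from your organization's PKI rather than a throwaway self-signed CA.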

kubectl create secret tls ca-pair-sslcerts --cert=ca.pem --key=ca-key.pem -n kafka

vi kafka-cluster.yaml

---
apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka-cluster-confluent
spec:
  replicas: 3
  tls:
    autoGeneratedCerts: true
  image:
    application: confluentinc/cp-server:7.4.0
    init: confluentinc/confluent-init-container:2.6.0
  dataVolumeCapacity: 50Gi
  storageClass:
    name: premium-rwo
  configOverrides:
    server:
      - offsets.topic.replication.factor=3
      - transaction.state.log.replication.factor=3
      - transaction.state.log.min.isr=2
      - default.replication.factor=3
      - min.insync.replicas=2
      - auto.create.topics.enable=true
  listeners:
    custom:
      - name: tls
        port: 9093
        tls:
          enabled: true
  podTemplate:
    tolerations:
      - key: "app.stateful/component"
        operator: "Equal"
        value: "kafka-broker"
        effect: NoSchedule
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
                - key: "app.stateful/component"
                  operator: In
                  values:
                    - "kafka-broker"
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: "topology.kubernetes.io/zone"
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: kafka-cluster-confluent
            clusterId: kafka
            platform.confluent.io/type: kafka
    envVars:
      - name: KAFKA_HEAP_OPTS
        value: "-Xmx4G -Xms4G"
    resources:
      requests:
        memory: 3Gi
        cpu: "1"
      limits:
        memory: 4Gi
        cpu: "2"
    probe:
      readiness:
        failureThreshold: 15
  dependencies:
    zookeeper:
      endpoint: zookeeper.kafka.svc.cluster.local:2182
      tls:
        enabled: true
---
apiVersion: platform.confluent.io/v1beta1
kind: Zookeeper
metadata:
  name: zookeeper
spec:
  replicas: 3
  tls:
    autoGeneratedCerts: true
  image:
    application: confluentinc/cp-zookeeper:7.4.0
    init: confluentinc/confluent-init-container:2.6.0
  dataVolumeCapacity: 50Gi
  logVolumeCapacity: 10Gi
  storageClass:
    name: premium-rwo
  podTemplate:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
                - key: "app.stateful/component"
                  operator: In
                  values:
                    - "zookeeper"
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: "topology.kubernetes.io/zone"
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: zookeeper
            clusterId: kafka
            platform.confluent.io/type: zookeeper
    resources:
      requests:
        memory: 3Gi
        cpu: "1"
      limits:
        memory: 3Gi
        cpu: "2"

kubectl apply -f kafka-cluster.yaml -n kafka

kubectl get pod,svc,statefulset,deploy -n kafka

Check that the pods and services are up and running with the commands above. In my next post, I will discuss Kafka topics, producers, and consumers.
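Once the cluster is running, a client inside the cluster could reach the TLS listener with settings along these lines. This is only a sketch: the bootstrap DNS name follows the usual Kubernetes service naming for the Kafka resource above, and the truststore path and password are placeholders you would replace with a truststore built from ca.pem.

```properties
# Assumed bootstrap address for the Kafka CR named kafka-cluster-confluent in namespace kafka,
# on the custom TLS listener port from the manifest
bootstrap.servers=kafka-cluster-confluent.kafka.svc.cluster.local:9093
security.protocol=SSL
# Truststore containing the CA certificate (ca.pem); location and password are illustrative
ssl.truststore.location=/mnt/certs/truststore.jks
ssl.truststore.password=changeit
```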

Thanks to GCP for providing good documentation to understand these concepts.
