Kafka in Kubernetes with AxonOps and Strimzi

Introduction
I often find it difficult to recommend running data-heavy applications, like databases, in Kubernetes. While Kubernetes is excellent for microservices, it isn't always the optimal choice for data-centric workloads. However, Apache Kafka is a notable exception.
Our customers often ask how they can migrate their Kafka workloads into Kubernetes to take advantage of cluster resilience and bring data closer to the application layer. Our recommended approach is to run Kafka with Strimzi and use AxonOps for management and monitoring.
What is Strimzi?
Strimzi is the de facto Kubernetes Operator for Apache Kafka. It automates lifecycle operations—provisioning, upgrades, scaling, configuration drift management—via custom resources such as Kafka, KafkaNodePool, and KafkaConnect. Modern Strimzi clusters run Kafka in KRaft mode (no ZooKeeper), and the operator translates declarative specs into stateful pods, storage, listener configuration, and security.
Key Strimzi benefits include Kubernetes-native operations, opinionated best practices for availability, standardized CRDs for platform teams, and strong support for day-2 operations like rolling reconfigurations and version upgrades.
What is AxonOps?
AxonOps is an operations platform for distributed data systems that consolidates metrics, logs, events, and topology into a single control plane. For Kafka, AxonOps provides deep visibility across brokers, controllers, topics, partitions, network, and storage, with curated dashboards, golden signals, and policy-driven alerting. An AxonOps agent runs alongside your workloads and streams telemetry to the AxonOps back end (server, time-series database for metrics, and search back end for logs/events).

Why AxonOps for Kafka? You get unified observability tailored for Kafka’s internals, faster triage with correlation between metrics and logs, environment-wide fleet views, and consistent operations patterns across dev, staging, and production.
Why AxonOps + Strimzi is a great fit
Strimzi gives you reliable, automated, Kubernetes-native Kafka operations. AxonOps adds domain-specific intelligence and observability that understands Kafka and its operating patterns. Together, you declare your desired Kafka state with Strimzi CRDs, use AxonOps to see health, performance, and anomalies in real time, and close the loop with informed changes rolled safely by the operator.
Operator-driven Kafka + purpose-built observability platform = faster diagnoses, safer changes, and fewer surprises.

Integration overview
Integration is straightforward: you deploy Strimzi, apply AxonOps-aware container images or environment configuration to the Kafka node pools, and provide AxonOps agent credentials via a Kubernetes Secret. The AxonOps examples repository publishes ready-to-use manifests that embed the AxonOps agent into Strimzi-compatible images and wire up logging, node pools, and the cluster resource.
Reference examples
Use the official examples for cloud environments: https://github.com/axonops/axonops-containers/tree/development/examples/strimzi/cloud
These manifests provision a KRaft-based Kafka cluster with production-ready defaults (replication factor 3), topology spread constraints, and AxonOps agent configuration via environment variables. You can customize namespace, storage class, replica counts, and images through a single env file, then apply with envsubst + kubectl. The approach works across common managed Kubernetes offerings.
Highlights from the cloud examples:
- KRaft mode with separate controller and broker node pools for resilience
- AxonOps-aware images with the agent embedded, plus a Secret for keys and org details
- Topology spread constraints aligned to zone/region labels to avoid single-AZ concentration
- Optional Kafka Connect deployment pattern (supported by AxonOps)
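The env-file workflow mentioned above can be sketched roughly as follows. This is only an illustration of the envsubst pattern: the variable names (KAFKA_NAMESPACE, STORAGE_CLASS, BROKER_REPLICAS), the file names, and the template are made up here, and the real ones live in the repository's strimzi-config.env and manifests.

```shell
set -eu

workdir="$(mktemp -d)"

# Hypothetical env file; the variable names are illustrative, not the repo's.
cat > "$workdir/example.env" <<'EOF'
KAFKA_NAMESPACE=kafka
STORAGE_CLASS=gp3
BROKER_REPLICAS=3
EOF

# Hypothetical template fragment with ${VAR} placeholders for envsubst.
cat > "$workdir/template.yaml" <<'EOF'
namespace: ${KAFKA_NAMESPACE}
storageClass: ${STORAGE_CLASS}
replicas: ${BROKER_REPLICAS}
EOF

# Export every variable defined in the env file so envsubst can see it.
set -a
. "$workdir/example.env"
set +a

if command -v envsubst >/dev/null 2>&1; then
  envsubst < "$workdir/template.yaml" > "$workdir/rendered.yaml"
  cat "$workdir/rendered.yaml"
  # Against a real cluster you would pipe the result to kubectl instead:
  #   envsubst < some-manifest.yaml | kubectl apply -f -
else
  echo "envsubst not found; it ships with GNU gettext" >&2
fi
```

Keeping every tunable in one env file means the manifests themselves stay unchanged between environments, and the env file is the only thing you need to review per cluster.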
Step-by-step: integrating AxonOps with Strimzi
1) Prepare AxonOps backend
Deploy the AxonOps server, dashboard, time-series database (for metrics), and search back end (for logs/events). If you use cert-manager, you can enable TLS for secure agent communications. Ensure the AxonOps “agent” service is reachable from the Kafka pods (typically via a Kubernetes Service name).
2) Install Strimzi
Install the Strimzi operator in its own namespace and configure it to watch your Kafka namespace(s). This gives you the CRDs and controllers needed to reconcile Kafka resources.
helm repo add strimzi https://strimzi.io/charts
helm upgrade --install strimzi-kafka-operator \
  strimzi/strimzi-kafka-operator -n strimzi \
  --create-namespace \
  --set watchAnyNamespace=true
3) Configure AxonOps agent credentials
Create a Kubernetes Secret containing the AxonOps agent key, organization, TLS mode, and server endpoint. The provided example file axonops-config-secret.yaml uses envsubst to inject your values from strimzi-config.env so you can track configuration cleanly.
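To make the shape of that Secret concrete, here is a minimal sketch that renders one with a plain shell heredoc instead of envsubst. Everything specific in it is an assumption: the Secret name axonops-agent, the AXON_AGENT_* key names, the service hostname, and the port are placeholders, so treat the repository's axonops-config-secret.yaml as the authoritative version.

```shell
set -eu

# Placeholder values; in practice these would come from strimzi-config.env.
NAMESPACE="${NAMESPACE:-kafka}"
AXON_AGENT_KEY="${AXON_AGENT_KEY:-replace-me}"
AXON_AGENT_ORG="${AXON_AGENT_ORG:-example-org}"
AXON_AGENT_TLS_MODE="${AXON_AGENT_TLS_MODE:-disabled}"
AXON_AGENT_SERVER_HOST="${AXON_AGENT_SERVER_HOST:-axon-server-agent.axonops.svc.cluster.local}"
AXON_AGENT_SERVER_PORT="${AXON_AGENT_SERVER_PORT:-1888}"

secret_file="$(mktemp)"

# Secret name and key names below are assumptions for illustration only.
cat > "$secret_file" <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: axonops-agent
  namespace: ${NAMESPACE}
type: Opaque
stringData:
  AXON_AGENT_KEY: "${AXON_AGENT_KEY}"
  AXON_AGENT_ORG: "${AXON_AGENT_ORG}"
  AXON_AGENT_TLS_MODE: "${AXON_AGENT_TLS_MODE}"
  AXON_AGENT_SERVER_HOST: "${AXON_AGENT_SERVER_HOST}"
  AXON_AGENT_SERVER_PORT: "${AXON_AGENT_SERVER_PORT}"
EOF

echo "Rendered Secret manifest at $secret_file"

# Nothing touches the cluster unless you opt in with APPLY=1.
if [ "${APPLY:-0}" = "1" ]; then
  kubectl apply -f "$secret_file"
fi
```

Using stringData lets you write the values in clear text in the manifest and leaves the base64 encoding to the API server. Keep the rendered file out of version control, since it contains the agent key.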
4) Define node pools and logging
Apply the KafkaNodePool resources for controllers and brokers. These reference AxonOps-aware images and set environment variables that point the embedded agent to your AxonOps server. The kafka-logging-cm.yaml configures a rolling file appender for server logs so AxonOps can ingest and correlate log data with metrics.
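As a rough picture of what the two node pools look like, the sketch below renders a controller pool and a broker pool from one shell function. It assumes the kafka.strimzi.io/v1beta2 API version, and the cluster name, storage size, storage class, and the AXON_AGENT_SERVER_HOST variable name are all illustrative. The manifests in the examples repository remain the reference, including how the AxonOps-aware image and the Secret are wired in.

```shell
set -eu

# Illustrative values; none of these come from the examples repository.
CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"
NAMESPACE="${NAMESPACE:-kafka}"
STORAGE_CLASS="${STORAGE_CLASS:-standard}"
AXON_AGENT_SERVER_HOST="${AXON_AGENT_SERVER_HOST:-axon-server-agent.axonops.svc.cluster.local}"

# Emit one KafkaNodePool manifest: render_pool <name> <role> <replicas>
render_pool() {
  cat <<EOF
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: $1
  namespace: ${NAMESPACE}
  labels:
    strimzi.io/cluster: ${CLUSTER_NAME}
spec:
  replicas: $3
  roles:
    - $2
  storage:
    type: persistent-claim
    size: 100Gi
    class: ${STORAGE_CLASS}
    deleteClaim: false
  template:
    pod:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              strimzi.io/pool-name: $1
    kafkaContainer:
      env:
        - name: AXON_AGENT_SERVER_HOST   # assumed variable name
          value: "${AXON_AGENT_SERVER_HOST}"
EOF
}

outdir="$(mktemp -d)"
render_pool controllers controller 3 > "$outdir/controllers.yaml"
render_pool brokers broker 3 > "$outdir/brokers.yaml"
echo "Rendered node pools in $outdir"

# Uncomment to apply (node pools go in before the Kafka resource):
# kubectl apply -f "$outdir/controllers.yaml" -f "$outdir/brokers.yaml"
```

Separate pools let you size and scale controllers and brokers independently, and the strimzi.io/cluster label is what ties each pool to the Kafka resource of the same name.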
5) Create the Kafka cluster
Finally, apply the Kafka resource in KRaft mode. It references the node pools you created and sets listeners, replication factors, and any JVM or logging tweaks. Because the agent is already baked into the images and environment, the pods connect to AxonOps automatically on startup.
Order matters: create the node pools first, then the Kafka resource. Also confirm that your storage class exists and that the topologyKey matches the node labels your cloud provider sets.
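For orientation, a stripped-down Kafka resource for a KRaft cluster backed by node pools might look like the sketch below. It assumes the kafka.strimzi.io/v1beta2 API version, a single internal plaintext listener, and a cluster called my-cluster, all chosen for illustration. The image reference is deliberately left as a comment because the exact value should come from the examples repository.

```shell
set -eu

# Illustrative names; match them to the labels on your node pools.
CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"
NAMESPACE="${NAMESPACE:-kafka}"

kafka_file="$(mktemp)"

cat > "$kafka_file" <<EOF
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: ${CLUSTER_NAME}
  namespace: ${NAMESPACE}
  annotations:
    # Required on older Strimzi releases; newer KRaft-only releases ignore them.
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    # image: set the AxonOps-aware image here if your manifests configure it at cluster level
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    config:
      default.replication.factor: 3
      min.insync.replicas: 2
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
  entityOperator:
    topicOperator: {}
    userOperator: {}
EOF

echo "Rendered Kafka manifest at $kafka_file"

# Apply only after the KafkaNodePool resources exist, and only when APPLY=1.
if [ "${APPLY:-0}" = "1" ]; then
  kubectl apply -f "$kafka_file"
fi
```

Pairing a replication factor of 3 with min.insync.replicas of 2 is the usual way to tolerate the loss of one broker without rejecting writes that use acks=all.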
Operational advantages you gain
- Faster triage: Prebuilt Kafka dashboards and curated alerts reduce MTTR when partitions go under-replicated, ISR shrinks, or request latencies spike.
- Topology-aware insights: See cross-AZ placement and understand how scheduling or storage constraints affect performance.
- Fleet-wide visibility: Manage multiple clusters consistently with organization- and cluster-level grouping.
- Change confidence: Use AxonOps trends and correlation to validate Strimzi-driven upgrades, config changes, and autoscaling outcomes.
Tips for smooth rollouts
- Align topologyKey with your cloud: typically topology.kubernetes.io/zone and topology.kubernetes.io/region.
- Right-size PVCs and storage classes: ensure IOPS/throughput meet your target workload profile.
- Validate agent egress: confirm Kafka pods can resolve and reach the AxonOps agent service DNS/port.
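The tips above map onto a few quick pre-flight checks, sketched here as a dry-run script that only prints the commands unless you set RUN=1. The namespace, the AxonOps service hostname and port, and the throwaway pod name axon-egress-check are placeholders, and the egress check assumes your cluster is allowed to pull a busybox image.

```shell
set -eu

# Placeholder values; substitute your own namespace and AxonOps agent endpoint.
NAMESPACE="${NAMESPACE:-kafka}"
AXON_HOST="${AXON_HOST:-axon-server-agent.axonops.svc.cluster.local}"
AXON_PORT="${AXON_PORT:-1888}"

# Print each command by default; set RUN=1 to execute it against the current kube context.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

checks() {
  # Show the zone and region labels that topologyKey must match.
  run kubectl get nodes -L topology.kubernetes.io/zone -L topology.kubernetes.io/region
  # List the storage classes available for the Kafka PVCs.
  run kubectl get storageclass
  # From a temporary pod in the Kafka namespace, resolve and connect to the agent endpoint.
  run kubectl run axon-egress-check -n "$NAMESPACE" --rm -i --restart=Never \
    --image=busybox -- sh -c "nslookup $AXON_HOST && nc -w 5 $AXON_HOST $AXON_PORT </dev/null"
}

checks
```

Running the egress check from a pod in the same namespace as Kafka matters, because network policies and DNS search paths can differ between namespaces.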
Final Thoughts
Strimzi is a proven, Kubernetes-native way to run Kafka—simplifying cluster lifecycle, upgrades, and scaling with reliable, declarative operations. We’ve been running it internally and for our customers for years, and we’ve built up deep knowledge of how it behaves in the real world. Paired with AxonOps’ deep observability and faster troubleshooting, you get confident day-2 operations and quicker incident resolution instead of “stare at dashboards and guess.”
Together, this is the stack we actually bet on for production Kafka on Kubernetes: simple to reason about, automatable end to end, and battle-tested enough that we’re happy to put our name behind it.
I, for one, welcome our new robot overlords



