Kafka - How It Is Installed
Where is Kafka installed on the cloud?
Kafka itself is not “installed” in one fixed cloud location.
It runs as a set of Kafka brokers (servers) on cloud infrastructure, and where those brokers run depends on the deployment model you choose.
1. Managed Kafka (most common in the cloud)
Here, you do NOT install Kafka. The cloud provider runs it for you.
AWS
- Amazon MSK (Managed Streaming for Apache Kafka)
- Kafka brokers run on AWS‑managed instances and are reachable from inside your VPC
- You only see the bootstrap server addresses, not the underlying machines
[aws.amazon.com]
Azure
- Azure Event Hubs (Kafka‑compatible endpoint)
- No Kafka brokers visible
- Microsoft runs the service
- Your apps connect using the Kafka protocol (port 9093)
[learn.microsoft.com]
Google Cloud
- Managed Service for Apache Kafka
- Kafka brokers run on Google‑managed infrastructure across zones
[docs.cloud...google.com]
Confluent Cloud (multi‑cloud)
- Kafka runs on Confluent‑managed infrastructure
- Available on AWS, Azure, GCP
- Fully managed, no broker access
[confluent.io]
✅ In managed services, Kafka runs “inside the cloud provider’s managed environment”, not on your VMs.
2. Self‑Managed Kafka on Cloud (You install it)
Here you install Kafka yourself.
Common places:
- Virtual Machines
  - AWS EC2
  - Azure VM
  - GCP Compute Engine
- Kubernetes
  - EKS / AKS / GKE
  - Kafka runs as pods + persistent volumes
- Docker / Docker Swarm
  - Kafka runs as containers on cloud VMs
In this model:
- Kafka data lives on VM disks (SSD / EBS / Premium disks)
- Brokers communicate over cloud networking (VPC / VNet)
- You manage upgrades, scaling, and failures
Simple mental model (interview‑ready)
Model | Where Kafka runs
Amazon MSK | AWS‑managed EC2 (hidden from you)
Azure Event Hubs | Azure PaaS (no Kafka brokers exposed)
Google Managed Kafka | Google‑managed compute
Confluent Cloud | Vendor‑managed cloud infra
Self‑managed | Your VMs / Kubernetes nodes

Step by Step -- Manual Installation Process
Below is a clear, end‑to‑end, step‑by‑step procedure to install Apache Kafka on your own cloud VMs (AWS EC2 / Azure VM / GCP VM / on‑prem).
The setup is production‑oriented, ZooKeeper‑less (KRaft mode), and aligns with the Kafka 3.x+ / 4.x architecture. I’ll assume Linux VMs (Ubuntu/RHEL) and a 3‑node cluster (recommended minimum).
Architecture You Are Building
Minimum production layout (recommended):
VM | Role | Ports
VM‑1 | Controller + Broker | 9092, 9093
VM‑2 | Controller + Broker | 9092, 9093
VM‑3 | Controller + Broker | 9092, 9093
(Separate controllers are better later; combined is fine to start.)

VM & OS Prerequisites (ALL VMs)
Hardware (minimum)
- CPU: 4 vCPU
- RAM: 8–16 GB
- Disk: 100+ GB SSD (separate data disk preferred)
- Network: Low latency, private IP connectivity
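The minimums above can be checked before installing anything. The following is a minimal sketch, not part of Kafka: the function names `meets_minimum` and `check_this_vm` are hypothetical, the thresholds are the ones from the list (4 vCPU, 8 GB RAM, 100 GB disk), and the disk check assumes GNU `df` as shipped on Ubuntu and RHEL.

```shell
# Hypothetical preflight helpers; names and thresholds are illustrative.

# Compares the given values against the minimums listed above.
# Prints one line per shortfall and returns non-zero if any check fails.
meets_minimum() {
  local cpus="$1" mem_gb="$2" disk_gb="$3"
  local failed=0
  if [ "$cpus" -lt 4 ]; then echo "CPU: $cpus vCPU (need 4+)"; failed=1; fi
  if [ "$mem_gb" -lt 8 ]; then echo "RAM: $mem_gb GB (need 8+)"; failed=1; fi
  if [ "$disk_gb" -lt 100 ]; then echo "Disk: $disk_gb GB (need 100+)"; failed=1; fi
  return "$failed"
}

# Measures the current Linux VM and runs the comparison.
# The argument is the mount point to check for free space (defaults to /).
check_this_vm() {
  local mount_point="${1:-/}"
  local cpus mem_gb disk_gb
  cpus=$(nproc)
  # MemTotal is reported in kB; round to the nearest GB because an
  # "8 GB" VM usually reports slightly less than 8 GB.
  mem_gb=$(awk '/MemTotal/ {printf "%.0f", $2 / 1024 / 1024}' /proc/meminfo)
  # Free space in whole GB on the chosen mount point.
  disk_gb=$(df -BG --output=avail "$mount_point" | tail -n 1 | tr -dc '0-9')
  meets_minimum "$cpus" "$mem_gb" "$disk_gb"
}
```

Running `check_this_vm /` on each VM prints nothing when all three minimums are met. Network latency between the VMs is not covered by this sketch and still needs a separate check.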
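OS Setup
These commands are for Ubuntu; on RHEL, use dnf and the corresponding package names.
    sudo apt update && sudo apt upgrade -y
    sudo apt install -y openjdk-17-jdk wget curl net-tools
Kafka requires Java 17+ for newer versions. Verify the installed version:
    java -version

Create Kafka User (Security Best Practice)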
    sudo useradd kafka -m
    sudo passwd kafka
    sudo usermod -aG sudo kafka
    su - kafka

Download & Install Kafka (ALL VMs)
    wget https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
    tar -xvf kafka_2.13-3.7.0.tgz
    mv kafka_2.13-3.7.0 kafka
If that release is no longer on downloads.apache.org, older versions are kept under archive.apache.org/dist/kafka/.
Directory: /home/kafka/kafka

Configure Storage Directory
    sudo mkdir -p /data/kafka
    sudo chown -R kafka:kafka /data/kafka

Generate a Cluster ID (ONLY ON ONE VM)
    cd ~/kafka
    bin/kafka-storage.sh random-uuid
Example output:
    dPqzXuANQ5y9rG1mJvBm8A

Configure Kafka (KRaft Mode)
    vi ~/kafka/config/kraft/server.properties
Example (VM‑1)
    ################ KRaft ################
    process.roles=broker,controller
    node.id=1

    ################ Networking ################
    listeners=PLAINTEXT://10.0.0.1:9092,CONTROLLER://10.0.0.1:9093
    advertised.listeners=PLAINTEXT://10.0.0.1:9092
    listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
    controller.listener.names=CONTROLLER

    ################ Controller Quorum ################
    controller.quorum.voters=1@10.0.0.1:9093,2@10.0.0.2:9093,3@10.0.0.3:9093

    ################ Storage ################
    log.dirs=/data/kafka

    ################ Replication ################
    num.partitions=3
    default.replication.factor=3
    min.insync.replicas=2
    offsets.topic.replication.factor=3
    transaction.state.log.replication.factor=3
    transaction.state.log.min.isr=2

    ################ Retention ################
    log.retention.hours=168
VM‑2 changes
    node.id=2
    listeners=PLAINTEXT://10.0.0.2:9092,CONTROLLER://10.0.0.2:9093
    advertised.listeners=PLAINTEXT://10.0.0.2:9092
VM‑3 changes
    node.id=3
    listeners=PLAINTEXT://10.0.0.3:9092,CONTROLLER://10.0.0.3:9093
    advertised.listeners=PLAINTEXT://10.0.0.3:9092
✅ node.id must be unique on each VM
✅ IPs must be the VMs’ private IPs

Format Storage (ALL VMs – FIRST TIME ONLY)
    bin/kafka-storage.sh format \
      --cluster-id dPqzXuANQ5y9rG1mJvBm8A \
      --config config/kraft/server.properties
Use the same cluster ID on every VM. Run format only once per node, and never re‑initialize the storage directory of a node that already holds data; doing so loses that node’s metadata.

Start Kafka (ALL VMs)
Run in the foreground:
    bin/kafka-server-start.sh config/kraft/server.properties
Run in background:
    nohup bin/kafka-server-start.sh config/kraft/server.properties > kafka.log 2>&1 &

Verify Cluster
Check brokers
    bin/kafka-broker-api-versions.sh --bootstrap-server 10.0.0.1:9092
Check metadata quorum
    bin/kafka-metadata-quorum.sh \
      --bootstrap-server 10.0.0.1:9092 \
      describe --status
You should see one leader, with the other controllers listed as voters (followers).

Create Test Topic
    bin/kafka-topics.sh \
      --bootstrap-server 10.0.0.1:9092 \
      --create \
      --topic test-topic \
      --partitions 3 \
      --replication-factor 3

Test Producer & Consumer
Producer:
    bin/kafka-console-producer.sh \
      --bootstrap-server 10.0.0.1:9092 \
      --topic test-topic
Consumer:
    bin/kafka-console-consumer.sh \
      --bootstrap-server 10.0.0.2:9092 \
      --topic test-topic \
      --from-beginning

(Optional) Run Kafka as a Systemd Service
Create service file:
    sudo vi /etc/systemd/system/kafka.service

    [Unit]
    Description=Apache Kafka
    After=network.target

    [Service]
    User=kafka
    ExecStart=/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/kraft/server.properties
    ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
    Restart=on-failure
    LimitNOFILE=100000

    [Install]
    WantedBy=multi-user.target
Enable (daemon-reload makes systemd pick up the new unit file):
    sudo systemctl daemon-reload
    sudo systemctl enable kafka
    sudo systemctl start kafka

Production Hardening (Next Steps)
- TLS + SASL authentication
- Separate controllers (3) & brokers (N)
- Disk IOPS monitoring
- Prometheus + Grafana
- Cross‑cluster replication for backup / disaster recovery with MirrorMaker 2
- Rolling upgrades
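The list above suggests three dedicated controllers, and the walkthrough uses a 3‑node cluster as the minimum. The reason is that the KRaft controller quorum needs a majority of its voters to be available. A small arithmetic sketch (the function names are hypothetical) shows how many controller failures a quorum of a given size can survive:

```shell
# Hypothetical helpers illustrating majority-quorum arithmetic.

# Number of voters that must be available for the quorum to make progress.
quorum_majority() {
  local voters="$1"
  echo $(( voters / 2 + 1 ))
}

# Number of voters that can fail while a majority still remains.
quorum_tolerance() {
  local voters="$1"
  echo $(( (voters - 1) / 2 ))
}
```

With these definitions, 3 voters tolerate 1 failure and 5 tolerate 2, while 4 voters still tolerate only 1. That is why odd controller counts are the usual choice: the extra even node adds cost without adding fault tolerance.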
TL;DR (Interview‑Ready Summary)
Kafka is installed on cloud VMs by installing Java, downloading Kafka binaries, configuring KRaft mode with unique node IDs and quorum voters, formatting storage once, and running Kafka as a system service across multiple VMs.
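The per‑node part of that configuration (unique node IDs, listeners, quorum voters) is easy to get wrong when edited by hand on three machines. The following is a minimal sketch of generating it from a list of private IPs. The function names `quorum_voters` and `node_settings` are hypothetical and not part of the Kafka distribution; the sketch assumes node IDs are assigned 1..N in the order the IPs are given, with 9092 as the client port and 9093 as the controller port, as in the walkthrough.

```shell
# Hypothetical helpers; not shipped with Kafka.

# Builds the controller.quorum.voters value from a list of private IPs.
# Node IDs are assigned 1..N in argument order.
quorum_voters() {
  local id=1
  local voters=""
  local ip
  for ip in "$@"; do
    voters="${voters}${voters:+,}${id}@${ip}:9093"
    id=$((id + 1))
  done
  printf '%s\n' "$voters"
}

# Prints the node-specific settings for one node.
# Usage: node_settings NODE_ID IP1 IP2 IP3 ...
node_settings() {
  local node_id="$1"
  shift
  local voters
  voters=$(quorum_voters "$@")
  # Move to the IP at position NODE_ID in the remaining arguments.
  shift $((node_id - 1))
  local ip="$1"
  cat <<EOF
node.id=${node_id}
listeners=PLAINTEXT://${ip}:9092,CONTROLLER://${ip}:9093
advertised.listeners=PLAINTEXT://${ip}:9092
controller.quorum.voters=${voters}
EOF
}
```

For example, `node_settings 2 10.0.0.1 10.0.0.2 10.0.0.3` prints the four lines that differ or must agree on VM‑2. The remaining settings (roles, storage, replication, retention) are identical on every node and can be kept in a shared template.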
kafka/
├── bin/
├── config/
│ ├── server.properties
│ ├── consumer.properties
│ ├── producer.properties
│ └── log4j.properties


