Kafka - How It Is Installed

 

1. Where is Kafka installed in the cloud?

Kafka itself is not “installed” in one fixed cloud location.
It runs as Kafka brokers (servers) on cloud infrastructure, and where it runs depends on the deployment model you choose.

Managed Kafka (Most common in cloud)

Here, you do NOT install Kafka. The cloud provider runs it for you.

AWS

  • Amazon MSK (Managed Streaming for Apache Kafka)
  • Kafka brokers run on AWS‑managed EC2 instances inside your VPC
  • You only see bootstrap servers, not the underlying machines
    [aws.amazon.com]

Azure

  • Azure Event Hubs (Kafka‑compatible endpoint)
    • No Kafka brokers visible
    • Microsoft runs the service
    • Your apps connect using the Kafka protocol (port 9093)
      [learn.microsoft.com]
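A minimal Kafka client configuration for the Event Hubs Kafka endpoint might look like the sketch below. The namespace name is a placeholder and the key values are elided; Event Hubs authenticates over SASL_SSL with the literal username `$ConnectionString` and the namespace connection string as the password:

```properties
# Illustrative client config for an Event Hubs Kafka endpoint
# "mynamespace" and the SharedAccessKey values are placeholders
bootstrap.servers=mynamespace.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="$ConnectionString" \
  password="Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...";
```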

Google Cloud

  • Google Cloud Managed Service for Apache Kafka
  • Brokers run on Google‑managed compute; you connect via bootstrap endpoints

Confluent Cloud (multi‑cloud)

  • Kafka runs on Confluent‑managed infrastructure
  • Available on AWS, Azure, GCP
  • Fully managed, no broker access
    [confluent.io]

In managed services, Kafka runs “inside the cloud provider’s managed environment”, not on your VMs.

 

2. Self‑Managed Kafka on Cloud (You install it)

Here you install Kafka yourself.

Common places:

  • Virtual Machines
    • AWS EC2
    • Azure VM
    • GCP Compute Engine
  • Kubernetes
    • EKS / AKS / GKE
    • Kafka runs as pods + persistent volumes
  • Docker / Docker Swarm
    • Kafka runs as containers on cloud VMs

     In all of these, Kafka:

     • Is installed on VM disks (SSD / EBS / Premium disks)
     • Uses cloud networking (VPC / VNet)
     • Leaves you to manage upgrades, scaling, and failures

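For the container route above, a single‑node KRaft broker can be sketched with Docker Compose. This is an illustrative minimal setup, not a production layout; the `apache/kafka` image (available from 3.7.0 onward) starts in KRaft mode with single‑node defaults:

```yaml
# docker-compose.yml -- minimal single-node Kafka for local/dev use
services:
  kafka:
    image: apache/kafka:3.7.0   # official Apache Kafka image, KRaft mode by default
    ports:
      - "9092:9092"             # expose the broker listener to the host
```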
  Simple mental model (interview‑ready):

    Model                 | Where Kafka runs
    ----------------------|--------------------------------------
    Amazon MSK            | AWS‑managed EC2 (hidden from you)
    Azure Event Hubs      | Azure PaaS (no Kafka brokers exposed)
    Google Managed Kafka  | Google‑managed compute
    Confluent Cloud       | Vendor‑managed cloud infra
    Self‑managed          | Your VMs / Kubernetes nodes


    Step by Step -- Manual Installation Process

     

    Below is a clear, end‑to‑end, step‑by‑step procedure to install Apache Kafka on your own cloud VMs (AWS EC2 / Azure VM / GCP VM / on‑prem).
    This is production‑grade, ZooKeeper‑less (KRaft mode), and aligns with Kafka 3.x+ / 4.x architecture.

    I’ll assume Linux VMs (Ubuntu/RHEL) and 3‑node cluster (recommended minimum).

     

    Architecture You Are Building 

    Minimum production layout (recommended):

    VM    | Role                | Ports
    ------|---------------------|------------
    VM‑1  | Controller + Broker | 9092, 9093
    VM‑2  | Controller + Broker | 9092, 9093
    VM‑3  | Controller + Broker | 9092, 9093

    (Separate controllers are better later; combined is fine to start.)

    VM & OS Prerequisites (ALL VMs)
     
    Hardware (minimum)
    • CPU: 4 vCPU
    • RAM: 8–16 GB
    • Disk: 100+ GB SSD (separate data disk preferred)
    • Network: Low latency, private IP connectivity
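Alongside these specs, some OS‑level tuning is commonly applied on Kafka hosts. The values below are widely used starting points, not mandates; verify them against your workload:

```
# /etc/sysctl.d/99-kafka.conf -- common starting points, tune for your workload
vm.swappiness=1       # avoid swapping; Kafka relies heavily on the page cache
fs.file-max=100000    # Kafka holds many open files (log segments + sockets)
```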
     

    OS Setup
    sudo apt update && sudo apt upgrade -y
    sudo apt install -y openjdk-17-jdk wget curl net-tools


    Newer Kafka releases require Java 17 (Kafka 4.x brokers need it; Java 17 also works for 3.x). Verify:

    java -version


    Create Kafka User (Security Best Practice)

     
    sudo useradd kafka -m
    sudo passwd kafka
    sudo usermod -aG sudo kafka
    su - kafka
     
    Download & Install Kafka (ALL VMs)
     
    wget https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
    # older releases move to https://archive.apache.org/dist/kafka/ once superseded
    tar -xvf kafka_2.13-3.7.0.tgz
    mv kafka_2.13-3.7.0 kafka
     
    Directory:
     /home/kafka/kafka
     
    Configure Storage Directory
     
    sudo mkdir -p /data/kafka
    sudo chown -R kafka:kafka /data/kafka
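If you attach the dedicated data disk recommended earlier, it is typically formatted (XFS is a common choice for Kafka) and mounted at this path. An illustrative /etc/fstab entry — the device name is an assumption and depends on your VM:

```
# /etc/fstab -- example entry for a dedicated Kafka data disk (device name is illustrative)
/dev/sdb1  /data/kafka  xfs  noatime  0  2
```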

    Generate a Cluster ID (ONLY ON ONE VM)

    cd ~/kafka
    bin/kafka-storage.sh random-uuid


    Example output:
    dPqzXuANQ5y9rG1mJvBm8A


    Configure Kafka (KRaft Mode)

    vi ~/kafka/config/kraft/server.properties


    Example (VM‑1)
     
    ################ KRaft ################
    process.roles=broker,controller
    node.id=1

    ################ Networking ################
    listeners=PLAINTEXT://10.0.0.1:9092,CONTROLLER://10.0.0.1:9093
    advertised.listeners=PLAINTEXT://10.0.0.1:9092
    listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
    controller.listener.names=CONTROLLER

    ################ Controller Quorum ################
    controller.quorum.voters=1@10.0.0.1:9093,2@10.0.0.2:9093,3@10.0.0.3:9093

    ################ Storage ################
    log.dirs=/data/kafka

    ################ Replication ################
    num.partitions=3
    default.replication.factor=3
    min.insync.replicas=2

    offsets.topic.replication.factor=3
    transaction.state.log.replication.factor=3
    transaction.state.log.min.isr=2

    ################ Retention ################
    log.retention.hours=168
     
     
    VM‑2 changes
     
    node.id=2
    listeners=PLAINTEXT://10.0.0.2:9092,CONTROLLER://10.0.0.2:9093
    advertised.listeners=PLAINTEXT://10.0.0.2:9092
     
     
    VM‑3 changes
     
    node.id=3
    listeners=PLAINTEXT://10.0.0.3:9092,CONTROLLER://10.0.0.3:9093
    advertised.listeners=PLAINTEXT://10.0.0.3:9092
     
     
    Notes:
    • node.id must be unique on every node
    • IPs must be the private VM IPs
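The per‑VM edits above are mechanical, so they can be generated instead of typed by hand. A hedged sketch — the IP list and output file names are the example values from this guide, adjust to your VMs:

```shell
#!/usr/bin/env bash
# Emit the per-node override lines for each VM in the cluster.
set -euo pipefail

IPS=(10.0.0.1 10.0.0.2 10.0.0.3)   # private IPs of VM-1..VM-3

for i in "${!IPS[@]}"; do
  id=$((i + 1))
  ip="${IPS[$i]}"
  # One override file per node; merge into that VM's server.properties.
  cat > "server-${id}.properties" <<EOF
node.id=${id}
listeners=PLAINTEXT://${ip}:9092,CONTROLLER://${ip}:9093
advertised.listeners=PLAINTEXT://${ip}:9092
EOF
done
```

Copy `server-2.properties` to VM‑2, `server-3.properties` to VM‑3, and so on; the shared settings (quorum voters, replication, retention) stay identical on every node.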
     
     Format Storage (ALL VMs – FIRST TIME ONLY)

    bin/kafka-storage.sh format \
      --cluster-id dPqzXuANQ5y9rG1mJvBm8A \
      --config config/kraft/server.properties

    Never run format again — it wipes metadata.

    Start Kafka (ALL VMs)
    bin/kafka-server-start.sh config/kraft/server.properties
      
    Run in background:
    nohup bin/kafka-server-start.sh config/kraft/server.properties > kafka.log 2>&1 &

    Verify Cluster

    Check brokers
    bin/kafka-broker-api-versions.sh --bootstrap-server 10.0.0.1:9092

    Check metadata quorum
    bin/kafka-metadata-quorum.sh \
      --bootstrap-server 10.0.0.1:9092 \
      describe --status
     
    You should see one leader and two followers in the quorum status.
     
    Create Test Topic
     
    bin/kafka-topics.sh \
      --bootstrap-server 10.0.0.1:9092 \
      --create \
      --topic test-topic \
      --partitions 3 \
      --replication-factor 3
     

     Test Producer & Consumer
     
    bin/kafka-console-producer.sh \
      --bootstrap-server 10.0.0.1:9092 \
      --topic test-topic
     
    Consumer:

    bin/kafka-console-consumer.sh \
      --bootstrap-server 10.0.0.2:9092 \
      --topic test-topic \
      --from-beginning
     
    (Optional) Run Kafka as a Systemd Service
     
    Create service file:
     
    sudo vi /etc/systemd/system/kafka.service 
     
     
    [Unit]
    Description=Apache Kafka
    After=network.target

    [Service]
    User=kafka
    ExecStart=/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/kraft/server.properties
    ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
    Restart=on-failure
    LimitNOFILE=100000

    [Install]
    WantedBy=multi-user.target

    Enable:
     
    sudo systemctl daemon-reload
    sudo systemctl enable kafka
    sudo systemctl start kafka
     

    Production Hardening (Next Steps)

    • TLS + SASL authentication
    • Separate controllers (3) & brokers (N)
    • Disk IOPS monitoring
    • Prometheus + Grafana
    • Backup with MirrorMaker 2
    • Rolling upgrades
     

    TL;DR (Interview‑Ready Summary)

    Kafka is installed on cloud VMs by installing Java, downloading Kafka binaries, configuring KRaft mode with unique node IDs and quorum voters, formatting storage once, and running Kafka as a system service across multiple VMs.







  Kafka installation directory layout:

  kafka/
   ├── bin/
   ├── config/
   │    ├── server.properties
   │    ├── consumer.properties
   │    ├── producer.properties
   │    └── log4j.properties
   └── libs/

 

 

 
