kafka-users mailing list archives

From Andrew Otto <ao...@wikimedia.org>
Subject Kafka partitions unbalanced
Date Wed, 27 May 2015 20:11:34 GMT
Hi all,

I’ve recently noticed that our broker log.dirs are using up different amounts of storage.
 We use JBOD for our brokers, with 12 log.dirs, 1 on each disk.  One of our topics is larger
than the others, and has 12 partitions.  Replication factor is 3, and we have 4 brokers. 
Each broker then has to store 9 partitions for this topic (12*3/4 == 9).
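
A quick sketch of that arithmetic in Python, assuming replicas are spread evenly across brokers. The function names are made up for illustration; the numbers are the ones from this message.

```python
import math

def replicas_per_broker(partitions, replication_factor, brokers):
    # Total replicas in the cluster, divided evenly among brokers.
    return partitions * replication_factor // brokers

def ideal_max_per_dir(replicas, log_dirs):
    # Best case: no log.dir holds more than this many replicas of the topic.
    return math.ceil(replicas / log_dirs)

per_broker = replicas_per_broker(12, 3, 4)
print(per_broker)                         # 9
print(ideal_max_per_dir(per_broker, 12))  # 1
```

With 9 replicas and 12 log.dirs per broker, an even spread would never need to put two partitions of this topic on the same disk.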

I guess I had originally assumed that Kafka would be smart enough to spread partitions for
a given topic across each of the log.dirs as evenly as it could.  However, on some brokers
this one topic has 2 partitions in a single log.dir, meaning that the storage taken up on
a single disk by this topic on those brokers is twice what it should be.

e.g.

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       1.8T  1.2T  622G  66% /var/spool/kafka/a
/dev/sdb3       1.8T  1.7T  134G  93% /var/spool/kafka/b
…
$ du -sh /var/spool/kafka/{a,b}/data/webrequest_upload-*
501G	a/data/webrequest_upload-4
500G	b/data/webrequest_upload-11
501G	b/data/webrequest_upload-8


This also means that those overpopulated disks have more writes to do.  My I/O is imbalanced!
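
One way to spot the doubled-up disks is to group partition directories by log.dir and topic. This is a minimal sketch, assuming partition directories are named <topic>-<partition> as in the du output above; the helper name doubled_up is hypothetical.

```python
from collections import defaultdict

def doubled_up(paths):
    """Return {(log_dir, topic): [partitions]} where a dir holds >1 partition of a topic."""
    seen = defaultdict(list)
    for path in paths:
        # Split "b/data/webrequest_upload-11" into "b/data" and "webrequest_upload-11".
        log_dir, _, name = path.rstrip("/").rpartition("/")
        # The partition number follows the last hyphen; topics may contain hyphens.
        topic, _, partition = name.rpartition("-")
        seen[(log_dir, topic)].append(int(partition))
    return {key: sorted(parts) for key, parts in seen.items() if len(parts) > 1}

paths = [
    "a/data/webrequest_upload-4",
    "b/data/webrequest_upload-11",
    "b/data/webrequest_upload-8",
]
print(doubled_up(paths))
# {('b/data', 'webrequest_upload'): [8, 11]}
```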

This is sort of documented at http://kafka.apache.org/documentation.html:

"If you configure multiple data directories partitions will be assigned round-robin to data
directories. Each partition will be entirely in one of the data directories. If data is not
well balanced among partitions this can lead to load imbalance between disks."

But my data is well balanced among partitions!  It’s just that multiple partitions are assigned
to a single disk.
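
A toy simulation of how this can happen. It assumes (this is an assumption, not something confirmed in this thread) that the broker places each new partition in whichever log.dir currently holds the fewest partition directories, counting all topics together. The directory names, topic names, and tie-break by name are made up for illustration.

```python
def place(dirs, new_partitions):
    """Assign each new partition to the dir with the fewest partitions overall.

    dirs maps a log.dir name to the list of partition names it already holds.
    Ties are broken by directory name (an arbitrary choice for this sketch).
    """
    for partition in new_partitions:
        target = min(dirs, key=lambda d: (len(dirs[d]), d))
        dirs[target].append(partition)
    return dirs

# A small topic already occupies some of the disks unevenly.
dirs = {"a": ["small-0", "small-1"], "b": [], "c": ["small-2"]}
place(dirs, ["big-0", "big-1", "big-2"])

for name, partitions in dirs.items():
    print(name, partitions)
# a ['small-0', 'small-1']
# b ['big-0', 'big-1']
# c ['small-2', 'big-2']
```

Every directory ends up with two partitions, so the counts look balanced, but the big topic has two partitions on b and none on a. Under that assumption, placement balances partition counts per disk, not per-topic counts or bytes.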

Anyyyyyyway, on to a question:  Is it possible to move partitions between log.dirs?  Is there
tooling to do so?  Poking around in there, it looks like it might be as simple as shutting
down the broker, moving the partition directory, editing both the replication-offset-checkpoint
and recovery-point-offset-checkpoint files so that they say the appropriate things in the
appropriate directories, and then restarting the broker.
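
The checkpoint-editing step could be sketched roughly like this. It assumes the checkpoint files are plain text with a version line, an entry-count line, and then one "topic partition offset" line per partition; check a real file before trusting that. The function names and offsets are hypothetical, the code works on strings only, and it has not been tried against a real broker.

```python
def parse(text):
    """Parse checkpoint text into (version, {(topic, partition): offset})."""
    lines = text.strip().splitlines()
    version, count = int(lines[0]), int(lines[1])
    entries = {}
    for line in lines[2:]:
        topic, partition, offset = line.split()
        entries[(topic, int(partition))] = int(offset)
    if len(entries) != count:
        raise ValueError("entry count does not match header")
    return version, entries

def render(version, entries):
    """Write entries back out in the assumed format, with a fresh count line."""
    body = [f"{t} {p} {o}" for (t, p), o in sorted(entries.items())]
    return "\n".join([str(version), str(len(entries)), *body]) + "\n"

def move_entry(src_text, dst_text, topic, partition):
    """Move one partition's offset from the source checkpoint to the destination."""
    src_version, src = parse(src_text)
    dst_version, dst = parse(dst_text)
    dst[(topic, partition)] = src.pop((topic, partition))
    return render(src_version, src), render(dst_version, dst)

# Illustrative contents only; the offsets are invented.
src = "0\n2\nwebrequest_upload 8 100\nwebrequest_upload 11 200\n"
dst = "0\n1\nwebrequest_upload 4 50\n"
new_src, new_dst = move_entry(src, dst, "webrequest_upload", 8)
print(new_src)
print(new_dst)
```

The same move would have to be applied to both checkpoint files in both directories, with the broker stopped, and the partition directory itself moved to match.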

Someone tell me that this is a horrible idea. :)

-Ao


