kafka-dev mailing list archives

From "Jay Kreps (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-188) Support multiple data directories
Date Fri, 02 Nov 2012 19:03:16 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Jay Kreps updated KAFKA-188:

    Fix Version/s: 0.8
> Support multiple data directories
> ---------------------------------
>                 Key: KAFKA-188
>                 URL: https://issues.apache.org/jira/browse/KAFKA-188
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>             Fix For: 0.8
>         Attachments: KAFKA-188.patch, KAFKA-188-v2.patch, KAFKA-188-v3.patch, KAFKA-188-v4.patch,
KAFKA-188-v5.patch, KAFKA-188-v6.patch, KAFKA-188-v7.patch, KAFKA-188-v8.patch
> Currently we allow only a single data directory. This means that a multi-disk configuration
needs to be a RAID array or LVM volume or something like that to be mounted as a single directory.
> For a high-throughput, low-reliability configuration this would mean RAID0 striping. Common
wisdom in Hadoop land has it that a JBOD setup that just mounts each disk as a separate directory
and does application-level balancing over them yields about a 30% write improvement. For
example, see this claim:
>   http://old.nabble.com/Re%3A-RAID-vs.-JBOD-p21466110.html
> It is not clear to me why this would be the case--it seems the RAID controller should
be able to balance writes as well as the application so it may depend on the details of the
> Nonetheless, this would be really easy to implement: all you need to do is add multiple
data directories and balance partition creation over these disks.
> One problem this might cause: if a particular topic is much larger than the others,
it might unbalance the load across the disks. The partition->disk assignment policy should
probably attempt to spread each topic evenly to avoid this, rather than just trying to keep the
number of partitions balanced between disks.
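
The assignment policy described above can be sketched roughly as follows. This is only an illustration of the idea, not code from any of the attached patches: the function name pick_data_dir, the directory paths, and the topic names are all hypothetical. The sketch places each new partition in the directory holding the fewest partitions of the same topic, and falls back to the overall partition count only to break ties.

```python
from collections import Counter


def pick_data_dir(data_dirs, assignments, topic):
    """Return the directory that should host the next partition of `topic`.

    `assignments` is a list of (topic, partition, data_dir) tuples for
    partitions that have already been placed (a hypothetical bookkeeping
    structure used only for this sketch).
    """
    per_topic = Counter(d for t, _, d in assignments if t == topic)
    total = Counter(d for _, _, d in assignments)
    # Prefer the directory with the fewest partitions of this topic, then the
    # fewest partitions overall. min() returns the first minimum, so earlier
    # directories win any remaining ties.
    return min(data_dirs, key=lambda d: (per_topic[d], total[d]))


# Hypothetical layout: three disks, each mounted as its own data directory.
dirs = ["/disk1/kafka", "/disk2/kafka", "/disk3/kafka"]
placed = []
for topic, num_partitions in [("big-topic", 4), ("small-topic", 2)]:
    for partition in range(num_partitions):
        target = pick_data_dir(dirs, placed, topic)
        placed.append((topic, partition, target))

for topic, partition, target in placed:
    print(f"{topic}-{partition} -> {target}")
```

Keying on the per-topic count first is what keeps one large topic from piling up on a single disk; a policy that looked only at the total partition count per directory could still do that whenever the totals happened to be equal.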

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
