kafka-dev mailing list archives

From "Jun Rao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-188) Support multiple data directories
Date Fri, 02 Nov 2012 00:52:13 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489184#comment-13489184 ]

Jun Rao commented on KAFKA-188:
-------------------------------

+1 on patch v8. Thanks,
                
> Support multiple data directories
> ---------------------------------
>
>                 Key: KAFKA-188
>                 URL: https://issues.apache.org/jira/browse/KAFKA-188
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Jay Kreps
>         Attachments: KAFKA-188.patch, KAFKA-188-v2.patch, KAFKA-188-v3.patch, KAFKA-188-v4.patch,
>                      KAFKA-188-v5.patch, KAFKA-188-v6.patch, KAFKA-188-v7.patch, KAFKA-188-v8.patch
>
>
> Currently we allow only a single data directory. This means that a multi-disk configuration
> needs to be a RAID array, an LVM volume, or something similar so that it can be mounted as a
> single directory.
> For a high-throughput, low-reliability configuration this would mean RAID0 striping. Common
> wisdom in Hadoop land has it that a JBOD setup that just mounts each disk as a separate
> directory and does application-level balancing over these results in about a 30% write
> improvement. For example, see this claim here:
>   http://old.nabble.com/Re%3A-RAID-vs.-JBOD-p21466110.html
> It is not clear to me why this would be the case; it seems the RAID controller should be
> able to balance writes as well as the application, so it may depend on the details of the
> setup.
> Nonetheless this would be really easy to implement: all you need to do is add multiple
> data directories and balance partition creation over these disks.
> One problem this might cause is that if a particular topic is much larger than the others,
> it might unbalance the load across the disks. The partition->disk assignment policy should
> probably attempt to spread each topic evenly to avoid this, rather than just trying to keep
> the number of partitions balanced between disks.
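
The partition->disk assignment policy described above can be sketched roughly as follows.
This is only an illustration of the idea, not the code in the attached patches: the
function names, the directory paths, and the tie-breaking rule (fall back to the total
partition count per directory) are all assumptions made for the example.

```python
from collections import defaultdict


def pick_data_dir(data_dirs, assignments, topic):
    """Pick a data directory for a new partition of `topic` (hypothetical helper).

    `assignments` maps each directory to a list of topic names, one entry
    per partition already placed in that directory.
    """
    def load(data_dir):
        held = assignments.get(data_dir, [])
        # Primary key: partitions of this topic already in the directory.
        # Secondary key (assumed tie-break): total partitions in the directory.
        return (held.count(topic), len(held))

    return min(data_dirs, key=load)


def assign(data_dirs, partitions):
    """Place (topic, partition) pairs one at a time; return their directories."""
    assignments = defaultdict(list)
    placement = {}
    for topic, partition in partitions:
        data_dir = pick_data_dir(data_dirs, assignments, topic)
        assignments[data_dir].append(topic)
        placement[(topic, partition)] = data_dir
    return placement


# Hypothetical directories and topics, one large topic and one small one.
dirs = ["/data/disk1", "/data/disk2"]
parts = [("big", 0), ("small", 0), ("big", 1), ("small", 1)]
placement = assign(dirs, parts)
for key in sorted(placement):
    print(key, "->", placement[key])
```

With this creation order, a policy that only balances the total partition count per
directory could put both "big" partitions on the same disk, since the counts are tied
when the second one is created. Keying on the per-topic count first places them on
different disks.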

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
