Date: Sat, 5 Nov 2011 23:40:51 +0000 (UTC)
From: "Jay Kreps (Created) (JIRA)"
To: kafka-dev@incubator.apache.org
Subject: [jira] [Created] (KAFKA-188) Support multiple data directories

Support multiple data directories
---------------------------------

                 Key: KAFKA-188
                 URL: https://issues.apache.org/jira/browse/KAFKA-188
             Project: Kafka
          Issue Type: New Feature
            Reporter: Jay Kreps

Currently we allow only a single data directory. This means that a multi-disk configuration must be combined into a RAID array, an LVM volume, or something similar so that it can be mounted as a single directory. For a high-throughput, low-reliability configuration this would mean RAID0 striping.

Common wisdom in Hadoop land holds that a JBOD setup, which simply mounts each disk as a separate directory and balances data across them at the application level, gives about a 30% improvement in write performance. For example, see this claim: http://old.nabble.com/Re%3A-RAID-vs.-JBOD-p21466110.html

It is not clear to me why this would be the case. It seems the RAID controller should be able to balance writes as well as the application can, so the gain may depend on the details of the setup. Nonetheless, this would be very easy to implement: all that is needed is to accept multiple data directories and balance partition creation across those disks.

One problem this might cause: if a particular topic is much larger than the others, it could unbalance the load across the disks. To avoid this, the partition-to-disk assignment policy should probably try to spread each topic evenly across the disks, rather than just trying to keep the total number of partitions balanced between disks.
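A minimal sketch of the per-topic assignment policy suggested above, written in Python for illustration. Everything in it (the LogDirAssigner class, its assign method, the directory paths) is hypothetical and is not Kafka's actual code. It assumes each partition is placed once, at creation time, that partitions are never moved or deleted, and that partition count is an acceptable stand-in for load (it counts partitions, not bytes).

```python
from collections import Counter


class LogDirAssigner:
    """Hypothetical sketch: choose a data directory for each new partition.

    Policy: spread each topic evenly across directories first, and use the
    overall partition count only as a tie-breaker, so that one very large
    topic does not pile up on a single disk.
    """

    def __init__(self, data_dirs):
        if not data_dirs:
            raise ValueError("at least one data directory is required")
        self.data_dirs = list(data_dirs)
        # Partitions of each topic per directory: {topic: Counter({dir: n})}
        self.per_topic = {}
        # Total partitions hosted by each directory, across all topics.
        # Note: this counts partitions, not bytes on disk.
        self.total = Counter()

    def assign(self, topic):
        """Return the directory for a new partition of `topic` and record it."""
        topic_counts = self.per_topic.setdefault(topic, Counter())
        # Fewest partitions of this topic first, then fewest partitions
        # overall; min() returns the first directory on a full tie.
        chosen = min(
            self.data_dirs,
            key=lambda d: (topic_counts[d], self.total[d]),
        )
        topic_counts[chosen] += 1
        self.total[chosen] += 1
        return chosen


if __name__ == "__main__":
    assigner = LogDirAssigner(["/disk1/kafka", "/disk2/kafka"])
    # Alternates disk1, disk2, disk1, disk2 for the four partitions.
    for p in range(4):
        print("big-topic", p, assigner.assign("big-topic"))
    # Both disks now hold 2 partitions, so the tie goes to disk1.
    print("small-topic", 0, assigner.assign("small-topic"))
```

Sorting on the pair (partitions of this topic, total partitions) is what addresses the concern in the issue. With a plain "fewest partitions overall" rule, a disk that starts with 1 partition while the other holds 3 would receive both partitions of a new two-partition topic. The per-topic key instead places one partition on each disk.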