Date: Sun, 10 Feb 2013 20:57:39 -0500
Subject: Multiple dfs.data.dir vs RAID0
From: Jean-Marc Spaggiari
To: user

Hi,

I have a quick question regarding RAID0 performance vs. multiple dfs.data.dir entries.

Let's say I have 2 x 2TB drives.

I can configure them as 2 separate drives mounted on 2 folders and assign them to Hadoop using dfs.data.dir. Or I can combine the 2 drives with RAID0 and assign the array as a single folder to dfs.data.dir.

With RAID0, reads and writes are spread over the 2 disks, which significantly increases speed. But if I put 2 entries in dfs.data.dir, Hadoop is going to spread its data over those 2 directories too, so in the end the results should be the same, no?

Any experience/advice/results to share?

Thanks,

JM
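
To make the comparison concrete, here is a rough Python sketch of the two layouts. It is only a toy model and rests on several assumptions: that Hadoop hands each whole block to one dfs.data.dir entry in round-robin order, that RAID0 splits every block into fixed-size stripes alternating across the disks, and that the directory names, block size, and stripe size shown are purely illustrative. The function names are made up for this example and are not Hadoop APIs.

```python
from itertools import cycle


def separate_dirs_placement(num_blocks, data_dirs):
    """Assign each whole block to one directory, round-robin (assumed policy)."""
    dirs = cycle(data_dirs)
    return [next(dirs) for _ in range(num_blocks)]


def raid0_placement(block_size, stripe_size, disks):
    """Return how many bytes of ONE block land on each disk when striped."""
    per_disk = {disk: 0 for disk in disks}
    offset = 0
    stripe_index = 0
    while offset < block_size:
        # The last stripe may be shorter than stripe_size.
        chunk = min(stripe_size, block_size - offset)
        per_disk[disks[stripe_index % len(disks)]] += chunk
        offset += chunk
        stripe_index += 1
    return per_disk


if __name__ == "__main__":
    # Hypothetical mount points for the two separate drives.
    dirs = ["/data/1", "/data/2"]
    # Each block lives entirely on one drive.
    print(separate_dirs_placement(4, dirs))
    # Illustrative sizes: a 64 MB block striped in 64 KB chunks over 2 disks.
    print(raid0_placement(64 * 1024 * 1024, 64 * 1024, ["disk0", "disk1"]))
```

In this model both layouts balance data across the two drives over many blocks. The difference is granularity: with separate directories a single block is read from or written to one drive, while with RAID0 every block touches both drives. Whether that changes real throughput depends on the workload and is not something this sketch measures.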