From: Chris Embree
Reply-To: chris@embree.us
To: user@hadoop.apache.org
Date: Sun, 10 Feb 2013 21:22:08 -0500
Subject: Re: Mutiple dfs.data.dir vs RAID0
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=e89a8fb2016e6fcb2704d569939b

--e89a8fb2016e6fcb2704d569939b
Content-Type: text/plain; charset=ISO-8859-1

Interesting question. You'd probably need to benchmark to prove it out.

I'm not sure of the exact details of how HDFS stripes data, but it should
compare pretty well to hardware RAID.

Conceptually, HDFS should be able to outperform a RAID solution, since HDFS
"knows" more about the data being written. One of the benefits of HDFS is
being able to buy cheaper hardware and still get good performance.

We bought cheap DL165s for our datanodes: 4x 2TB drives with no RAID.

On Sun, Feb 10, 2013 at 8:57 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Hi,
>
> I have a quick question regarding RAID0 performance vs multiple
> dfs.data.dir entries.
>
> Let's say I have 2 x 2TB drives.
>
> I can configure them as 2 separate drives mounted on 2 folders and
> assigned to Hadoop using dfs.data.dir. Or I can mount the 2 drives
> with RAID0 and assign them as a single folder to dfs.data.dir.
>
> With RAID0, the reads and writes are going to be spread over the 2
> disks. This significantly increases the speed. But if I put 2
> entries in dfs.data.dir, Hadoop is going to spread over those 2
> directories too, and in the end, the results should be the same, no?
>
> Any experience/advice/results to share?
>
> Thanks,
>
> JM
>
--e89a8fb2016e6fcb2704d569939b--
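To make the question concrete, here is a toy Python sketch, not HDFS code, comparing the two layouts. It assumes the datanode hands out whole blocks to the dfs.data.dir directories in round-robin order (the thread does not confirm this) and that RAID0 alternates fixed-size stripes across the disks, starting at disk 0. The mount points, block size, stripe size, and function names are all made up for illustration.

```python
from collections import defaultdict
from itertools import cycle


def place_blocks_round_robin(block_ids, data_dirs):
    """Assign each whole block to one directory, rotating through data_dirs.

    Assumption: models multiple dfs.data.dir entries as simple round-robin.
    """
    placement = defaultdict(list)
    for block_id, data_dir in zip(block_ids, cycle(data_dirs)):
        placement[data_dir].append(block_id)
    return dict(placement)


def stripe_block_raid0(block_size, stripe_size, num_disks):
    """Return how many bytes of ONE block land on each disk under RAID0.

    Assumption: the block starts on a stripe boundary at disk 0.
    """
    per_disk = [0] * num_disks
    offset = 0
    while offset < block_size:
        chunk = min(stripe_size, block_size - offset)
        per_disk[(offset // stripe_size) % num_disks] += chunk
        offset += chunk
    return per_disk


if __name__ == "__main__":
    dirs = ["/data/1/dfs", "/data/2/dfs"]  # hypothetical mount points
    # Separate directories: each block sits entirely on one disk.
    print(place_blocks_round_robin(range(6), dirs))
    # RAID0: every block is split across both disks.
    print(stripe_block_raid0(64 * 1024 * 1024, 64 * 1024, 2))
```

In this model both layouts end up with an even split of bytes. The difference is that a single block lives on one disk with separate directories but on both disks with RAID0. The sketch says nothing about actual throughput, which would still need a benchmark.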