Subject: Re: hadoop disk selection
From: Andy Isaacson
To: beatls@gmail.com, user@hadoop.apache.org
Date: Wed, 3 Oct 2012 16:55:35 -0700

Moving this to user@ since it's not appropriate for general@.

On Fri, Sep 28, 2012 at 11:16 PM, Xiang Hua wrote:
> Hi,
> i want to select 4(600G) local disks combined with 3*800G disks form
> diskarray in one datanode.
> is there any problem? performance ?

The recommended configuration would be to partition and format each
disk with ext4, then set dfs.datanode.data.dir to point to the
mountpoint of each disk, one entry per disk (three shown here):

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/1/datadir,/data/2/datadir,/data/3/datadir</value>
  </property>

You may also want to set dfs.datanode.du.reserved to 1GB or thereabouts
(the value is given in bytes and applies per volume).

With this configuration your DN will fill all 7 datadirs at the same
rate, pseudorandomly, until the 600G disks are nearly full; then it
will write any further blocks to the 800G disks. Performance will be OK,
except that you will see performance hot-spots on the larger disks when
writing past the 600GB mark. See
https://issues.apache.org/jira/browse/HDFS-1564 for one missing feature
in this area.

I would not recommend using RAID-0 for the datadirs. If you experience
a disk failure with independent filesystems, only the blocks on one
datadir are lost and need to be rereplicated. If you experience a disk
failure with RAID-0, all blocks stored on that DN are lost and need to
be rereplicated.

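
To put rough numbers on the two points above (the fill order and the
cost of a disk failure), here is a minimal Python sketch. It is not HDFS
code. It assumes writes are spread evenly over every datadir that still
has room, reserves 1 GB per disk, and keeps the amount of stored data the
same in the RAID-0 and non-RAID cases so they can be compared. The names
fill_evenly and rereplication_gb are made up for illustration.

```python
def fill_evenly(capacities_gb, reserved_gb, data_gb):
    """Spread data_gb evenly over all disks that still have usable space.

    Simplified model (an assumption, not the real DataNode volume
    chooser). Returns the GB used on each disk.
    """
    usable = [max(c - reserved_gb, 0) for c in capacities_gb]
    used = [0.0] * len(usable)
    remaining = float(data_gb)
    eps = 1e-9
    while remaining > eps:
        open_disks = [i for i, u in enumerate(usable) if u - used[i] > eps]
        if not open_disks:
            break  # every disk is full; the rest cannot be stored here
        # Give each open disk an equal share, capped by the fullest one.
        share = min(remaining / len(open_disks),
                    min(usable[i] - used[i] for i in open_disks))
        for i in open_disks:
            used[i] += share
        remaining -= share * len(open_disks)
    return used


def rereplication_gb(used_gb, failed_disk, raid0):
    """GB of block data lost (and to be rereplicated) when one disk fails."""
    if raid0:
        return sum(used_gb)      # striped: every block on the DN is lost
    return used_gb[failed_disk]  # independent filesystems: one datadir only


if __name__ == "__main__":
    disks = [600] * 4 + [800] * 3  # the layout from the question
    for written in (2100, 4500):
        used = fill_evenly(disks, reserved_gb=1, data_gb=written)
        print(written, "GB written ->", [round(u) for u in used])
        print("  600G disk fails, separate filesystems:",
              round(rereplication_gb(used, 0, raid0=False)), "GB lost")
        print("  same failure under RAID-0:",
              round(rereplication_gb(used, 0, raid0=True)), "GB lost")
```

Under these assumptions, 2100 GB of data leaves about 300 GB on every
disk. At 4500 GB the 600G disks are full at 599 GB each and the three
larger disks hold about 701 GB each, having absorbed all writes past
that point. Losing one 600G disk then costs 599 GB of rereplication
with independent filesystems, against the full 4500 GB with RAID-0.
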
Also, RAID results in performance lockstep: a single slow disk will
slow down access to all blocks on that DN, while with independent
filesystems a single slow disk slows down only a fraction of the blocks
on that DN.

-andy