Subject: Re: Datanode disk configuration
From: daemeon reiydelle
To: user@hadoop.apache.org
Date: Wed, 12 Nov 2014 08:55:39 -0800

I would consider JBOD with a 16-64 MB stride. That is the better choice when
one or more steps (e.g. MapReduce) will be I/O bound; otherwise some tasks
will be hit with the poor read/write throughput of having large amounts of
data behind a single spindle.

On Nov 12, 2014 8:37 AM, "Brian C. Huffman" <bhuffman@etinternational.com> wrote:
> All,
>
> I'm setting up a 4-node Hadoop 2.5.1 cluster. Each node has the following
> drives:
> 1 - 500 GB drive (OS disk)
> 1 - 500 GB drive
> 1 - 2 TB drive
> 1 - 3 TB drive
>
> In past experience I've had lots of issues with non-uniform drive sizes
> for HDFS, but unfortunately getting all 3 TB or 2 TB drives wasn't an
> option for this cluster.
>
> My thought is to set up the 2 TB and 3 TB drives for HDFS data and the
> 500 GB drive for intermediate data. Most of our jobs don't make heavy use
> of intermediate data, but at least this way I get a good amount of space
> (2 TB) per node before I run into issues. Then I may end up using the
> AvailableSpaceVolumeChoosingPolicy to help with balancing the blocks.
>
> If necessary I could put intermediate data on one of the OS partitions
> (/home), but that doesn't seem ideal.
>
> Does anybody have recommendations on the optimal use of storage in this
> scenario?
>
> Thanks,
> Brian
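For reference, a minimal sketch of the layout Brian describes, as hdfs-site.xml
and yarn-site.xml fragments for Hadoop 2.5.x. The mount points (/data/disk2tb,
/data/disk3tb, /data/scratch) are placeholders rather than paths from the
original message, and the threshold/fraction values shown are the Hadoop
defaults, spelled out here only to make the policy's behavior explicit:

  <!-- hdfs-site.xml: keep HDFS block data on the two large disks only -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/disk2tb/dfs/dn,/data/disk3tb/dfs/dn</value>
  </property>

  <!-- Round-robin is the default volume chooser; the available-space policy
       lets the 3 TB volume absorb proportionally more new blocks -->
  <property>
    <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
    <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
  </property>

  <!-- Volumes within 10 GB of free space of each other count as balanced (default) -->
  <property>
    <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
    <value>10737418240</value>
  </property>

  <!-- When unbalanced, send ~75% of new blocks to the volumes with more free space (default) -->
  <property>
    <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
    <value>0.75</value>
  </property>

  <!-- yarn-site.xml: put NodeManager local (intermediate/shuffle) data on the spare 500 GB disk -->
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/scratch/yarn/local</value>
  </property>

The main thing to watch with this split is whether shuffle-heavy jobs fit in
500 GB per node; if they don't, yarn.nodemanager.local-dirs accepts a
comma-separated list, so the intermediate data could spill onto additional
directories as well.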