From: Daryn Sharp
Reply-To: user@hadoop.apache.org
Subject: Re: HDFS Federation address performance issue
Date: Tue, 28 Jan 2014 18:53:06 +0000

Hi Anfernee,

You will achieve improved performance with federation only if you stripe files across the multiple NNs. Federation basically shares DN storage among multiple NNs, with the expectation that the namespace load will be distributed across them. If everything writes to the exact same parent directory, no benefit is achieved over a single NN. You will need to partition your jobs so that some write to one NN and other jobs write to the other NN(s).
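
The partitioning described above can be sketched as a small helper that picks a namespace for each job before the job starts. This is a hypothetical illustration, not a Hadoop API. It makes several assumptions:

* It assumes two NameNodes at made-up addresses.
* It assumes a made-up base output directory.
* It uses invented function names (`namenode_for_job`, `output_path_for_job`, `partition_jobs`).

The idea is that each NN owns its own namespace, so sending different jobs' output to different NNs spreads the metadata operations instead of funnelling them all through one NN.

```python
import zlib
from collections import defaultdict

# Hypothetical NameNode URIs for a two-namespace federated cluster;
# replace them with the addresses actually configured on your cluster.
NAMENODES = [
    "hdfs://nn1.example.com:8020",
    "hdfs://nn2.example.com:8020",
]


def namenode_for_job(job_name: str, namenodes: list[str]) -> str:
    """Pick a namespace for a job using a stable hash of its name.

    zlib.crc32 is used instead of the built-in hash(), whose value for
    strings can change between interpreter runs, so a given job name is
    always mapped to the same NameNode.
    """
    if not namenodes:
        raise ValueError("need at least one NameNode URI")
    index = zlib.crc32(job_name.encode("utf-8")) % len(namenodes)
    return namenodes[index]


def output_path_for_job(job_name: str, namenodes: list[str],
                        base_dir: str = "/data/out") -> str:
    """Build a fully qualified output directory on the chosen namespace.

    base_dir is a hypothetical default; the same path on two NameNodes
    refers to two different directories in two independent namespaces.
    """
    return f"{namenode_for_job(job_name, namenodes)}{base_dir}/{job_name}"


def partition_jobs(job_names: list[str],
                   namenodes: list[str]) -> dict[str, list[str]]:
    """Group job names by the NameNode whose namespace they will write to."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in job_names:
        groups[namenode_for_job(name, namenodes)].append(name)
    return dict(groups)


if __name__ == "__main__":
    jobs = [f"job-{i}" for i in range(6)]
    for nn, assigned in sorted(partition_jobs(jobs, NAMENODES).items()):
        print(nn, assigned)
    print(output_path_for_job("job-0", NAMENODES))
```

A hash keeps each job pinned to the same namespace across runs. A round-robin counter would balance the split more evenly, but a given job could then land on a different NN from one run to the next. The resulting fully qualified path would then be passed to the job as its output directory.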

I hope this helps!

Daryn

On Jan 28, 2014, at 12:04 PM, Anfernee Xu <anfernee.xu@gmail.com> wrote:

Hi,

Based on http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html#Key_Benefits, overall performance can be improved by federation, but I'm not sure federation addresses my use case. Could someone elaborate?

My use case: I have a single NN and several DNs. I also have a bunch of concurrent MR jobs that create new files (plain files and sub-directories) under the same parent directory. My question is:

1) Will these concurrent writes (new plain files and sub-directories under the same parent directory) run sequentially, because write-once control is governed by the single NN?

I need this answer to judge whether moving to HDFS federation is necessary.
Thanks

--
--Anfernee
