hadoop-mapreduce-user mailing list archives

From Daryn Sharp <da...@yahoo-inc.com>
Subject Re: HDFS Federation address performance issue
Date Tue, 28 Jan 2014 18:53:06 GMT
Hi Anfernee,

You will achieve improved performance with federation only if you stripe files across the
multiple NNs.  Federation basically shares DN storage with multiple NNs with the expectation
the namespace load will be distributed across the multiple NNs.  If everything writes to the
exact same parent directory then no benefit is achieved over a single NN.  You will need to
partition your jobs so some write to one NN, other jobs write to the other NN(s).
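
As a rough illustration of that partitioning, here is a minimal sketch that assigns each job's output directory to one NN by hashing the job name. The class name, the NN host names, and the /data/out parent directory are all made up for the example, and it assumes job names are URI-safe; in a real cluster you would more likely configure a client-side mount table rather than build URIs by hand.

```java
import java.net.URI;
import java.util.List;

public class NameNodePartitioner {
    // Hypothetical NameNode URIs, for illustration only; substitute your cluster's own.
    public static final List<URI> NAME_NODES = List.of(
        URI.create("hdfs://nn1.example.com:8020"),
        URI.create("hdfs://nn2.example.com:8020"));

    // Pick one NameNode per job. floorMod keeps the index non-negative
    // even when hashCode() is negative.
    public static URI pickNameNode(String jobName, List<URI> nameNodes) {
        int index = Math.floorMod(jobName.hashCode(), nameNodes.size());
        return nameNodes.get(index);
    }

    // Build the job's output directory under the same logical parent path,
    // but in the namespace of the NameNode chosen for that job.
    public static URI outputDir(String jobName, List<URI> nameNodes, String parentDir) {
        return pickNameNode(jobName, nameNodes).resolve(parentDir + "/" + jobName);
    }

    public static void main(String[] args) {
        for (String job : List.of("job-a", "job-b", "job-c", "job-d")) {
            System.out.println(job + " -> " + outputDir(job, NAME_NODES, "/data/out"));
        }
    }
}
```

The same job name always maps to the same NN, so readers can locate a job's output later. How evenly the jobs spread depends on the job names, so a round-robin or explicit assignment may suit a small number of jobs better.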

I hope this helps!


On Jan 28, 2014, at 12:04 PM, Anfernee Xu <anfernee.xu@gmail.com> wrote:


Based on http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html#Key_Benefits,
federation can improve overall performance, but I'm not sure whether federation addresses
my use case. Could someone elaborate?

My use case: I have a single NN and several DNs, and a bunch of concurrent MR jobs
that create new files (plain files and sub-directories) under the same parent directory.
The questions are:

1) Will these concurrent writes (new plain files and sub-directories under the same parent
directory) run sequentially because write-once control is governed by a single NN?

I need this answer to estimate the necessity of moving to HDFS federation.


