hadoop-hdfs-user mailing list archives

From Evert Lammerts <Evert.Lamme...@sara.nl>
Subject RE: block placement
Date Fri, 01 Jul 2011 08:29:16 GMT
Well, here's my first Hadoop Jira :-)


From: Harsh J [harsh@cloudera.com]
Sent: Thursday, June 30, 2011 4:59 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: block placement


With the default behavior, each new block is assigned to a storage
device in round-robin fashion. So yes, parallel writes should be
well spread over the configured number of disks. But that's not to say
that the file is perfectly distributed across the DNs as you describe.
The DNs are chosen randomly for writes (if no local one is available).
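
To make the round-robin part concrete, here is a minimal sketch of how
a node could hand out disks to incoming blocks. It is an illustration
only, not the actual DataNode code: the class name, method name and the
/data/N paths are made up, and it ignores things a real node has to
deal with, such as free space and failed disks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of round-robin disk selection on a single node.
// Names and paths are assumptions for illustration, not Hadoop's API.
class RoundRobinVolumeSketch {
    private final List<String> volumes; // configured data directories
    private int next = 0;               // index of the volume for the next block

    RoundRobinVolumeSketch(List<String> volumes) {
        if (volumes.isEmpty()) {
            throw new IllegalArgumentException("no volumes configured");
        }
        this.volumes = new ArrayList<>(volumes);
    }

    // Return the volume for the next block, then advance and wrap around.
    synchronized String chooseVolume() {
        String chosen = volumes.get(next);
        next = (next + 1) % volumes.size();
        return chosen;
    }

    public static void main(String[] args) {
        RoundRobinVolumeSketch dn = new RoundRobinVolumeSketch(
                Arrays.asList("/data/1", "/data/2", "/data/3", "/data/4"));
        // Six consecutive blocks written to this node cycle through its four disks.
        for (int block = 0; block < 6; block++) {
            System.out.println("block " + block + " -> " + dn.chooseVolume());
        }
    }
}
```

Note that this only spreads the blocks that arrive at one node. Which
node receives a given block is decided separately, so a 16-block file
on a 4x4 cluster is not guaranteed to land one block per disk.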

Regarding MR, I do not believe it does any such optimization right now
(in fact, the MR code is quite FS-agnostic). Right now, tasks are run on
nodes where blocks are located, but metadata about which disk a block
resides on is not maintained by the NN, so MR has no way of knowing
this or acting on it. This would be good to discuss, however.
(Search for an existing JIRA or file a new one?)

On Thu, Jun 30, 2011 at 6:33 PM, Evert Lammerts <Evert.Lammerts@sara.nl> wrote:
> Hi list,
> How does the NN place blocks on the disks within a single node? Does it
> spread out adjacent blocks of a single file horizontally over the disks?
> For example, let's say I have four DNs and each has 4 disks. (And forget
> about replication.) If I copy a file consisting of 16 blocks of 128MB
> each to the cluster, will each disk have exactly one block of the file?
> If I run some job over this file with its sixteen blocks this is
> important, since the cluster would use its maximum I/O capabilities.
> This leads me to another question (which might be better off on
> mapred-user). Does the JT schedule its tasks to maximally use I/O
> capabilities? Would it try to process blocks that reside on a disk that
> is not currently being read from or written to? Or does it just use a
> randomized strategy?
> Cheers,
> Evert

Harsh J
