hadoop-hdfs-user mailing list archives

From Evert Lammerts <Evert.Lamme...@sara.nl>
Subject block placement
Date Thu, 30 Jun 2011 13:03:42 GMT
Hi list,

How does the NN place blocks on the disks within a single node? Does it spread adjacent
blocks of a single file horizontally over the disks? For example, let's say I have four DNs
and each has 4 disks. (And forget about replication.) If I copy a file consisting of 16 blocks
of 128MB each to the cluster, will each disk have exactly one block of the file?
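
To make the scenario concrete, here is a minimal Python sketch of the ideal layout described above: 16 blocks spread over 4 DNs with 4 disks each, one block per disk. It is purely hypothetical. The function `ideal_placement` and its round-robin rule are illustrative assumptions, not a description of how HDFS actually places blocks.

```python
from collections import Counter

NUM_NODES = 4        # DNs in the example
DISKS_PER_NODE = 4   # disks per DN
NUM_BLOCKS = 16      # 16 blocks of 128MB each, replication ignored


def ideal_placement(num_blocks, num_nodes, disks_per_node):
    """Hypothetical round-robin layout: each block goes to the next (node, disk) slot."""
    total_disks = num_nodes * disks_per_node
    placement = {}
    for block in range(num_blocks):
        slot = block % total_disks
        node = slot % num_nodes    # rotate over the nodes first
        disk = slot // num_nodes   # then advance to the next disk on each node
        placement[block] = (node, disk)
    return placement


placement = ideal_placement(NUM_BLOCKS, NUM_NODES, DISKS_PER_NODE)

# Count how many blocks of the file end up on each (node, disk) pair.
blocks_per_disk = Counter(placement.values())
```

If the real placement matched this layout, `blocks_per_disk` would contain 16 distinct (node, disk) pairs with a count of 1 each. Whether HDFS behaves this way is exactly the question.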

This matters if I run a job over this file and its sixteen blocks: with one block per disk,
the cluster could use its maximum I/O capabilities.

This leads me to another question (which might be better off on mapred-user). Does the JT schedule
its tasks to maximally use I/O capabilities? Would it try to process blocks that reside on
a disk that is not currently being read from or written to? Or does it just use a randomized
