hadoop-mapreduce-user mailing list archives

From Rahul Bhattacharjee <rahul.rec....@gmail.com>
Subject Re: Who splits the file into blocks
Date Sun, 31 Mar 2013 16:46:39 GMT
I think what Sai was asking is this: when the client asks the namenode for a
list of datanodes, how does the namenode know how many blocks will be
required to store the entire file?

I think the way it works is that the client asks the NN for a block, writes
that first block to the datanodes the NN has specified, then asks the NN for
the next block, and so on. The NN does not need to know the total number of
blocks up front; the client knows when EOF is reached.
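
A rough sketch of that loop, in case it helps. This is a toy simulation, not
the HDFS client API: ToyNameNode, add_block and write_file are names made up
for illustration, the block size is tiny on purpose, and the datanode choice
is a simple rotation rather than the real placement policy. It also assumes
there are at least as many datanodes as replicas.

```python
import io

BLOCK_SIZE = 8  # bytes; deliberately tiny, real HDFS blocks are far larger


class ToyNameNode:
    """Hypothetical stand-in for the NN: allocates one block per request."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.next_block_id = 0

    def add_block(self):
        """Return (block_id, target datanodes) for one new block."""
        block_id = self.next_block_id
        self.next_block_id += 1
        # Placeholder target choice: rotate through the datanodes.
        start = block_id % len(self.datanodes)
        targets = [
            self.datanodes[(start + i) % len(self.datanodes)]
            for i in range(self.replication)
        ]
        return block_id, targets


def write_file(stream, namenode, block_size=BLOCK_SIZE):
    """Client side: cut the stream into blocks, asking the NN once per block."""
    allocations = []
    while True:
        chunk = stream.read(block_size)
        if not chunk:  # EOF: only the client ever learns the block count
            break
        block_id, targets = namenode.add_block()
        # A real client would stream `chunk` to the targets here.
        allocations.append((block_id, len(chunk), targets))
    return allocations
```

For example, write_file(io.BytesIO(b"x" * 20), ToyNameNode(["dn1", "dn2",
"dn3", "dn4"])) ends up asking for three blocks of 8, 8 and 4 bytes, and the
NN never had to be told the file size.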

Jens has described how the NN decides where to allocate a block, that is, on
which DNs its replicas are written.


On Sun, Mar 31, 2013 at 10:00 PM, Jens Scheidtmann <
jens.scheidtmann@gmail.com> wrote:

> Dear Sai Sai,
> "Hadoop: The Definitive Guide" says regarding default replica placement:
> - first replica is placed on the same node as the client (lowest bandwidth
> penalty).
> - second replica is placed off-rack, at a random node of the other rack
> (avoiding busy racks).
> - third replica is placed on a random node on the rack where the second
> replica is stored.
> - other replicas are placed on random nodes of the cluster (avoiding busy
> racks).
> If the client is not on the cluster, the first replica is placed on a
> random node (avoiding busy racks).
> Best regards,
> Jens
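
The placement rules quoted above could be sketched roughly like this. It is
only an illustration under stated assumptions: place_replicas and the
topology dict (rack name -> list of node names) are hypothetical, this is not
the NameNode's actual code, and the "avoiding busy racks" part is not modelled
at all. When a rule cannot be satisfied, the sketch falls back to any unused
node.

```python
import random


def place_replicas(topology, replication=3, client_node=None, rng=None):
    """Pick nodes for one block's replicas, following the quoted rules.

    topology: dict mapping rack name -> list of node names (hypothetical).
    """
    rng = rng or random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    chosen = []

    def pick(predicate):
        # Random unused node matching the rule, else any unused node.
        unused = [n for n in rack_of if n not in chosen]
        matching = [n for n in unused if predicate(n)] or unused
        if matching:
            chosen.append(rng.choice(matching))

    for i in range(replication):
        if len(chosen) < i:  # ran out of nodes in an earlier step
            break
        if i == 0:
            # First replica: the client's own node if it is in the cluster.
            pick(lambda n: n == client_node)
        elif i == 1:
            # Second replica: a node on a different rack from the first.
            first_rack = rack_of[chosen[0]]
            pick(lambda n: rack_of[n] != first_rack)
        elif i == 2:
            # Third replica: another node on the second replica's rack.
            second_rack = rack_of[chosen[1]]
            pick(lambda n: rack_of[n] == second_rack)
        else:
            # Further replicas: any remaining node.
            pick(lambda n: True)
    return chosen
```

With topology = {"r1": ["a", "b"], "r2": ["c", "d"], "r3": ["e", "f"]} and
client_node="a", the first replica lands on "a", the second on some node
outside r1, and the third on the other node of that same rack.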
