hadoop-hdfs-user mailing list archives

From N Keywal <nkey...@gmail.com>
Subject Re: HBase and MapReduce data locality
Date Wed, 29 Aug 2012 07:38:13 GMT
Inline. Just a set of "you're right" :-).
It's documented here:

On Wed, Aug 29, 2012 at 8:06 AM, Robert Dyer <rdyer@iastate.edu> wrote:

> Ok but does that imply that only 1 of your compute nodes is promised
> to have all of the data for any given row?  The blocks will replicate,
> but they don't necessarily all replicate to the same nodes right?


> So if I have, say, 2 column families (cf1, cf2) and there are 2 physical
> files on the HDFS for those (per region), then those files are created
> on one datanode (dn1), which will have all blocks local to that node.

Yes. Nit: datanodes don't "see" files, only blocks. But the logic remains
the same.

> Once it replicates those blocks 2 more times by default, isn't it
> possible the blocks for cf1 will go to dn2, dn3 while the blocks for
> cf2 go to dn4, dn5?

Yes, it's possible (and even likely).
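
To make the "even likely" part concrete, here is a rough simulation sketch. It is not HDFS code: it assumes a simplified placement rule (first replica on the writing node, remaining replicas on uniformly random other datanodes) and ignores rack awareness, which the real default policy takes into account. The function names, the 10-node cluster and the one-block-per-file setup are all made up for illustration.

```python
import random


def place_replicas(writer, datanodes, replication=3, rng=random):
    """Simplified placement (assumption): one replica on the writer,
    the rest on distinct, uniformly random other datanodes."""
    others = [dn for dn in datanodes if dn != writer]
    return {writer, *rng.sample(others, replication - 1)}


def nodes_with_all_blocks(block_placements):
    """Datanodes that hold a replica of every block in the list."""
    return set.intersection(*block_placements)


def fraction_only_writer_local(num_datanodes=10, num_blocks=2,
                               trials=10000, seed=0):
    """Fraction of trials in which the writer (dn1) is the only node
    holding all blocks, e.g. one block for cf1 and one for cf2."""
    rng = random.Random(seed)
    datanodes = [f"dn{i}" for i in range(1, num_datanodes + 1)]
    writer = "dn1"
    hits = 0
    for _ in range(trials):
        placements = [place_replicas(writer, datanodes, rng=rng)
                      for _ in range(num_blocks)]
        if nodes_with_all_blocks(placements) == {writer}:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(fraction_only_writer_local())
```

Under these assumptions, with 10 datanodes and one block per column family, the two sets of remote replicas are disjoint with probability 21/36, so dn1 ends up as the only node with the full row in somewhat more than half of the trials. With more blocks per file, the chance that any other node holds everything drops further.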
