hadoop-user mailing list archives

From Robert Dyer <rd...@iastate.edu>
Subject Re: HBase and MapReduce data locality
Date Wed, 29 Aug 2012 19:58:03 GMT
Ah, thanks for that link.  I missed it while browsing the docs.  The
link from there to this blog post

  http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html

really answers my questions! :-)

On Wed, Aug 29, 2012 at 2:38 AM, N Keywal <nkeywal@gmail.com> wrote:
> Inline. Just a set of "you're right" :-).
> It's documented here:
> http://hbase.apache.org/book.html#regions.arch.locality
>
> On Wed, Aug 29, 2012 at 8:06 AM, Robert Dyer <rdyer@iastate.edu> wrote:
>>
>> Ok but does that imply that only 1 of your compute nodes is promised
>> to have all of the data for any given row?  The blocks will replicate,
>> but they don't necessarily all replicate to the same nodes, right?
>
>
> Right.
>
>>
>> So if I have, say, 2 column families (cf1, cf2) and there are 2 physical
>> files on HDFS for those (per region), then those files are created
>> on one datanode (dn1), which will have all blocks local to that node.
>
>
> Yes. Nit: datanodes don't "see" files, only blocks. But the logic remains
> the same.
>
>>
>> Once it replicates those blocks 2 more times by default, isn't it
>> possible the blocks for cf1 will go to dn2, dn3 while the blocks for
>> cf2 go to dn4, dn5?
>
>
> Yes, it's possible (and even likely).
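
A rough way to see why this is likely is a toy simulation of the scenario
in the thread. It is only a sketch under stated assumptions: each store
file is treated as a single block, the first replica lands on the writing
datanode (dn1), and the remaining replicas go to other datanodes chosen
uniformly at random. Real HDFS placement is rack-aware, so this is not the
actual policy, and the function names are made up for illustration.

```python
import random


def place_replicas(writer, datanodes, replication=3, rng=random):
    """Toy placement: first replica on the writer, the rest on distinct
    other datanodes picked uniformly at random (ASSUMPTION: no rack logic)."""
    others = [dn for dn in datanodes if dn != writer]
    return [writer] + rng.sample(others, replication - 1)


def nodes_with_all_files(placements):
    """Return the set of datanodes that hold a replica of every file."""
    return set.intersection(*(set(nodes) for nodes in placements.values()))


def fraction_only_writer(num_nodes, trials=10000, seed=0):
    """Estimate how often dn1 is the only node holding both cf1 and cf2."""
    rng = random.Random(seed)
    datanodes = ["dn%d" % i for i in range(1, num_nodes + 1)]
    only_writer = 0
    for _ in range(trials):
        placements = {
            cf: place_replicas("dn1", datanodes, rng=rng)
            for cf in ("cf1", "cf2")
        }
        if nodes_with_all_files(placements) == {"dn1"}:
            only_writer += 1
    return only_writer / trials


if __name__ == "__main__":
    # Cluster sizes are arbitrary examples.
    for n in (5, 20, 100):
        print(n, "datanodes:", fraction_only_writer(n))
```

Under these assumptions dn1 always holds both files, and the fraction of
runs in which it is the only such node rises as the cluster grows, which
matches the "possible (and even likely)" answer above.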
