hadoop-common-user mailing list archives

From Robert Dyer <psyb...@gmail.com>
Subject HBase and MapReduce data locality
Date Wed, 29 Aug 2012 04:20:28 GMT
I have been reading up on HBase, and my understanding is that the
physical files on HDFS are split first by region and then by
column family.

Thus each column family has its own physical file(s), on a per-region basis.

If I run a MapReduce job that uses HBase as its input, wouldn't this
imply that, when a task reads from more than one column family, the data
for a given row might not be (entirely) local to that task?
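To make the concern concrete, here is a toy model in plain Python. It is not HBase or HDFS code. It treats one region as a mapping from column family to the HDFS blocks of that family's store file(s), each block listing the hosts that hold a replica. The hostnames, block sizes, and replica placements are made up for illustration, and the model ignores caching and anything HBase itself might do to improve locality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One HDFS block of a store file: its size and the hosts holding a replica."""
    size_bytes: int
    replica_hosts: frozenset


def locality_fraction(store_files, families, task_host):
    """Fraction of bytes a task running on `task_host` can read from a local replica.

    `store_files` maps a column family name to the blocks of that family's
    store file(s) within ONE region (hypothetical toy layout). Only the
    families the task actually scans are counted.
    """
    total = 0
    local = 0
    for family in families:
        for block in store_files[family]:
            total += block.size_bytes
            if task_host in block.replica_hosts:
                local += block.size_bytes
    return local / total if total else 1.0


MB = 1024 * 1024

# Hypothetical region: cf1's blocks happen to have a replica on host-a,
# cf2's blocks do not.
region = {
    "cf1": [Block(64 * MB, frozenset({"host-a", "host-b", "host-c"}))],
    "cf2": [Block(64 * MB, frozenset({"host-b", "host-c", "host-d"}))],
}

print(locality_fraction(region, ["cf1"], "host-a"))         # 1.0
print(locality_fraction(region, ["cf1", "cf2"], "host-a"))  # 0.5
```

In this sketch a task on host-a is fully local while it scans only cf1, but reading both families drops it to half local, which is the situation the question is about.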

Is there a way to tell HDFS to keep the blocks of each region's column
families together?
