hbase-user mailing list archives

From "Xu, Leon" <guodo...@amazon.com>
Subject Regarding HBase Hadoop multiple scan objects issue
Date Fri, 18 Jan 2013 22:43:01 GMT
Hi HBase users,

I am currently trying to set up a denormalization map-reduce job for my HBase table.
Since the table contains a large volume of data, it is inefficient to have a single Scan object
scan everything. We only need to process the records that have changed. I am planning
to use multiple Scan objects, each specifying a row range, given that we keep track
of which rows have changed.
So I am trying to set up the map-reduce job with multiple Scan objects. Is this possible?
I have seen some posts online suggesting extending the InputFormat class and overriding getSplits.
Is this the most efficient way?
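
As a rough illustration of the per-range idea, the sketch below merges overlapping changed-row
ranges so that each merged range could become the start/stop rows of one Scan. It is only a
sketch under assumptions: ChangedRowRanges, Range and merge are hypothetical names, the ranges
are half-open [start, stop), and plain Java strings stand in for HBase byte[] row keys (the
ordering only matches HBase's byte ordering for ASCII keys). It does not touch the HBase API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: merges overlapping changed-row ranges, one merged range per scan. */
class ChangedRowRanges {

    /** Half-open key range [start, stop): start is included, stop is excluded. */
    static final class Range {
        final String start;
        final String stop;

        Range(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + stop + ")";
        }
    }

    /** Sorts ranges by start key and merges any that overlap or touch. */
    static List<Range> merge(List<Range> changed) {
        List<Range> sorted = new ArrayList<>(changed);
        sorted.sort(Comparator.comparing((Range r) -> r.start));

        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                // The next range begins at or before the end of the previous one: extend it.
                if (r.start.compareTo(last.stop) <= 0) {
                    String stop = r.stop.compareTo(last.stop) > 0 ? r.stop : last.stop;
                    merged.set(merged.size() - 1, new Range(last.start, stop));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> changed = new ArrayList<>();
        changed.add(new Range("row100", "row200"));
        changed.add(new Range("row150", "row300"));
        changed.add(new Range("row500", "row600"));
        for (Range r : merge(changed)) {
            System.out.println(r);
        }
    }
}
```

With the three sample ranges in main, the first two overlap and collapse into one, so two
ranges remain and only those key ranges would need to be scanned.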

Using a filter does not seem very efficient in my case, because it basically still scans the
whole table, right? It just filters out certain records.

Can you point me in the right direction?

