hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Reagrding HBase Hadoop multiple scan objects issue
Date Sat, 19 Jan 2013 17:20:27 GMT
Have you subscribed to the user mailing list?
Please do not confuse the user@ address with the subscription address.

Some email systems regard messages from amazon.com as unverifiable and
put them in the Spam folder.

Which HBase version are you using?

bq.  it's inefficient to have one scan object to scan everything

Have you looked at the following javadoc in Scan.java?

 * To only retrieve columns within a specific range of version timestamps,
 * execute {@link #setTimeRange(long, long) setTimeRange}.
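
To illustrate the idea, here is a minimal sketch in plain Java (no HBase dependency) of the half-open window [minStamp, maxStamp) that setTimeRange selects. The class IncrementalWindow and its methods are hypothetical names, not part of HBase. The sketch also assumes that cell timestamps reflect when a change was written, which holds only if the client does not set timestamps explicitly. In a real job the two bounds would presumably be passed to Scan.setTimeRange(minStamp, maxStamp) on the single Scan given to the map-reduce job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HBase: models the half-open window
// [minStamp, maxStamp) that Scan.setTimeRange(minStamp, maxStamp) selects.
class IncrementalWindow {
    final long minStamp; // inclusive lower bound, epoch millis
    final long maxStamp; // exclusive upper bound, epoch millis

    IncrementalWindow(long minStamp, long maxStamp) {
        // Validation rule chosen for this sketch only.
        if (minStamp < 0 || maxStamp < minStamp) {
            throw new IllegalArgumentException(
                "invalid window: " + minStamp + ".." + maxStamp);
        }
        this.minStamp = minStamp;
        this.maxStamp = maxStamp;
    }

    // Window covering everything written since the previous successful run.
    static IncrementalWindow since(long lastRunMillis, long nowMillis) {
        return new IncrementalWindow(lastRunMillis, nowMillis);
    }

    // True if a cell with this timestamp falls inside the window.
    boolean contains(long cellTimestamp) {
        return cellTimestamp >= minStamp && cellTimestamp < maxStamp;
    }

    // Row keys whose timestamp is inside the window, in input order.
    List<String> changedRows(Map<String, Long> rowTimestamps) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Long> e : rowTimestamps.entrySet()) {
            if (contains(e.getValue())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Long> rows = new LinkedHashMap<>();
        rows.put("row-a", 500L);
        rows.put("row-b", 1000L);
        rows.put("row-c", 1999L);
        rows.put("row-d", 2000L);
        IncrementalWindow w = IncrementalWindow.since(1000L, 2000L);
        System.out.println(w.changedRows(rows)); // prints [row-b, row-c]
    }
}
```

Note that the upper bound is exclusive, so using the previous run's maxStamp as the next run's minStamp covers every timestamp exactly once.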

On Fri, Jan 18, 2013 at 2:43 PM, Xu, Leon <guodongx@amazon.com> wrote:

> Hi HBase users,
> I am currently trying to set up a denormalization map-reduce job for my
> HBase Table.
> Since our table contains a large volume of data, it's inefficient to have
> one scan object to scan everything. We only need to process those
> records that have changed. I am planning to have multiple scan objects,
> each of which specifies a row range, given that we keep track of
> which rows have been changed.
> Therefore I am trying to set up the map-reduce job with multiple scan
> objects. Is this possible?
> I have seen some posts online suggesting extending the InputFormat
> and changing getSplits. Is this the most efficient way?
> Using a filter does not seem very efficient in my case because it
> basically still scans the whole table, right? It just filters out certain
> records.
> Can you point me in the right direction?
> Thanks
> Leon
