hbase-user mailing list archives

From Shawn Quinn <squ...@moxiegroup.com>
Subject HBase MapReduce Job with Multiple Scans
Date Tue, 03 Apr 2012 12:52:56 GMT
Hello,

I have a table whose key is structured as "eventType + time", and I need to
periodically run a map reduce job on the table which will process each
event type within a specific time range.  So, the map reduce job needs to
process multiple segments of the table as input, and therefore can't be
set up with a single scan.  (Using a filter on a single scan would
theoretically work, but doesn't scale well as the data size increases.)
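
To make the "multiple segments" concrete, here is a minimal sketch in plain Java (no HBase classes) of the row-key ranges such a job would need, one per event type. The key encoding is an assumption for illustration only: the event type string followed by a 13-digit zero-padded millisecond timestamp, with no event type being a prefix of another. The names SegmentRanges, Range, rowKey and rangesFor are hypothetical. Each range could then back one scan's start row (inclusive) and stop row (exclusive).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SegmentRanges {

    /** One contiguous key range: startRow inclusive, stopRow exclusive. */
    public static final class Range {
        public final String startRow;
        public final String stopRow;

        Range(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    // Assumed key layout: eventType + zero-padded millis, so that keys of
    // one event type sort by time in lexicographic order.
    static String rowKey(String eventType, long timeMillis) {
        return eventType + String.format("%013d", timeMillis);
    }

    /** Builds one [from, to) key range per event type. */
    public static List<Range> rangesFor(List<String> eventTypes, long from, long to) {
        List<Range> ranges = new ArrayList<Range>();
        for (String eventType : eventTypes) {
            ranges.add(new Range(rowKey(eventType, from), rowKey(eventType, to)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList("click", "view");
        for (Range r : rangesFor(types, 1333238400000L, 1333324800000L)) {
            System.out.println(r.startRow + " .. " + r.stopRow);
        }
    }
}
```

With N event types this yields N disjoint ranges, which is why a single start/stop row pair cannot cover the input without also reading the rows in between.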

Given that the HBase-provided "TableMapReduceUtil.initTableMapperJob" only
supports a single scan, there doesn't appear to be a "built in" way to run a
mapreduce job that has multiple scans as input.  I found the following
related post which points me to creating my own map reduce "InputFormat"
type by extending HBase's "TableInputFormatBase" and overriding the
"getSplits()" method:

http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects

So, that's currently the direction I'm heading.  However, before I get too
far into the weeds I thought I'd ask:

1. Is this still the best/right way to handle this situation?

2. Does anyone have an example of a custom InputFormat that sets up
multiple scans against an HBase input table (something like the
"MultiSegmentTableInputFormat" referred to in the post) that they'd be
willing to share?

Thanks,

       -Shawn
