hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBase MapReduce Job with Multiple Scans
Date Tue, 03 Apr 2012 13:42:58 GMT
Take a look at HBASE-3996 where Stack has some comments outstanding.


On Tue, Apr 3, 2012 at 5:52 AM, Shawn Quinn <squinn@moxiegroup.com> wrote:

> Hello,
> I have a table whose key is structured as "eventType + time", and I need to
> periodically run a map reduce job on the table which will process each
> event type within a specific time range.  So, the map reduce job needs to
> process multiple segments of the table as input, and therefore can't be
> set up with a single scan.  (Using a filter on the scan would theoretically
> work, but doesn't scale well as the data size increases.)
> Given that the HBase-provided "TableMapReduceUtil.initTableMapperJob" only
> supports a single scan, there doesn't appear to be a "built in" way to run a
> mapreduce job that has multiple scans as input.  I found the following
> related post which points me to creating my own map reduce "InputFormat"
> type by extending HBase's "TableInputFormatBase" and overriding the
> "getSplits()" method:
> http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects
> So, that's currently the direction I'm heading.  However, before I get too
> far into the weeds I thought I'd ask:
> 1. Is this still the best/right way to handle this situation?
> 2. Does anyone have an example of a custom InputFormat that sets up
> multiple scans against an HBase input table (something like the
> "MultiSegmentTableInputFormat" referred to in the post) that they'd be
> willing to share?
> Thanks,
>       -Shawn
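
For illustration, here is a minimal sketch of the "multiple segments" idea described in the question: one start/stop row pair per event type for a given time window. The key encoding is an assumption, since the thread does not say how "eventType + time" is serialized; the sketch takes the event type as UTF-8 bytes followed by an 8-byte big-endian timestamp. The names SegmentRanges, Range, rowKey and rangesFor are hypothetical, and no HBase classes are used. In a custom InputFormat, each pair would presumably become the start and stop row of its own scan.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase.
class SegmentRanges {

    /** One contiguous key range: startRow inclusive, stopRow exclusive. */
    static final class Range {
        final byte[] startRow;
        final byte[] stopRow;

        Range(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    /**
     * Assumed key layout: UTF-8 event type followed by an 8-byte
     * big-endian timestamp. Non-negative timestamps then sort in
     * time order within each event type.
     */
    static byte[] rowKey(String eventType, long timeMillis) {
        byte[] prefix = eventType.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + 8)
                .put(prefix)
                .putLong(timeMillis)
                .array();
    }

    /** Builds one range per event type covering [fromMillis, toMillis). */
    static List<Range> rangesFor(List<String> eventTypes,
                                 long fromMillis, long toMillis) {
        List<Range> ranges = new ArrayList<>();
        for (String eventType : eventTypes) {
            ranges.add(new Range(rowKey(eventType, fromMillis),
                                 rowKey(eventType, toMillis)));
        }
        return ranges;
    }
}
```

Calling SegmentRanges.rangesFor(Arrays.asList("click", "view"), from, to) would yield two ranges, one per event type. This avoids a filter over the whole table because each range only touches the rows for one event type and time window.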
