hbase-user mailing list archives

From Shawn Quinn <squ...@moxiegroup.com>
Subject Re: HBase MapReduce Job with Multiple Scans
Date Tue, 03 Apr 2012 14:43:01 GMT
Thanks for the quick reply, Ted!  That's exactly what I'm looking for.
Reading through the Jira comments, I'm a bit confused about the
status/plan for that patch.  Do you expect it will be included in the
next HBase release, or has it been postponed?  Also, does that change
depend on any recent changes to Hadoop/MapReduce, or will it work as-is?

In the meantime, I'll give that patch a closer look and set up some custom
classes in my own project to try to pull off something similar.

     -Shawn

On Tue, Apr 3, 2012 at 9:42 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> Take a look at HBASE-3996 where Stack has some comments outstanding.
>
> Cheers
>
> On Tue, Apr 3, 2012 at 5:52 AM, Shawn Quinn <squinn@moxiegroup.com> wrote:
>
> > Hello,
> >
> > I have a table whose key is structured as "eventType + time", and I need
> > to periodically run a MapReduce job on the table which will process each
> > event type within a specific time range.  So, the MapReduce job needs to
> > process multiple segments of the table as input, and therefore can't be
> > set up with a single scan.  (Using a filter on the scan would
> > theoretically work, but doesn't scale well as the data size increases.)
> >
> > Given that the HBase-provided "TableMapReduceUtil.initTableMapperJob"
> > only supports a single scan, there doesn't appear to be a "built in" way
> > to run a MapReduce job that has multiple scans as input.  I found the
> > following related post, which points me to creating my own MapReduce
> > "InputFormat" type by extending HBase's "TableInputFormatBase" and
> > overriding the "getSplits()" method:
> >
> >
> > http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects
> >
> > So, that's currently the direction I'm heading.  However, before I got
> > too far in the weeds I thought I'd ask:
> >
> > 1. Is this still the best/right way to handle this situation?
> >
> > 2. Does anyone have an example of a custom InputFormat that sets up
> > multiple scans against an HBase input table (something like the
> > "MultiSegmentTableInputFormat" referred to in the post) that they'd be
> > willing to share?
> >
> > Thanks,
> >
> >       -Shawn
> >
>
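
The "multiple segments" idea in the original question can be sketched
without any HBase dependency: for each event type, compute one start/stop
row pair covering the requested time window. This is only an illustration,
not the approach taken in HBASE-3996. The key layout (UTF-8 event type
followed by an 8-byte big-endian timestamp) is an assumption, since the
thread does not give the actual encoding, and the names ScanRanges, Range,
rowKey and rangesFor are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only. ASSUMPTION: a row key is the UTF-8 bytes of the event type
// followed by an 8-byte big-endian, non-negative timestamp. The real key
// layout is not stated in the thread.
final class ScanRanges {

    /** One contiguous key range: startRow inclusive, stopRow exclusive. */
    static final class Range {
        final byte[] startRow;
        final byte[] stopRow;

        Range(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    /** Builds the assumed row key for an event type at a given timestamp. */
    static byte[] rowKey(String eventType, long timestamp) {
        byte[] prefix = eventType.getBytes(StandardCharsets.UTF_8);
        // ByteBuffer writes longs big-endian by default, so keys for one
        // event type sort by time when compared as unsigned bytes.
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(timestamp)
                .array();
    }

    /** Returns one key range per event type for the window [from, to). */
    static List<Range> rangesFor(List<String> eventTypes, long from, long to) {
        List<Range> ranges = new ArrayList<>();
        for (String eventType : eventTypes) {
            ranges.add(new Range(rowKey(eventType, from), rowKey(eventType, to)));
        }
        return ranges;
    }
}
```

Each Range would then supply the start and stop row of one scan, and a
custom InputFormat's getSplits() would collect the splits for every range
instead of a single one. Note that with variable-length event types, a type
that is a prefix of another (say "click" and "clickX") may need a delimiter
or fixed-width encoding to keep the ranges from overlapping.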
