hbase-user mailing list archives

From Shawn Quinn <squ...@moxiegroup.com>
Subject Re: HBase MapReduce Job with Multiple Scans
Date Tue, 03 Apr 2012 14:49:22 GMT
Sounds good, thanks Ted.  I'll give it a whirl and add any
comments/findings to the Jira issue.

     -Shawn

On Tue, Apr 3, 2012 at 10:45 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> Stack said he might help implement his suggestions if Eran is busy.
>
> The patch doesn't depend on recent changes to Hadoop/MapReduce.
>
> Give it a try. Feedback would help us refine the patch.
>
> Thanks
>
> On Tue, Apr 3, 2012 at 7:43 AM, Shawn Quinn <squinn@moxiegroup.com> wrote:
>
> > Thanks for the quick reply Ted!  That's exactly what I'm looking for.
> > Reading through the Jira comments I'm a bit confused on what the
> > status/plan is with that patch.  Do you expect that will be included in
> > the next HBase release, or has it been postponed?  Also, does that change
> > depend on any recent changes to Hadoop/MapReduce, or will it work
> > as-is?
> >
> > In the meantime, I'll give that patch a closer look and set up some custom
> > classes in my own project to try and pull off something similar.
> >
> >     -Shawn
> >
> > On Tue, Apr 3, 2012 at 9:42 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Take a look at HBASE-3996 where Stack has some comments outstanding.
> > >
> > > Cheers
> > >
> > > On Tue, Apr 3, 2012 at 5:52 AM, Shawn Quinn <squinn@moxiegroup.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > I have a table whose key is structured as "eventType + time", and I need
> > > > to periodically run a map reduce job on the table which will process each
> > > > event type within a specific time range.  So, the map reduce job needs to
> > > > process multiple segments of the table as input, and therefore can't be
> > > > set up with a single scan.  (Using a filter on the scan would theoretically
> > > > work, but doesn't scale well as the data size increases.)
> > > >
> > > > Given that the HBase provided "TableMapReduceUtil.initTableMapperJob" only
> > > > supports a single scan there doesn't appear to be a "built in" way to run a
> > > > mapreduce job that has multiple scans as input.  I found the following
> > > > related post which points me to creating my own map reduce "InputFormat"
> > > > type by extending HBase's "TableInputFormatBase" and overriding the
> > > > "getSplits()" method:
> > > >
> > > > http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects
> > > >
> > > > So, that's currently the direction I'm heading.  However, before I got too
> > > > far in the weeds I thought I'd ask:
> > > >
> > > > 1. Is this still the best/right way to handle this situation?
> > > >
> > > > 2. Does anyone have an example of a custom InputFormat that sets up
> > > > multiple scans against an HBase input table (something like the
> > > > "MultiSegmentTableInputFormat" referred to in the post) that they'd be
> > > > willing to share?
> > > >
> > > > Thanks,
> > > >
> > > >       -Shawn
> > > >
> > >
> >
>
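
For illustration only, here is a minimal sketch of the "multiple segments" idea from the original question: turning a list of event types plus one time window into one start/stop row pair per event type. It is not taken from the HBASE-3996 patch or from the Stack Overflow post. The class name EventScanRanges, the Range holder, and above all the key encoding (UTF-8 event type followed by an 8-byte big-endian timestamp) are assumptions, since the thread only says the key is "eventType + time". The sketch deliberately uses no HBase classes so it runs on its own; in a real job each pair would presumably seed one Scan (its start and stop row) inside a custom InputFormat.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EventScanRanges {

    /** One contiguous row-key segment: startRow inclusive, stopRow exclusive. */
    public static final class Range {
        public final byte[] startRow;
        public final byte[] stopRow;

        Range(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    // ASSUMPTION: row key = UTF-8 event type + 8-byte big-endian timestamp.
    // Big-endian encoding keeps non-negative timestamps in byte-wise sorted order.
    static byte[] rowKey(String eventType, long timeMillis) {
        byte[] prefix = eventType.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(timeMillis)
                .array();
    }

    /**
     * Builds one key range per event type covering [startMillis, endMillis).
     * ASSUMPTION: no event type is a prefix of another, so segments do not overlap.
     */
    public static List<Range> rangesFor(List<String> eventTypes,
                                        long startMillis, long endMillis) {
        List<Range> ranges = new ArrayList<>();
        for (String type : eventTypes) {
            ranges.add(new Range(rowKey(type, startMillis), rowKey(type, endMillis)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Hypothetical event types and time window, chosen only for the demo.
        List<Range> ranges = rangesFor(Arrays.asList("click", "view"), 1000L, 2000L);
        for (Range r : ranges) {
            System.out.println(Arrays.toString(r.startRow)
                    + " .. " + Arrays.toString(r.stopRow));
        }
    }
}
```

Computing the segments up front keeps each scan limited to the rows it actually needs, which is the scaling concern raised above about using a filter on a single full-table scan.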
