hbase-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: DBInputFormat
Date Fri, 12 Feb 2010 20:20:17 GMT
DBInputFormat runs a COUNT query against the RDBMS table and divides that
row count evenly across the configured number of map tasks. If you want to
split using your own scheme (for example, exactly 'n' rows per task), you'll
have to write your own InputFormat or tweak the existing one.
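
To make the difference concrete, here is a minimal sketch of the two
splitting schemes as plain arithmetic over row offsets. It does not use any
Hadoop classes: SplitSketch, Range, evenSplits and fixedSizeSplits are
hypothetical names made up for illustration, and the "even" scheme reflects
my understanding of the stock behavior (row count divided by the number of
map tasks, with the last split absorbing the remainder) rather than the
actual DBInputFormat source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: models splits as row-offset ranges.
class SplitSketch {
    /** Half-open row range [start, end). */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    // Assumed stock scheme: rowCount / numMappers rows per split,
    // with the last split taking whatever is left over.
    static List<Range> evenSplits(long rowCount, int numMappers) {
        List<Range> out = new ArrayList<>();
        long chunk = rowCount / numMappers;
        for (int i = 0; i < numMappers; i++) {
            long start = i * chunk;
            long end = (i + 1 == numMappers) ? rowCount : start + chunk;
            out.add(new Range(start, end));
        }
        return out;
    }

    // Custom scheme: every split gets exactly n rows,
    // except possibly the last one when rowCount is not a multiple of n.
    static List<Range> fixedSizeSplits(long rowCount, long n) {
        List<Range> out = new ArrayList<>();
        for (long start = 0; start < rowCount; start += n) {
            out.add(new Range(start, Math.min(start + n, rowCount)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(evenSplits(10, 3));      // [[0, 3), [3, 6), [6, 10)]
        System.out.println(fixedSizeSplits(10, 4)); // [[0, 4), [4, 8), [8, 10)]
    }
}
```

In a custom InputFormat, the getSplits() method would be the place to emit
one split per range from something like fixedSizeSplits. Note that even then
the final task sees fewer than 'n' rows unless the table size is an exact
multiple of 'n'.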

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz

On Fri, Feb 12, 2010 at 12:08 PM, Stack <stack@duboce.net> wrote:

> On Fri, Feb 12, 2010 at 4:32 AM, Gaurav Vashishth <vashgaurav@gmail.com>
> wrote:
> >
> > I have a MapReduce job that processes a MySQL database and gives us
> > some output. For this purpose, I have written the map and reduce
> > functions and have used DBInputFormat, but I'm confused about how the
> > JobTracker will produce the splits here.
> >
> > I want the first 'n' records from the database to be processed by a
> > single map task, and so on. If the JobTracker splits the records and
> > gives a task fewer than 'n' records, it would be a problem.
> >
> > Is there an API for getting this done, or am I missing something?
> >
> Maybe you have to write your own splitter?  One that makes sure each
> task has N rows?  Is there a splitter that is part of DBInputFormat?
> Can you look at how it works?  Maybe you can specify rows per task
> just with a configuration?
> St.Ack
