hbase-user mailing list archives

From Shrijeet Paliwal <shrij...@rocketfuel.com>
Subject Re: Question about MapReduce
Date Mon, 29 Oct 2012 17:03:21 GMT
In line.

On Mon, Oct 29, 2012 at 8:11 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> I'm replying to myself ;)
> I found "cleanup" and "setup" methods from the TableMapper class. So I
> think those are the methods I was looking for. I will init the
> HTablePool there. Please let me know if I'm wrong.
> Now, I still have a few other questions.
> 1) context.getCurrentValue() can throw an InterruptedException, but
> when can this occur? Is there a timeout on the Mapper side? Or is it if
> the region goes down while the job is running?

You do not need to call context.getCurrentValue(). The 'value' argument to
the map method [1] has the information you are looking for.

> 2) How can I pass parameters to the Map method? Can I use
> job.getConfiguration().put to add some properties there, and get them
> back with context.getConfiguration().get?

Yes, that's how it is done, except that the setter on Configuration is
set(), not put().
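
To make the round trip concrete, here is a minimal sketch in plain Java.
SketchConf is only a stand-in for Hadoop's Configuration so that the
example runs without a cluster; in a real job the driver would call
job.getConfiguration().set(name, value) and the mapper would read the
value back from context.getConfiguration(), ideally once in setup(). The
property names and the mapper class are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for Hadoop's Configuration: a string-to-string property bag. */
class SketchConf {
    private final Map<String, String> props = new HashMap<>();

    void set(String name, String value) { props.put(name, value); }

    String get(String name) { return props.get(name); }

    long getLong(String name, long defaultValue) {
        String v = props.get(name);
        return v == null ? defaultValue : Long.parseLong(v);
    }
}

/** Hypothetical mapper that reads its parameters once, in setup(). */
class TimeRangeMapperSketch {
    // Hypothetical property names; they are not defined by Hadoop or HBase.
    static final String MIN_TS = "rowmover.timestamp.min";
    static final String MAX_TS = "rowmover.timestamp.max";

    private long minTs;
    private long maxTs;

    // In a real job this would be setup(Context context), reading from
    // context.getConfiguration().
    void setup(SketchConf conf) {
        minTs = conf.getLong(MIN_TS, 0L);
        maxTs = conf.getLong(MAX_TS, Long.MAX_VALUE);
    }

    /** True when a cell timestamp falls in [minTs, maxTs). */
    boolean inRange(long cellTimestamp) {
        return cellTimestamp >= minTs && cellTimestamp < maxTs;
    }
}

class ConfPassingDemo {
    public static void main(String[] args) {
        // Driver side: values are stored as strings before the job is submitted.
        SketchConf conf = new SketchConf();
        conf.set(TimeRangeMapperSketch.MIN_TS, "1000");
        conf.set(TimeRangeMapperSketch.MAX_TS, "2000");

        // Task side: the mapper parses them back once per task.
        TimeRangeMapperSketch mapper = new TimeRangeMapperSketch();
        mapper.setup(conf);
        System.out.println(mapper.inRange(1500));
        System.out.println(mapper.inRange(2500));
    }
}
```

Reading the values in setup() rather than in map() avoids re-parsing the
same strings for every row.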

> 3) What's the best way to log results/exceptions/traces from the map
> method?

In most cases, you'll have mapper and reducer classes as nested static
classes within some enclosing class. You can get a handle to the Logger from
the enclosing class and do your usual LOG.info, LOG.warn yada yada.
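
A rough sketch of that layout follows. It uses java.util.logging only to
stay free of dependencies; a real job would use whatever logging library
the cluster is configured with. The class names, the map() signature and
the failure counter are hypothetical stand-ins, not the HBase API.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical enclosing job class that owns the shared logger. */
class RowMoverJobSketch {
    private static final Logger LOG =
        Logger.getLogger(RowMoverJobSketch.class.getName());

    /** Nested static class, as mappers usually are; it can use LOG directly. */
    static class MoveMapper {
        int failures = 0;

        // Stand-in for map(key, value, context): logs progress and errors.
        void map(String rowKey, long timestamp) {
            try {
                if (timestamp < 0) {
                    throw new IllegalArgumentException("negative timestamp");
                }
                LOG.info("processing row " + rowKey);
            } catch (IllegalArgumentException e) {
                failures++;
                // Passing the exception keeps the stack trace in the log.
                LOG.log(Level.WARNING, "skipping row " + rowKey, e);
            }
        }
    }

    public static void main(String[] args) {
        MoveMapper mapper = new MoveMapper();
        mapper.map("row-1", 10L);
        mapper.map("row-2", -1L);
        System.out.println("failures: " + mapper.failures);
    }
}
```

Keep in mind that mapper output of this kind typically lands in the
per-task logs on the cluster, not on the console that submitted the job.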

Hope it helps.

[1] map(KEYIN key, *VALUEIN value*, Context context)

> I will search on my side, but some help will be welcome because it
> seems there is not much documentation when we start to dig a bit :(
> JM
> 2012/10/27, Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
> > Hi,
> >
> > I'm thinking about my first MapReduce class and I have some questions.
> >
> > The goal of it will be to move some rows from one table to another one
> > based on the timestamp only.
> >
> > Since this is pretty new for me, I'm starting from the RowCounter
> > class to have a baseline.
> >
> > There are a few things I will have to update. First, the
> > createSubmittableJob method to get a timestamp range instead of a key
> > range, and "play" with the parameters. This part is fine.
> >
> > Next, I need to update the map method, and this is where I have some
> > questions.
> >
> > I'm able to find the timestamp of all the cf:c from the
> > context.getCurrentValue() method, that's fine. Now, my concern is on
> > the way to get access to the table to store this field, and the table
> > to delete it. Should I instantiate an HTable for the source table, and
> > execute a delete on it, then do an insert on another HTable
> > instance?  Should I use an HTablePool? Also, since I’m already on the
> > row, can’t I just mark it as deleted instead of calling a new HTable?
> >
> > Also, instead of calling the delete and put one by one, I would like
> > to put them on a list and execute it only when it’s over 10 members.
> > How can I make sure that at the end of the job, this is flushed? Else,
> > I will lose some operations. Is there a kind of “dispose” method
> > called on the region when the job is done?
> >
> > Thanks,
> >
> > JM
> >
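
On the batching question in the quoted message: Hadoop's Mapper calls
setup() once before the first map() call and cleanup() once after the
last one, so cleanup() is a plausible place for the final flush. Below is
a minimal sketch of that pattern in plain Java. The class is a stand-in,
not a real TableMapper; row keys are plain strings, "flushing" just
records the batch, and the threshold of 10 comes from the question. A
real job would send the buffered Put and Delete operations to HBase
inside flush().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mapper that buffers operations and flushes them in batches. */
class BufferedMoveMapperSketch {
    static final int BATCH_SIZE = 10;

    private final List<String> pending = new ArrayList<>();
    // Records each flushed batch so the behaviour can be inspected.
    final List<List<String>> flushedBatches = new ArrayList<>();

    // Called once per task before any map() call.
    void setup() {
        pending.clear();
    }

    // Called once per row: buffer the operation, flush when the batch is full.
    void map(String rowKey) {
        pending.add(rowKey);
        if (pending.size() >= BATCH_SIZE) {
            flush();
        }
    }

    // Called once per task after the last map() call: flush the remainder.
    void cleanup() {
        flush();
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // A real job would execute the buffered puts and deletes here.
        flushedBatches.add(new ArrayList<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        BufferedMoveMapperSketch mapper = new BufferedMoveMapperSketch();
        mapper.setup();
        for (int i = 0; i < 25; i++) {
            mapper.map("row-" + i);
        }
        mapper.cleanup();
        for (List<String> batch : mapper.flushedBatches) {
            System.out.println("flushed batch of " + batch.size());
        }
    }
}
```

Without the flush in cleanup(), the last partial batch (5 rows out of 25
in this run) would never be written.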
