hbase-user mailing list archives

From Doğacan Güney <doga...@gmail.com>
Subject Re: MapReduce in hbase 0.20
Date Tue, 07 Jul 2009 16:42:20 GMT
On Tue, Jul 7, 2009 at 18:57, stack <stack@duboce.net> wrote:

> 2009/7/7 Doğacan Güney <dogacan@gmail.com>
>
> > Hi list,
> >
> > In current trunk, TableReducer is defined like this:
> >
> > ....
> > public abstract class TableReducer<KEYIN, VALUEIN>
> > extends Reducer<KEYIN, VALUEIN, ImmutableBytesWritable, Put>
> > ....
> >
> > As VALUEOUT is a Put, I guess one cannot delete columns (like we could
> > do with BatchUpdate) using collect(). I can still create Delete-s in
> > #reduce and do a table.delete, but that seems unintuitive to me. Am I
> > missing something here, or is this the intended behavior?
>
>
>
> That's intended behavior for that class.  Put and Delete do not share a
> common ancestor other than Writable, so it's a little awkward.
>
> What would you suggest Doğacan?  Maybe we should add Marker interfaces to
> Put and Delete and then change TableReducer to take the Marker?
>

Sure, that's a good idea.
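For what it's worth, a minimal sketch of the marker idea (the marker name
TableMutation is made up, and Writable, Put, and Delete are stubbed out so
the example stands alone; these are not the real HBase classes):

```java
// Sketch of the marker-interface suggestion. "TableMutation" is a
// hypothetical name; Writable, Put, and Delete are stubbed here for
// illustration -- not the real HBase classes.
interface Writable {}

// The proposed marker: both mutation types implement it, so a reducer
// typed on the marker can emit either a Put or a Delete.
interface TableMutation extends Writable {}

class Put implements TableMutation {
    final String row;
    Put(String row) { this.row = row; }
}

class Delete implements TableMutation {
    final String row;
    Delete(String row) { this.row = row; }
}

public class MarkerDemo {
    // A sink typed on the marker accepts both mutation kinds.
    static String apply(TableMutation m) {
        if (m instanceof Put) return "put:" + ((Put) m).row;
        if (m instanceof Delete) return "delete:" + ((Delete) m).row;
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(apply(new Put("row1")));    // put:row1
        System.out.println(apply(new Delete("row2"))); // delete:row2
    }
}
```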

I haven't studied Hadoop 0.20's API much yet, so I am not sure if this can be
done, but could hbase have its own ReduceContext class? If that is possible,
then maybe we can just expose the HTable instance through the context and
let the user do whatever they want with the table (and throw an
exception if context.write is called). I think this would be much
simpler to understand than the write()/collect() calls (e.g. TableOutputFormat
ignores the collected keys). Does this make sense?
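Roughly the shape I have in mind, with everything stubbed so it stands on
its own (StubTable and TableReduceContext are illustrative names; none of
this is real Hadoop or HBase API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the context idea. HTable and ReduceContext are stubbed so the
// example compiles on its own -- these are not the real Hadoop/HBase APIs.
class StubTable {
    final List<String> ops = new ArrayList<>();
    void put(String row)    { ops.add("put:" + row); }
    void delete(String row) { ops.add("delete:" + row); }
}

// The proposed HBase-specific context: hand the reducer the table itself
// and fail fast if anyone calls write().
class TableReduceContext {
    private final StubTable table;
    TableReduceContext(StubTable table) { this.table = table; }

    StubTable getTable() { return table; }

    void write(Object key, Object value) {
        throw new UnsupportedOperationException(
            "write() is not supported; mutate via getTable()");
    }
}

public class ContextDemo {
    public static void main(String[] args) {
        TableReduceContext ctx = new TableReduceContext(new StubTable());
        // A reduce() body would do puts and deletes directly on the table:
        ctx.getTable().put("row1");
        ctx.getTable().delete("row2");
        System.out.println(ctx.getTable().ops); // [put:row1, delete:row2]
    }
}
```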


>
> Now is a good time to bring this up before it gets set in stone by the
> 0.20.0 release.
>
> Thanks for looking at this.
>

No problem :) HBase 0.20 is shaping up to be really awesome, btw :)


>
> St.Ack
>



-- 
Doğacan Güney
