crunch-dev mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Thoughts on supporting HBase 0.96
Date Wed, 16 Oct 2013 06:46:56 GMT
On Tue, Oct 15, 2013 at 11:42 PM, Chao Shi <stepinto@live.com> wrote:

> Hi Josh,
>
> I don't understand why we need another PTypeFamily here. I think we can
> simply provide some pre-defined PTypes.
>
> interface HFilePTypes {
>   static KEY_VALUE_PTYPE = xxx
>   static PUT_PTYPE = xxx
> }
>

Technically, every PType has to provide an implementation of the
PTypeFamily getFamily() method-- even if it's just returning a dummy object.
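
To make that concrete, here is a minimal sketch of the "pre-defined types
plus a dummy family" idea. Everything in it is an assumption for
illustration: SketchType, SketchTypeFamily, and HFileSketchTypes are
simplified stand-ins I made up, not the real Crunch PType/PTypeFamily
interfaces, which have more methods than this.

```java
// Stand-in types for illustration only; NOT the real Crunch interfaces.
interface SketchTypeFamily {
    String name();
}

interface SketchType<T> {
    Class<T> getTypeClass();

    // Mirrors the point above: every type has to name its family.
    SketchTypeFamily getFamily();
}

// Hypothetical holder for pre-defined types, in the spirit of HFilePTypes.
final class HFileSketchTypes {
    private HFileSketchTypes() {}

    // Dummy family: it exists only so getFamily() has something to return.
    static final SketchTypeFamily DUMMY_FAMILY = () -> "hbase-dummy";

    // Builds a type for the given class that reports the dummy family.
    static <T> SketchType<T> of(final Class<T> clazz) {
        return new SketchType<T>() {
            @Override
            public Class<T> getTypeClass() {
                return clazz;
            }

            @Override
            public SketchTypeFamily getFamily() {
                return DUMMY_FAMILY;
            }
        };
    }
}
```

So even the "just a few constants" approach ends up needing some family
object behind it; the open question is how much that object has to do.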


>
> After a quick scan of the latest code, it seems they have dropped anything
> Writable-related and moved to protobuf. For example, there is a
> RequestConverter class which converts Mutations (the superclass of Put and
> Delete) to protobuf messages. We can leverage this to implement our
> PType. But unfortunately, this piece of code does not exist in 0.94. So if
> we want to support both, we have to do something similar to what we did for
> hadoop 1 and 2, which is very boring. (I have no experience in using HBase
> releases after 0.94, so what I said may be wrong.)
>

Totally agree that it's boring-- I'm happy to volunteer to do it. ;)


>
>
> 2013/10/16 Josh Wills <jwills@cloudera.com>
>
> > Hey all,
> >
> > To kill some time this afternoon, I took a pass at figuring out what
> > changes would be needed in Crunch to support HBase 0.96, which is going
> > through a few release candidates right now. I started out by building
> > against the 0.95.2 release, which has most of the API changes that I'm told
> > we can expect in 0.96.
> >
> > The most consequential change I found is that many of the core HBase
> > classes we operate on-- Put, Delete, KeyValue, and Result-- will no longer
> > implement the Writable interface. Instead, the HBase team has added a
> > number of SerializationFactory classes for these types, which map the POJO
> > versions of those objects on to protocol buffers. This means that the
> > current trick of creating PTypes for HBase like this:
> >
> > PType<Result> ptype = Writables.writables(Result.class);
> >
> > won't work anymore in 0.96, i.e., the HBase data classes won't fit into
> > either of the existing type families.
> >
> > The best solution I've come up with so far is to create a new,
> > HBase-specific PTypeFamily for supporting the way these classes are
> > serialized now. I'm not sure if there's a better approach here and/or how
> > complex this particular PTypeFamily implementation would need to be; I'm
> > very much open to ideas on how to proceed here.
> >
> > J
> >
> >
> > --
> > Director of Data Science
> > Cloudera <http://www.cloudera.com>
> > Twitter: @josh_wills <http://twitter.com/josh_wills>
> >
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
