crunch-dev mailing list archives

From Chao Shi
Subject Re: Thoughts on supporting HBase 0.96
Date Wed, 16 Oct 2013 06:42:11 GMT
Hi Josh,

I don't understand why we need another PTypeFamily here. I think we can
simply provide some pre-defined PTypes:

interface HFilePTypes {
  // xxx stands in for the actual PType definition
  static PType<KeyValue> KEY_VALUE_PTYPE = xxx;
  static PType<Put> PUT_PTYPE = xxx;
}
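
A minimal, self-contained sketch of what a "pre-defined type" could look like, assuming it is built as a pair of conversion functions over a plain byte[] base type. Every name below (Cell, DerivedType, CELL_TYPE, the length-prefixed layout) is a hypothetical stand-in chosen for illustration; none of it is Crunch or HBase API, and a real version would plug the HBase serialization in where encode/decode are.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Sketch only: every type here is a hypothetical stand-in, not a Crunch or HBase class. */
class PredefinedTypesSketch {

  /** Stand-in for an HBase cell such as KeyValue: just a row key and a value. */
  static final class Cell {
    final byte[] row;
    final byte[] value;

    Cell(byte[] row, byte[] value) {
      this.row = row;
      this.value = value;
    }
  }

  /** Stand-in for a derived type: two functions layered over a byte[] base type. */
  static final class DerivedType<T> {
    final Function<T, byte[]> toBytes;
    final Function<byte[], T> fromBytes;

    DerivedType(Function<T, byte[]> toBytes, Function<byte[], T> fromBytes) {
      this.toBytes = toBytes;
      this.fromBytes = fromBytes;
    }
  }

  /** A pre-defined instance, in the spirit of HFilePTypes.KEY_VALUE_PTYPE. */
  static final DerivedType<Cell> CELL_TYPE =
      new DerivedType<>(PredefinedTypesSketch::encode, PredefinedTypesSketch::decode);

  /** Assumed layout: [row length][row bytes][value length][value bytes]. */
  static byte[] encode(Cell cell) {
    ByteBuffer buf = ByteBuffer.allocate(8 + cell.row.length + cell.value.length);
    buf.putInt(cell.row.length).put(cell.row);
    buf.putInt(cell.value.length).put(cell.value);
    return buf.array();
  }

  /** Inverse of encode: reads the two length-prefixed fields back out. */
  static Cell decode(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    byte[] row = new byte[buf.getInt()];
    buf.get(row);
    byte[] value = new byte[buf.getInt()];
    buf.get(value);
    return new Cell(row, value);
  }

  public static void main(String[] args) {
    Cell in = new Cell("row1".getBytes(StandardCharsets.UTF_8),
                       "v".getBytes(StandardCharsets.UTF_8));
    // Round-trip through the pre-defined type, as a pipeline would between stages.
    Cell out = CELL_TYPE.fromBytes.apply(CELL_TYPE.toBytes.apply(in));
    System.out.println(new String(out.row, StandardCharsets.UTF_8) + " -> "
        + new String(out.value, StandardCharsets.UTF_8));
  }
}
```

The point of this shape is that callers only ever reference the constant, so the serialization behind it can change between HBase versions without a new type family.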

After a quick scan of the latest code, it seems they have dropped everything
Writable-related and moved to protobuf. For example, there is a
RequestConverter class that converts Mutations (Mutation is the superclass of
Put and Delete) to protobuf messages. We can leverage this to implement our
PTypes. Unfortunately, this code does not exist in 0.94, so if we want to
support both versions we will have to do something similar to what we did for
Hadoop 1 and 2, which is tedious. (I have no experience with HBase releases
after 0.94, so what I said may be wrong.)
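
One way supporting both versions could look is a small shim: a single interface with one implementation per HBase line, selected by probing the classpath. The sketch below is only an illustration under that assumption and does not describe how the Hadoop 1/2 split was actually handled. PutCodec, WritableCodec, ProtobufCodec and forClasspath are hypothetical names, the two serialize bodies are stubs, and using the protobuf RequestConverter as the marker class is an assumption.

```java
import java.nio.charset.StandardCharsets;

/** Sketch only: hypothetical names, not Crunch or HBase API. */
class VersionShimSketch {

  /** The one operation that would differ between the two HBase lines. */
  interface PutCodec {
    byte[] serialize(String row, String value);
  }

  /** Stub for the 0.94 path, where Put still implements Writable. */
  static final class WritableCodec implements PutCodec {
    @Override
    public byte[] serialize(String row, String value) {
      return ("writable:" + row + "=" + value).getBytes(StandardCharsets.UTF_8);
    }
  }

  /** Stub for the 0.96 path, where conversion would go through protobuf. */
  static final class ProtobufCodec implements PutCodec {
    @Override
    public byte[] serialize(String row, String value) {
      return ("protobuf:" + row + "=" + value).getBytes(StandardCharsets.UTF_8);
    }
  }

  /** Chooses an implementation by checking whether a version-specific class is loadable. */
  static PutCodec forClasspath(String markerClass) {
    try {
      Class.forName(markerClass);
      return new ProtobufCodec();
    } catch (ClassNotFoundException e) {
      return new WritableCodec();
    }
  }

  public static void main(String[] args) {
    // Assumed marker: a class expected to exist only in the protobuf-based releases.
    PutCodec codec = forClasspath("org.apache.hadoop.hbase.protobuf.RequestConverter");
    System.out.println(codec.getClass().getSimpleName());
  }
}
```

The rest of the code would depend only on PutCodec, so the version-specific pieces stay in one place. A build-time split (separate modules or profiles per version) is the obvious alternative and avoids the reflection.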

2013/10/16 Josh Wills

> Hey all,
>
> To kill some time this afternoon, I took a pass at figuring out what
> changes would be needed in Crunch to support HBase 0.96, which is going
> through a few release candidates right now. I started out by building
> against the 0.95.2 release, which has most of the API changes that I'm told
> we can expect in 0.96.
>
> The most consequential change I found is that many of the core HBase
> classes we operate on (Put, Delete, KeyValue, and Result) will no longer
> implement the Writable interface. Instead, the HBase team has added a
> number of SerializationFactory classes for these types, which map the POJO
> versions of those objects onto protocol buffers. This means that the
> current trick of creating PTypes for HBase like this:
>
>   PType<Result> ptype = Writables.writables(Result.class);
>
> won't work anymore in 0.96, i.e., the HBase data classes won't fit into
> either of the existing type families.
>
> The best solution I've come up with so far is to create a new,
> HBase-specific PTypeFamily for supporting the way these classes are
> serialized now. I'm not sure if there's a better approach here and/or how
> complex this particular PTypeFamily implementation would need to be; I'm
> very much open to ideas on how to proceed here.
>
> J
> --
> Director of Data Science
> Cloudera
> Twitter: @josh_wills
