crunch-dev mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Thoughts on supporting HBase 0.96
Date Wed, 16 Oct 2013 15:10:22 GMT
On Wed, Oct 16, 2013 at 8:02 AM, Gabriel Reid <gabriel.reid@gmail.com> wrote:

> On Wed, Oct 16, 2013 at 4:34 PM, Josh Wills <jwills@cloudera.com> wrote:
>
> > On Wed, Oct 16, 2013 at 12:15 AM, Gabriel Reid <gabriel.reid@gmail.com> wrote:
> >
> > > Wouldn't a derived PType (like in o.a.c.types.PTypes) be a better fit
> > > here?
> > >
> >
> > That was my initial attempt and, in an ideal world, my preferred solution,
> > but I haven't figured out how to make it work. The question here is: what
> > do I derive a KeyValue object to? What I really want, for purposes of
> > reading it/writing it to one of our HBase IO formats, is to map it to
> > itself, and not some subclass of Writable. Another option might be an
> > extension of WritableType to handle these special-case formats. I'll take
> > a crack at getting that to work.
> >
>
> I'm sure I'm just missing something obvious, but I don't totally get it.
> What I had in my head is that KeyValue, Put, Delete, Result, etc could all
> be derived to byte arrays, with the KeyValueSerialization,
> MutationSerialization, and ResultSerialization classes being used in the
> MapFns within the derived PType to go between the type and its byte
> representation, i.e.
>
>    public static PType<KeyValue> keyValue(PTypeFamily ptf) {
>       return ptf.derived(
>          KeyValue.class,
>          BYTES_TO_KEYVALUE_VIA_KVSERIALIZATION,
>          KEYVALUE_TO_BYTES_VIA_KVSERIALIZATION,
>          ptf.bytes());
>    }
>
> I'm guessing this is the same thing you're talking about, which I assume
> means that I'm missing something simple as to why that wouldn't just work,
> but I'm not sure what it is that I'm missing.
>
>
The rub is the input and output formats, which don't expect bytes: they
expect either subclasses of Mutation (Put or Delete), or KeyValue (for
HFile), or Result (for HTable). So we would need to change the input and
output formats to take bytes as arguments and then convert them back to the
objects that the HBase APIs expect, something like:

getOutputMapFn() -> OutputFormat
Put -> bytes() -> Put

That isn't the end of the world, it's just a little odd. We'd need to do
something similar on the input format side as well, like:

InputFormat -> getInputMapFn()
Result -> bytes() -> Result
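
To make the shape of that round trip concrete, here is a minimal Java sketch
that uses only the standard library. Everything in it is a hypothetical
stand-in, not Crunch or HBase API: `Cell` plays the role of an HBase type such
as Put or Result, `ObjectSink` plays the role of an output format that only
accepts the object, and `toBytes`/`fromBytes` stand in for whatever the HBase
serialization classes would do. The point is only that the format-side adapter
has to undo the byte conversion before handing the record on.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class ByteRoundTripSketch {

    // Hypothetical stand-in for an HBase type such as Put or Result.
    static final class Cell {
        final String row;
        final String value;

        Cell(String row, String value) {
            this.row = row;
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Cell)) {
                return false;
            }
            Cell other = (Cell) o;
            return row.equals(other.row) && value.equals(other.value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(row, value);
        }
    }

    // Hypothetical stand-in for an output format that only accepts objects.
    static final class ObjectSink {
        final List<Cell> written = new ArrayList<>();

        void write(Cell cell) {
            written.add(cell);
        }
    }

    // Object -> bytes: the role the derived type's output MapFn would play.
    static byte[] toBytes(Cell cell) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(cell.row);
            out.writeUTF(cell.value);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bytes -> object: the conversion the format side would have to repeat.
    static Cell fromBytes(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            String row = in.readUTF();
            String value = in.readUTF();
            return new Cell(row, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Adapter: accepts bytes and rebuilds the object before writing it.
    static void writeBytes(ObjectSink sink, byte[] bytes) {
        sink.write(fromBytes(bytes));
    }

    public static void main(String[] args) {
        ObjectSink sink = new ObjectSink();
        Cell original = new Cell("row1", "value1");
        writeBytes(sink, toBytes(original)); // Cell -> bytes -> Cell
        System.out.println(sink.written.get(0).equals(original));
    }
}
```

The oddity shows up in `writeBytes`: the record is serialized only to be
deserialized again one step later, which is the extra hop described above.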

> >
> > > A whole new PTypeFamily sounds like a lot of work (unless maybe if it
> > > was a subclass of one of the existing ones), and I think there's still
> > > a fair bit of code that assumes that Avro & Writable are the only two
> > > possible PTypeFamily implementations.
> > >
> >
> > For any kind of intermediate processing, that is still true. The
> > HBaseTypeFamily would only ever really appear at the input or output for
> > a job.
> >
> >
> True, although of course it would be nice if we wouldn't have that
> limitation.
>
> - Gabriel
>
