crunch-dev mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: extending crunch
Date Thu, 15 Nov 2012 02:54:24 GMT
I'm always glad to help people to extend Crunch in ways that are useful for
them. I think that most things that involve type-related extensions can be
handled using the PTypes.derived() function, which can be used to create
custom PTypes that are mapped to underlying serialized types, so that you
could do something like

// Forgive my syntax errors, I'm doing this w/o an IDE
PType<Object> objectType = PTypes.derived(
    Object.class,
    new InputMapFn<BytesWritable, Object>(),
    new OutputMapFn<Object, BytesWritable>(),
    Writables.writables(BytesWritable.class));

...which is essentially how Scrunch works: the PTypes { } functionality in
Scrunch maps from Scala types to Java types using the derived functionality.
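
To make the derived-type idea concrete, here is a minimal, self-contained sketch. It deliberately does not use the Crunch API: SimpleType, BYTES, and derived below are hypothetical stand-ins invented for illustration, and plain byte[] plays the role of BytesWritable. The point is only that a derived type wraps a base serialized type with an input function (serialized form to user type) and an output function (user type to serialized form).

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class DerivedTypeSketch {
    /** Hypothetical stand-in for a serialization-backed type; NOT a Crunch interface. */
    public interface SimpleType<T> {
        byte[] write(T value);
        T read(byte[] bytes);
    }

    /** Base type whose in-memory form is already the wire form (the role BytesWritable plays). */
    public static final SimpleType<byte[]> BYTES = new SimpleType<byte[]>() {
        public byte[] write(byte[] value) { return value.clone(); }
        public byte[] read(byte[] bytes) { return bytes.clone(); }
    };

    /**
     * Builds a type for T on top of an existing type for S.
     * inputFn maps the base representation to the user type on read;
     * outputFn maps the user type back to the base representation on write.
     */
    public static <S, T> SimpleType<T> derived(
            Function<S, T> inputFn, Function<T, S> outputFn, SimpleType<S> base) {
        return new SimpleType<T>() {
            public byte[] write(T value) { return base.write(outputFn.apply(value)); }
            public T read(byte[] bytes) { return inputFn.apply(base.read(bytes)); }
        };
    }

    public static void main(String[] args) {
        // A String type derived from the byte-array base type via UTF-8 encoding.
        SimpleType<String> strings = derived(
            (byte[] b) -> new String(b, StandardCharsets.UTF_8),
            (String s) -> s.getBytes(StandardCharsets.UTF_8),
            BYTES);
        byte[] wire = strings.write("hello");
        System.out.println(strings.read(wire)); // prints "hello"
    }
}
```

In this sketch the base type alone decides how bytes are written and read, so swapping the serialization framework would not change the two mapping functions, which is presumably why a wrapper library only needs to supply those.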

The Converter stuff is internal to the Avro and Writable implementations;
I can't think of a case where it would need to be exposed outside the
package (i.e., once you've decided on whether to use Writables or Avro as
your serialization framework, the choice of Converter is fixed).

If you have a use case where the derived type can't handle the conversion
or is a poor choice for whatever reason, I'm all about having a discussion
and trying out different designs.

Josh


On Wed, Nov 14, 2012 at 6:18 PM, Victor Iacoban <victor.iacoban@gmail.com> wrote:

> Hi,
>
> I'm very interested in writing a wrapper library around Apache Crunch for
> Clojure, something similar to existing Scrunch.
> How do you recommend I start?
>
> I was looking through the Crunch code and it looks like I can pretty easily
> integrate it with Clojure by adding a custom WritableType.
> Something like WritableType<Object, BytesWritable> with a custom converter
> or inputFn/outputFn functions.
>
> Unfortunately, there are several issues with this approach, and instead I'd
> have to duplicate all those type classes for a new type set:
> * WritableType has a package-visible constructor, so I cannot extend it or
> instantiate it
> * Converter is instantiated inside the WritableType constructor, so if I
> need a different converter I'm stuck
> * Writables has a factory method for WritableType, but it's private
> * It looks like there is an attempt to support additional WritableTypes
> through EXTENSIONS in Writables, but it would only work for cases where
> both T and W in WritableType<T, W> are Hadoop Writables
>
> So what do you think is the best solution? Is it possible to open up the API
> to support custom WritableTypes, or is my only option to implement a
> new ClojurePType and all related classes?
>
> Hope I'm not too detailed, but at this stage you all are probably very
> familiar with the code
>
> Thanks,
> Victor
>
