incubator-crunch-dev mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: extending crunch
Date Thu, 15 Nov 2012 16:36:57 GMT
On Thu, Nov 15, 2012 at 8:14 AM, Victor Iacoban
<victor.iacoban@gmail.com> wrote:
> I'm not a clojure wizard myself but it feels like clojure REPL with crunch
> would be a terrific experimentation environment.
>
> I've tried crunch from java and I was impressed, it's very easy to connect
> non-standard sources and reasonably easy to define the flow.
>
> I tried to use cascalog for my prototyping env but although it's very good
> on flow definition, cascading lacks a lot in flexibility when you need to
> process anything other than text or sequence files.
>
> "clunch" sounds like a good name to me ;)

LOL. "Clutch" has a nice ring to it. ;-)

>
> -- victor
>
>
> On Thu, Nov 15, 2012 at 10:58 AM, Joseph Adler <joseph.adler@gmail.com> wrote:
>
>> Personally, I'd love to see Crunch mixed with Clojure. I was thinking about
>> this myself, but I'd rather see someone who really knows Clojure take this
>> on.
>>
>> Just don't call it Clunch.
>>
>> -- Joe
>>
>>
>> On Thu, Nov 15, 2012 at 5:04 AM, Victor Iacoban <victor.iacoban@gmail.com> wrote:
>>
>> > Thanks Josh, will give this a try
>> >
>> >
>> > On Wed, Nov 14, 2012 at 9:54 PM, Josh Wills <josh.wills@gmail.com> wrote:
>> >
>> > > I'm always glad to help people to extend Crunch in ways that are useful
>> > > for them. I think that most things that involve type-related extensions
>> > > can be handled using the PTypes.derived() function, which can be used to
>> > > create custom PTypes that are mapped to underlying serialized types, so
>> > > that you could do something like
>> > >
>> > > // Forgive my syntax errors, I'm doing this w/o an IDE
>> > > PType<Object> objectType = PTypes.derived(
>> > >     Object.class,
>> > >     new InputMapFn<BytesWritable, Object>(),
>> > >     new OutputMapFn<Object, BytesWritable>(),
>> > >     Writables.writables(BytesWritable.class));
>> > >
>> > > ...which is essentially how Scrunch works: the PTypes { } functionality
>> > > in Scrunch maps from Scala types to Java types using the derived
>> > > functionality.
>> > >
>> > > The Converter stuff is internal to Avro and Writable; I can't think of a
>> > > case where that would need to be exposed outside the package (i.e., once
>> > > you've decided on whether to use Writables or Avro as your serialization
>> > > framework, the choice of Converter is fixed).
>> > >
>> > > If you have a use case where the derived type can't handle the
>> > > conversion or is a poor choice for whatever reason, I'm all about having
>> > > a discussion and trying out different designs.
>> > >
>> > > Josh
>> > >
>> > >
>> > > On Wed, Nov 14, 2012 at 6:18 PM, Victor Iacoban <victor.iacoban@gmail.com> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I'm very interested in writing a wrapper library around Apache Crunch
>> > > > for Clojure, something similar to the existing Scrunch.
>> > > > How do you recommend I start?
>> > > >
>> > > > I was looking through the Crunch code and it looks like I can pretty
>> > > > easily integrate it in clojure by adding some custom WritableType type.
>> > > > Something like WritableType<Object, ByteWritable> with a custom
>> > > > converter or inputFn/outputFn functions.
>> > > >
>> > > > Regretfully there are several issues with this approach, and instead
>> > > > I'd have to duplicate all those type classes for a new type set:
>> > > > * WritableType has a package-visible constructor, so I cannot extend it
>> > > >   and cannot instantiate it
>> > > > * Converter is instantiated inside the WritableType constructor, so in
>> > > >   case I need a different converter I'm stuck
>> > > > * Writables has a factory method for WritableType but it's private
>> > > > * it looks like there is an attempt to support additional WritableTypes
>> > > >   through EXTENSIONS in Writables, but it would only work for cases
>> > > >   where in WritableType<T, W> both T and W are hadoop writables
>> > > >
>> > > > So what do you think is the best solution: is it possible to open up
>> > > > the api to support custom WritableTypes, or is my only option to
>> > > > implement a new ClojurePType and all related classes?
>> > > >
>> > > > Hope I'm not too detailed, but at this stage you all are probably very
>> > > > familiar with the code
>> > > >
>> > > > Thanks,
>> > > > Victor
>> > > >
>> > >
>> >
>>
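
The derived-type idea from the quoted message (a user-facing type defined by a pair of mapping functions over an already-serializable base type) can be sketched in plain Java without Crunch on the classpath. Everything below is an illustrative stand-in and an assumption, not Crunch's API: the class DerivedTypeSketch, the nested Derived class, and the utf8Strings() helper are hypothetical names, and byte[] plays the role that BytesWritable plays in the message.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class DerivedTypeSketch {

    /**
     * Hypothetical stand-in for a derived type: S is the underlying
     * serialized form, T is the type the user actually works with.
     */
    public static final class Derived<S, T> {
        private final Function<S, T> inputFn;   // serialized form -> user type
        private final Function<T, S> outputFn;  // user type -> serialized form

        public Derived(Function<S, T> inputFn, Function<T, S> outputFn) {
            this.inputFn = inputFn;
            this.outputFn = outputFn;
        }

        /** Applies the input function when a record is read. */
        public T read(S serialized) {
            return inputFn.apply(serialized);
        }

        /** Applies the output function when a record is written. */
        public S write(T value) {
            return outputFn.apply(value);
        }
    }

    /** Example derived type: Strings stored as UTF-8 bytes. */
    public static Derived<byte[], String> utf8Strings() {
        return new Derived<byte[], String>(
            bytes -> new String(bytes, StandardCharsets.UTF_8),
            s -> s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Derived<byte[], String> type = utf8Strings();
        byte[] wire = type.write("hello crunch");  // user type -> bytes
        System.out.println(type.read(wire));       // bytes -> user type
    }
}
```

A Clojure wrapper would presumably swap the two lambdas for functions that serialize and deserialize arbitrary Clojure values; the base type and its converter stay untouched, which is the point made in the quoted reply.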



-- 
Director of Data Science
Cloudera
Twitter: @josh_wills
