incubator-crunch-dev mailing list archives

From Victor Iacoban <victor.iaco...@gmail.com>
Subject Re: clojure + crunch = crackle
Date Wed, 28 Nov 2012 19:56:22 GMT
+1, me too :) This would make the DSL much simpler.

That's what I started with, but I hit two issues:
* There is no easy way to detect whether the next step in the pipeline is
going to need a PTableType or a PType.
Check my count-bytes-by-ip example: the first parallelDo needs to output a
PTableType in order for the next "groupByKey" to work. To guess the type
automatically, I would somehow need to replay the pipeline backwards to
find out exactly which PType the current parallelDo requires. This is the
main hurdle. Maybe it's not very obvious in a strongly typed language like
Java or Scala that PTableType and PType<Pair> are basically the same thing,
and it would probably make sense to merge the two into a single PType
class. I'm not really sure whether the Crunch devs would consider something
like this.

* The second issue, not as big as the first one, is that PTypes trickle
down to outputs. So to avoid dumping my generic binary format to text
files, I'd have to introduce a step at the end of the pipeline to convert
from Clojure data structures to some Writable primitives; this would still
require users to be aware of the Crunch type system.

Removal of explicit types is on my todo list; I'll try to do that as soon
as I find some time.

-- Victor




On Wed, Nov 28, 2012 at 2:18 PM, Josh Wills <jwills@cloudera.com> wrote:

> On Wed, Nov 28, 2012 at 11:16 AM, Joseph Adler <joseph.adler@gmail.com>
> wrote:
>
> > Also interested in this project, but low on time for a few weeks.
> >
> > One quick bit of feedback: I strongly suspect that there is a way to
> > eliminate all the type-related code from Clojure, probably by using
> > macros.
> >
>
> +1-- could be really nice.
>
>
> >
> > -- Joe
> >
> >
> > On Wed, Nov 28, 2012 at 9:07 AM, Matthias Friedrich <matt@mafr.de>
> > wrote:
> >
> > > Hi Victor,
> > >
> > > just for the record, I'm also very interested in this. It's just that
> > > in the time before Christmas, things are really busy at work. But
> > > I'll definitely play around with crackle.
> > >
> > > Thanks,
> > >   Matthias
> > >
> > > On Tuesday, 2012-11-27, Victor Iacoban wrote:
> > > > Hey Josh,
> > > >
> > > > Nice to see some interest. I just pushed several bigger changes from
> > > > my local repo. I've separated crackle into 3 parts: core, hbase and
> > > > example.
> > > >
> > > > On my todo list:
> > > > - jar file assembly: currently I'm using the jar command from the
> > > >   shell to create the job jar; this obviously needs to be rewritten
> > > >   in order to make crackle portable
> > > > - I need to add support for all the sources and targets you have in
> > > >   Crunch
> > > > - need to integrate Crunch HBase: sources, targets and types
> > > > After these are done, some nice-to-have tasks:
> > > > - can't define MR pipelines from the Clojure REPL: although crackle
> > > >   compiles pipeline classes on the fly, it still needs the code to be
> > > >   written to a local file, so it's not as nice as it should be
> > > > - the DSL sucks:
> > > >   * in its current shape you don't have access to PObjects from
> > > >     intermediate steps
> > > >   * users have to know the Crunch API very well, otherwise they will
> > > >     get confused about what type goes where and why they have to use
> > > >     a particular function type
> > > >
> > > > Regards
> > > >
> > > > PS I'm also a Clojure noob: I learned Common Lisp several years ago,
> > > > but I've only been playing with Clojure for several months.
> > > >
> > > >
> > > > On Mon, Nov 26, 2012 at 11:48 PM, Josh Wills <jwills@cloudera.com>
> > > > wrote:
> > > >
> > > > > Victor,
> > > > >
> > > > > Just got my own personal fork-- congrats on getting the MR pipeline
> > > > > impl working. What needs doing? Keep in mind that I'm a total
> > > > > Clojure n00b, despite repeated encouragement from lots of
> > > > > developers I respect and admire.
> > > > >
> > > > > Josh
> > > > >
> > > > >
> > > > > On Tue, Nov 20, 2012 at 2:33 PM, Victor Iacoban <victor.iacoban@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have the basics done here:
> > > > > > https://github.com/viacoban/crackle
> > > > > >
> > > > > > It's only MemPipeline for now; I still have to build the jar in
> > > > > > the background for MRPipeline, but before going there I have a
> > > > > > small issue to solve.
> > > > > >
> > > > > > So if anyone has written several Clojure macros, or knows
> > > > > > somebody who has, please write to me directly and we will take it
> > > > > > from there.
> > > > > >
> > > > > > Any comments or input is welcome
> > > > > >
> > > > > > Victor
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Director of Data Science
> > > > > Cloudera <http://www.cloudera.com>
> > > > > Twitter: @josh_wills <http://twitter.com/josh_wills>
> > > > >
> > >
> >
>
