pig-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Jaql reactions?
Date Sat, 08 Dec 2007 00:56:27 GMT

Utkarsh, 

Thanks for your comments.  I think I must have been a little unclear on some
of my statements.  See below for more.


On 12/7/07 12:18 PM, "Utkarsh Srivastava" <utkarsh@yahoo-inc.com> wrote:

> Jaql is tied to JSON data, whereas Pig is data-format-agnostic.

I get the impression that Jaql is tied less to JSON than it appears at
first.  In particular, it looked to me like the on-disk format of data files
could be more flexible.  Certainly adding an abstraction layer for any
record reader would be trivial.  Similarly, there is nothing that says or
requires that they actually pass around JSON encoded strings internally and
there are several statements that imply that they actually pass around data
structures whose only relationship to JSON is that of data to its printable form.

>> A) specific and direct access to map/reduce in a functional programming
>> syntax.
> 
> If a language has primitives for per-record processing, grouping, and
> group-wise aggregation, which both Pig and Jaql do, then direct
> access to map-reduce is just syntactic sugar on top of these primitives.

Hmmm.... The key word here is functional.  Jaql is a higher-order functional
language with lambda.  And map-reduce is a function that operates on
functions and data together.  The only thing I might like better is a
curried version of map-reduce as a function of two functions that returns a
function that processes data (fast).
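
To make the curried form concrete, here is a minimal sketch in Python (not
Jaql or Pig syntax).  The names map_reduce, run and word_count are made up for
illustration, and everything runs in memory on one machine, so it shows only
the shape of the function, not the "fast" part.

```python
from collections import defaultdict


def map_reduce(mapper, reducer):
    """Curried map-reduce: takes two functions, returns a function over data.

    mapper(record)       -> iterable of (key, value) pairs
    reducer(key, values) -> one result for that key
    Both signatures are assumptions made for this sketch.
    """
    def run(records):
        groups = defaultdict(list)
        for record in records:
            for key, value in mapper(record):
                groups[key].append(value)   # the implicit "group" step
        return {key: reducer(key, values) for key, values in groups.items()}
    return run


# Build the job once; it is an ordinary value that can be passed around,
# stored, or applied to any number of inputs.
word_count = map_reduce(
    lambda line: [(word, 1) for word in line.split()],
    lambda key, values: sum(values),
)
```

Because word_count is just a function, it can be handed to other functions
or applied to different inputs without restating the pipeline.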

Pig doesn't do anything like this and the difference appears to me to be
much more than syntactic sugar.  Having the functional representation gives
you the guts of programmatic transformations essentially for free.  This is
important.

I can't tell if Jaql thinks of data processing expressions as functional
compositions, but if it does, very cool things can become doable.

You are nearly right that in terms of expressive power, Jaql's explicit
map-reduce is only sugar, but this is only true if you limit yourself to
record processing primitives.  If it is a full-scale first-class
higher-order function, then it is a different beast altogether.

> 
> In Pig, Map-Reduce is written as:
> 
> A = foreach input generate flatten(Map(*));
> B = group A by $0;
> C = foreach B generate Reduce(*);

And here is an important difference.  The expression [foreach input generate
flatten(Map(*))] CANNOT be expressed in Pig in functional form.  There isn't
something equivalent to [lambda(Map) return lambda(input) {foreach input
generate flatten(Map(*))}].  If that were available, then I would be able to
write programs that manipulate program expressions in very interesting ways.
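
As a rough illustration of that kind of manipulation, the sketch below uses
Python rather than Pig or Jaql.  foreach_flatten stands in for the
[lambda(Map) return lambda(input) {...}] form above, and fuse is a
hypothetical rewrite that merges two map steps into one pass.  Both names are
invented for this example and neither system is claimed to provide them.

```python
from itertools import chain


def foreach_flatten(map_fn):
    """lambda(Map) -> lambda(input): apply map_fn per record, flatten results."""
    def step(records):
        return list(chain.from_iterable(map_fn(r) for r in records))
    return step


def fuse(first_fn, second_fn):
    """Program transformation: merge two per-record maps into a single map."""
    def fused(record):
        return [out for mid in first_fn(record) for out in second_fn(mid)]
    return fused


def split_words(line):
    return line.split()


def with_upper(word):
    return [word, word.upper()]


data = ["a b", "c"]

# Two separate passes over the data ...
two_pass = foreach_flatten(with_upper)(foreach_flatten(split_words)(data))

# ... and the same computation after the pipeline itself has been rewritten.
one_pass = foreach_flatten(fuse(split_words, with_upper))(data)
```

The rewrite is possible only because each step is a function value that
another function can take apart and recombine.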

Just as importantly, what you have provided is a recipe for computing, but
not a function.  Providing mapreduce as a function is important for
supporting programmatic transformations.

> If people really want map-reduce as a programming abstraction, where
> the "group" operation is implicit, it would be easy to add this as a
> macro in Pig.

Indeed, but macros do not make a functional language.

Pig's lazy evaluation semantics remind me quite a bit of functional
programming.  Why stop halfway?


