pig-user mailing list archives

From Utkarsh Srivastava <utka...@yahoo-inc.com>
Subject Re: Jaql reactions?
Date Fri, 07 Dec 2007 20:18:34 GMT
Jaql is very much in the same spirit as Pig, and in fact the language
is quite similar. One difference is that they've chosen to sprinkle in
some SQL-style declarative clauses, such as WHERE clauses attached to
many of the operators, whereas in Pig we've explicitly avoided having
a single operator do several different kinds of things. To get the
effect of a WHERE clause in Pig, you write an explicit FILTER statement.

Jaql is tied to JSON data, whereas Pig is data-format-agnostic. Pig
can operate over JSON data as a special case. To demonstrate this, I
put together a JSON StorageFunction for Pig, and examples of how it
can be used (both attached). With this function, Pig can operate
over JSON data in much the same way that Jaql does. (It requires the
latest version of Pig, so if you want to try it, please refresh from
SVN first.)

Some other observations:

 >A) specific and direct access to map/reduce in a functional programming
 >syntax.

If a language has primitives for per-record processing, grouping, and
group-wise aggregation, as both Pig and Jaql do, then direct access to
map-reduce is just syntactic sugar on top of these primitives.

In Pig, Map-Reduce is written as:

A = foreach input generate flatten(Map(*));
B = group A by $0;
C = foreach B generate Reduce(*);

Here "Map" and "Reduce" are user-supplied Pig functions.

If people really want map-reduce as a programming abstraction, where  
the "group" operation is implicit, it would be easy to add this as a  
macro in Pig.
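
To make the "syntactic sugar" point concrete, here is a rough sketch in
Python (not Pig) of how a map-reduce wrapper could be assembled from the
same three primitives. The names map_reduce, word_map and count_reduce
are made up for this illustration; it is an analogy for the three Pig
statements above, not how Pig implements them.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Compose map-reduce from per-record processing, grouping, aggregation."""
    # Step 1 (like: foreach input generate flatten(Map(*))):
    # apply map_fn to every record and flatten the resulting (key, value) pairs.
    mapped = [pair for record in records for pair in map_fn(record)]

    # Step 2 (like: group A by $0): collect values under their key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Step 3 (like: foreach B generate Reduce(*)): aggregate each group.
    return [reduce_fn(key, values) for key, values in groups.items()]


# Hypothetical user-supplied functions: a word count.
def word_map(line):
    return [(word, 1) for word in line.split()]


def count_reduce(word, ones):
    return (word, sum(ones))


result = map_reduce(["a b a", "b c"], word_map, count_reduce)
print(sorted(result))
```

The grouping step is hidden inside map_reduce, which is all the
"implicit group" of the map-reduce abstraction amounts to here.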


 >B) data has a concrete syntactic form that can be displayed and understood
 >along with other concrete forms that guarantee to keep the same semantics in
 >terms of tagged data elements.  This universal tagging in the data makes a
 >lot of run-time schema things pretty trivial.  It also allows test data to
 >be written into a script or example program and allows that test data to be
 >processed to a concrete result without involving the cluster.

Pig's "maps" give very similar functionality:
(1) the schema can vary from record to record (i.e., each record can  
have a different set of fields)
(2) operations can reference the schema of a record at run-time, just  
like in Jaql.

In fact, "map" structures are the bread-and-butter of JSON.
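
As a small illustration of points (1) and (2), the following Python
sketch (again an analogy, not Pig code) parses two JSON records into
maps whose field sets differ, then inspects each record's fields at run
time. The sample records and field names are invented for the example.

```python
import json

# Two JSON records with different sets of fields (invented sample data).
lines = [
    '{"name": "alice", "age": 30}',
    '{"name": "bob", "email": "bob@example.com"}',
]

# Each record becomes a map (a Python dict) keyed by field name.
records = [json.loads(line) for line in lines]

# (1) The schema varies from record to record.
field_sets = [sorted(record) for record in records]

# (2) The schema can be consulted at run time: look up a field that
# only some records carry, getting None where it is absent.
emails = [record.get("email") for record in records]

print(field_sets)
print(emails)
```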


Utkarsh

