hadoop-pig-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: possible use of Pig for OLAP
Date Tue, 20 Nov 2007 18:31:04 GMT
Chris Olston wrote:
> Sounds interesting. Pig is geared toward large-scale aggregation 
> operations, in the style of OLAP.
> Regarding your 3rd paragraph question, do you mean:
> a) there are several interrelated aggregation expressions that you want 
> evaluated in just one pass over the data, or
> b) you do some initial aggregation, display it to the user, who can do 
> "drill-down" operations in the GUI which require you to look up more 
> data in the backend
> ?
> For (a), yes Pig can do that, although currently you have to encode it 
> explicitly as a single Pig program (in future versions, we might be able 
> to take multiple related Pig programs and execute them in a joint 
> fashion). For (b), we don't currently have a mechanism to do that 
> without reloading the data, although perhaps the operating system's file 
> cache would help with that, under the covers, if the file partitions fit 
> in memory and don't get evicted.

Would it be possible to modify Pig (and the underlying local/MapReduce
implementations) so that, when a specific syntax is used, an intermediate
result is also stored in a temporary file? That way, on the first
dump/store, Pig would compute all intermediate results, keep the marked
ones, and reuse them in subsequent operations.

Example: let's say that ':=' means that the result should be kept around
until exit (or until any of the preceding intermediate results changes):

-- A is not persisted
A = load 'sample.txt' as (date, time, ip, query);
-- B is to be persisted in a temp file
B := group A by ip;
-- compile & execute - creates B in a temp file
dump B;
-- after grouping, B has the fields (group, A), so query is reached via the bag A
C = foreach B generate group, A.query;
-- this uses already existing B data from a temp file
dump C;
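
To make the intended semantics concrete, here is a minimal sketch in Python rather than Pig. Everything in it is an assumption made for illustration: the `Relation` class, its `persist` flag (standing in for ':='), the pickle-based temp file, and the sample rows are hypothetical and say nothing about how Pig would actually implement this.

```python
import os
import pickle
import tempfile
from collections import defaultdict


class Relation:
    """A lazily evaluated relation; persist=True plays the role of ':='."""

    def __init__(self, compute, parents=(), persist=False):
        self.compute = compute        # function: parent results -> list of rows
        self.parents = list(parents)
        self.persist = persist
        self.cache_path = None        # temp file holding the persisted result
        self.runs = 0                 # how many times compute actually ran

    def evaluate(self):
        # Reuse the temp file if this relation was already materialized.
        if self.cache_path is not None:
            with open(self.cache_path, "rb") as f:
                return pickle.load(f)
        inputs = [p.evaluate() for p in self.parents]
        rows = self.compute(*inputs)
        self.runs += 1
        if self.persist:
            fd, self.cache_path = tempfile.mkstemp(suffix=".tmp")
            with os.fdopen(fd, "wb") as f:
                pickle.dump(rows, f)
        return rows

    def invalidate(self):
        # Drop the temp file, e.g. on exit or when an upstream result changes.
        if self.cache_path is not None:
            os.remove(self.cache_path)
            self.cache_path = None


def group_by_ip(rows):
    groups = defaultdict(list)
    for row in rows:
        groups[row[2]].append(row)    # field 2 is the ip
    return sorted(groups.items())


# Made-up rows standing in for 'sample.txt': (date, time, ip, query)
SAMPLE = [
    ("2007-11-20", "18:00", "10.0.0.1", "pig"),
    ("2007-11-20", "18:01", "10.0.0.2", "hadoop"),
    ("2007-11-20", "18:02", "10.0.0.1", "olap"),
]

A = Relation(lambda: SAMPLE)                           # A = load ...
B = Relation(group_by_ip, parents=[A], persist=True)   # B := group A by ip
C = Relation(                                          # C = foreach B generate ...
    lambda groups: [(ip, [r[3] for r in rows]) for ip, rows in groups],
    parents=[B],
)

dump_b = B.evaluate()   # first dump: computes A and B, writes B to a temp file
dump_c = C.evaluate()   # second dump: reads B back from the temp file
B.invalidate()          # clean up, as would happen on exit
```

In this sketch the second evaluation never touches A again, which is the point of the proposal: the load and the grouping each run once, and C is computed from the stored copy of B.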

Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
