pig-dev mailing list archives

From Baraa Mohamad <baraa.issa.moha...@gmail.com>
Subject Re: dataflow in logical plan
Date Mon, 24 Jan 2011 21:23:46 GMT
Thank you very much for your explanation.
Just to verify that I understood correctly:
For example, if myfile contains the following data:
1 3 4
3 4 6
7 8 2
4 5 9
9 3 5
6 6 2

then all of this data will be sent to the Proj(0) operator, which gives as a result:
1
3
7
4
9
6

After that, all the data in myfile will be sent to the filter operator, so
that the filter takes two inputs: the myfile data and the result of
Proj(0) > 5, which is
7
9
6
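
A minimal Python sketch of this example may help make the data flow concrete. It is a toy model of the plan Load -> Filter -> Store, not Pig's actual implementation: the names proj, const, greater_than and filter_op are hypothetical, and the two input splits are chosen arbitrarily to mimic one filter instance per map task. It assumes the expression Proj($0) > const(5) is evaluated once per tuple inside the filter, and that tuples which pass are emitted whole.

```python
# Toy model of the plan Load -> Filter -> Store; not Pig's actual code.
# All names here (proj, const, greater_than, filter_op) are hypothetical.

ROWS = [
    (1, 3, 4),
    (3, 4, 6),
    (7, 8, 2),
    (4, 5, 9),
    (9, 3, 5),
    (6, 6, 2),
]

def proj(index):
    """Expression node: pick one field out of the current tuple."""
    return lambda row: row[index]

def const(value):
    """Expression node: ignore the tuple and return a fixed value."""
    return lambda row: value

def greater_than(left, right):
    """Expression node: compare the results of two sub-expressions."""
    return lambda row: left(row) > right(row)

def filter_op(rows, predicate):
    """Filter operator: emit each whole tuple for which the predicate holds."""
    for row in rows:
        if predicate(row):
            yield row

# Expression plan for: b = filter a by $0 > 5;
predicate = greater_than(proj(0), const(5))

# One filter instance seeing the whole input.
filtered = list(filter_op(ROWS, predicate))

# Two filter instances, one per hypothetical input split (as with map tasks).
splits = [ROWS[:3], ROWS[3:]]
filtered_per_split = [list(filter_op(split, predicate)) for split in splits]

print(filtered)
print(filtered_per_split)
```

In this sketch the rows that survive are (7, 8, 2), (9, 3, 5) and (6, 6, 2), so the values 7, 9 and 6 listed above are the first fields of those rows. Concatenating the per-split results gives the same rows as the single-instance run.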

regards


On Mon, Jan 24, 2011 at 10:08 PM, Alan Gates <gates@yahoo-inc.com> wrote:

> The logical plan for your script will look like:
>
> Load -> Filter -> Store
>
> Filter will have an expression plan that looks like Proj($0) > const(5)
>
> So yes, all your data will go through the filter operator. But keep in
> mind that there is a filter operator in each map task, so not all of your
> data will go through any one instance of the operator (unless myfile is
> small). Hope that helps.
>
> Unfortunately, there is no great architecture document for Pig.
> Probably the best substitute is a paper we published in VLDB 2009, which
> you can get here:
> http://infolab.stanford.edu/~olston/publications/vldb09.pdf. Since this
> is almost 2 years old now, some of the specific information is out of date,
> but the basic structure is still correct.
>
> Alan.
>
>
> On Jan 24, 2011, at 12:48 PM, Baraa Mohamad wrote:
>
>> Hello all:
>>
>> I'm a new user of Pig, and I'm very interested in the architecture of Pig.
>> I have a question about the logical plan.
>>
>> In the logical plan of this example (attached):
>> a = load 'myfile';
>> b = filter a by $0 > 5;
>> store b into 'myfilteredfile';
>>
>>
>> Will all the data in 'myfile' be sent in its entirety to the Proj(0)
>> operator and to the Filter operator?
>> More generally, what flows along the arrows in the logical plan?
>>
>> What is the best documentation for understanding the architecture of Pig,
>> not only how to use it? I will try to use it in the medical domain, but
>> first I have to understand it deeply.
>>
>> thank you very much for your help
>>
>>
>> Baraa MOHAMAD
>> Doctorante en informatique
>> ISIMA-LIMOS
>> Université Blaise Pascal
>> Clermont-Ferrand
>> France
>> Tél:  +33 658900080
>>
>
>
