pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "PigExecutionModel" by Shravan Narayanamurthy
Date Wed, 30 Jan 2008 01:27:35 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by Shravan Narayanamurthy:

+ __GROUP__
+ The logical co-group operator is converted into three physical operators, Local Rearrange, Global Rearrange and Package, as shown below:
+ attachment:Group.jpg
+ There is one Local Rearrange operator per input. The outputs of all of them feed a single Global Rearrange, which is followed by a Package, as shown below:
+ attachment:GroupPhy.jpg
+ The Local Rearrange takes an input tuple and emits a (key, value) pair, with the group field as the key and the tuple as the value. For example, (1,R) is converted to {1,(1,R)}. The tuple is also tagged with the index of the input it came from: in our case, (1,R) is tagged 1 if it came from A and 2 if it came from B.
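+ A minimal sketch of the Local Rearrange step, assuming tuples are plain Python tuples, the group field is field 0, and tags are 1-based as in the example. The helper name `local_rearrange` is hypothetical and not part of Pig; the tag is carried next to the tuple inside the value.

```python
from typing import Any, Tuple

def local_rearrange(tup: Tuple[Any, ...], input_index: int,
                    key_field: int = 0) -> Tuple[Any, Tuple[int, Tuple[Any, ...]]]:
    """Emit (key, (input_index, tuple)): the group field becomes the key and the
    whole tuple, tagged with the (assumed 1-based) index of its input, is the value."""
    key = tup[key_field]
    return key, (input_index, tup)

# (1,R) comes from A (tag 1); (1,B) comes from B (tag 2).
print(local_rearrange((1, "R"), input_index=1))
print(local_rearrange((1, "B"), input_index=2))
```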
+ The Global Rearrange collects the key-value pairs whose keys belong to one partition and turns them into (key, list of values) pairs. The partition is determined by the reducer that the Global Rearrange serves. We do not need to implement this operator ourselves: it is the intermediate step (the shuffle) that the MapReduce framework already performs between the map and reduce phases.
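+ Since the framework performs this step, the sketch below only simulates it to make the data flow concrete. It assumes hash partitioning (`hash(key) % num_reducers`, partitions numbered from 0), so which key lands on which reducer may differ from the example that follows; `global_rearrange` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

def global_rearrange(pairs: Iterable[Tuple[Any, Any]],
                     num_reducers: int) -> List[Dict[Any, List[Any]]]:
    """Simulate the shuffle: route each (key, value) pair to a partition
    (one per reducer) and gather all values of a key into a single list."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        # Assumed hash partitioning; in Hadoop the partitioner is pluggable.
        partitions[hash(key) % num_reducers][key].append(value)
    # Values keep arrival order here; the real shuffle does not promise an order.
    return [dict(p) for p in partitions]

# Local Rearrange output for A = {(1,R),(2,G)} (tag 1) and B = {(1,B),(2,Y)} (tag 2).
pairs = [(1, (1, (1, "R"))), (2, (1, (2, "G"))),
         (1, (2, (1, "B"))), (2, (2, (2, "Y")))]
for i, part in enumerate(global_rearrange(pairs, num_reducers=2)):
    print(i, part)
```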
+ The Package takes each (key, list of values) pair and puts it in the format co-group requires. Say A contains (1,R) and (2,G), and B contains (1,B) and (2,Y). With two reducers, the Global Rearrange serving reducer 1 produces the key and value list {1,{(1,R),(1,B)}}, which must be converted into a co-group output tuple according to the input index each tuple in the list was tagged with. It therefore becomes {1,{(1,R)},{(1,B)}}. Similarly, reducer 2 converts {2,{(2,G),(2,Y)}} into {2,{(2,G)},{(2,Y)}}.
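+ A minimal sketch of the Package step, assuming each value still carries the 1-based tag added by Local Rearrange and that Pig bags can be modelled as Python lists. `package` is a hypothetical helper; it splits one key's values into one bag per input, leaving a bag empty when an input had no tuples for that key.

```python
from typing import Any, List, Tuple

def package(key: Any, tagged_values: List[Tuple[int, Tuple[Any, ...]]],
            num_inputs: int) -> Tuple[Any, ...]:
    """Build the co-group output (key, bag_1, ..., bag_n) by routing each
    tuple to the bag of the input it was tagged with (tags are 1-based)."""
    bags: List[List[Tuple[Any, ...]]] = [[] for _ in range(num_inputs)]
    for tag, tup in tagged_values:
        bags[tag - 1].append(tup)
    return (key, *bags)

# Values for key 1 after the shuffle: (1,R) from A (tag 1), (1,B) from B (tag 2).
print(package(1, [(1, (1, "R")), (2, (1, "B"))], num_inputs=2))
```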
  === Comments ===
  The Physical Plan and the Logical Plan were not clear to me, probably because of the nested query plan. I think we need a better way to draw this, because the conditional expression is an attribute of the filter, not an input to it.
