hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY
Date Tue, 12 May 2009 18:29:45 GMT

    [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708551#action_12708551 ]

Pradeep Kamath commented on PIG-802:

Adding some more details:
A new kind of bag, ReadOnceBag, needs to be implemented. This bag will hold a reference to the
"key" currently being processed and to the iterator over values that Hadoop provides in reduce().
The ReadOnceBag's iterator will simply advance the Hadoop iterator on each call and construct
a tuple from the key and the value (see POPackage.java for details on how this is done). Either
POPackage should be changed, or a new class introduced, to create ReadOnceBags instead of regular
bags. Creating the bag should only initialize it with the key and the iterator.
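
A rough sketch of the idea, for illustration only. It does not use Pig's real DataBag or
Tuple interfaces: a plain List<Object> stands in for a tuple, and the (key, value) pairing
is a simplification of what POPackage does. The class shape, the generic parameters and the
"consumed" guard are all assumptions, not part of the proposal above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; not Pig's actual DataBag/Tuple API.
final class ReadOnceBag<K, V> implements Iterable<List<Object>> {
    private final K key;               // key currently being processed in reduce()
    private final Iterator<V> values;  // value iterator handed in by Hadoop
    private boolean consumed = false;  // assumption: fail loudly on a second pass

    ReadOnceBag(K key, Iterator<V> values) {
        // Creation only stores the two references; no values are read here.
        this.key = key;
        this.values = values;
    }

    @Override
    public Iterator<List<Object>> iterator() {
        if (consumed) {
            throw new IllegalStateException("ReadOnceBag can be iterated only once");
        }
        consumed = true;
        return new Iterator<List<Object>>() {
            @Override
            public boolean hasNext() {
                return values.hasNext();
            }

            @Override
            public List<Object> next() {
                // Build one tuple lazily from the key and the next value.
                List<Object> tuple = new ArrayList<>(2);
                tuple.add(key);
                tuple.add(values.next());
                return tuple;
            }
        };
    }
}
```

Because the underlying Hadoop iterator can be walked only once, the sketch rejects a second
call to iterator() rather than silently returning an empty sequence. Whether the real bag
should behave that way is an open design choice.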

> PERFORMANCE: not creating bags for ORDER BY
> -------------------------------------------
>                 Key: PIG-802
>                 URL: https://issues.apache.org/jira/browse/PIG-802
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
> Order by should be changed to not use POPackage to put all of the tuples in a bag on
> the reduce side, as the bag is just immediately flattened. It can instead work like join does
> for the last input in the join.

