hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-629) PERFORMANCE: Eliminate use of TargetedTuple for each input tuple in the map()
Date Thu, 22 Jan 2009 20:31:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666280#action_12666280 ]

Olga Natkovich commented on PIG-629:

Patch committed. Thanks, Pradeep.

> PERFORMANCE: Eliminate use of TargetedTuple for each input tuple in the map()
> -----------------------------------------------------------------------------
>                 Key: PIG-629
>                 URL: https://issues.apache.org/jira/browse/PIG-629
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>         Attachments: PIG-629.patch
> Currently, each Tuple read in by Pig is wrapped in a TargetedTuple, which has an attribute
holding a list of operator keys corresponding to the root operators for which the tuple is
targeted. For example, in a cogroup query the tuple is destined for one of the two roots
of the plan, depending on which input it is sourced from. This information is contained in
the TargetedTuple. However, this adds unnecessary overhead at load time in a map: for each
tuple this extra list must be attached, and on entry into map() the operators
corresponding to the operator keys in the list must be looked up in the map plan.
> This overhead can be eliminated by serializing the list of target operators once at
the Record Reader level and deserializing it in the configure() of the map. After
deserialization, the actual operators corresponding to the operator keys can also be looked
up in configure() itself. This way the setup is done once in configure() rather
than adding overhead to each input tuple and each map() call.
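
The approach described above can be sketched in plain Java. This is an illustration
only, not the code in PIG-629.patch: the class names, the property key
"sketch.target.ops", and the comma-separated encoding are all assumptions made for the
example, and no Pig or Hadoop classes are used. It shows the target operator keys being
serialized once, then deserialized and resolved against the plan once in configure(), so
that map() only forwards each tuple to operators that are already resolved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are Pig's actual classes or properties.
class TargetSetupSketch {

    // Assumed property name used to carry the serialized target list.
    static final String TARGETS_PROPERTY = "sketch.target.ops";

    /** Stand-in for a root operator of the map plan. */
    static final class Operator {
        final String key;
        final List<String> received = new ArrayList<>();

        Operator(String key) { this.key = key; }

        void attachInput(String tuple) { received.add(tuple); }
    }

    /** Reader side: serialize the target operator keys once per input.
     *  Assumes keys contain no commas. */
    static String serializeTargets(List<String> keys) {
        return String.join(",", keys);
    }

    /** Stand-in for the mapper. */
    static final class Mapper {
        final List<Operator> targets = new ArrayList<>();
        int lookups = 0; // counts plan lookups, to show they happen only in configure()

        /** One-time setup: deserialize the keys and resolve them against the plan. */
        void configure(Map<String, String> conf, Map<String, Operator> plan) {
            String serialized = conf.get(TARGETS_PROPERTY);
            if (serialized == null || serialized.isEmpty()) {
                return; // no targets configured
            }
            for (String key : serialized.split(",")) {
                lookups++;
                Operator op = plan.get(key);
                if (op == null) {
                    throw new IllegalStateException("No operator in plan for key " + key);
                }
                targets.add(op);
            }
        }

        /** Per-tuple path: no wrapper object and no lookup, just forwarding. */
        void map(String tuple) {
            for (Operator op : targets) {
                op.attachInput(tuple);
            }
        }
    }

    public static void main(String[] args) {
        // A cogroup-like plan with two roots; this mapper reads the first input only.
        Map<String, Operator> plan = new HashMap<>();
        plan.put("load-A", new Operator("load-A"));
        plan.put("load-B", new Operator("load-B"));

        Map<String, String> conf = new HashMap<>();
        conf.put(TARGETS_PROPERTY, serializeTargets(Arrays.asList("load-A")));

        Mapper mapper = new Mapper();
        mapper.configure(conf, plan);
        for (String tuple : Arrays.asList("t1", "t2", "t3")) {
            mapper.map(tuple);
        }
        System.out.println("lookups=" + mapper.lookups
                + " tuplesAtLoadA=" + plan.get("load-A").received.size());
    }
}
```

In this sketch the number of plan lookups depends only on the number of target
operators, not on the number of input tuples, which is the saving the description
argues for.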

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.