pig-dev mailing list archives

From "Dmitriy V. Ryaboy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-2888) Improve performance of POPartialAgg
Date Mon, 27 Aug 2012 05:31:08 GMT

     [ https://issues.apache.org/jira/browse/PIG-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-2888:

    Attachment: partialagg_patch_3.patch

Minor logging and spill performance improvements: reusing the iterator, forcing an aggregation
if any list gets too big, and being slightly more clever about hashmap sizing.

> Improve performance of POPartialAgg
> -----------------------------------
>                 Key: PIG-2888
>                 URL: https://issues.apache.org/jira/browse/PIG-2888
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: partialagg_patch_1.patch, partialagg_patch_2.patch, partialagg_patch_3.patch
>
> During performance testing, we found that POPartialAgg can cause performance degradation
> for Pig jobs when the Algebraic UDFs it's being applied to aren't well suited to the operator's
> assumptions. Changing the implementation to a more flexible hash-based model can provide significant
> performance improvements.
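
For readers unfamiliar with the idea, here is a minimal sketch of hash-based partial
aggregation with a per-key size limit that forces an early aggregation. It is not the
code from the attached patches and does not use Pig's classes: PartialAggSketch,
maxListSize, combine and flush are hypothetical names, and a simple sum stands in
for an Algebraic UDF's intermediate step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Pig's POPartialAgg.
class PartialAggSketch {
    // Raw or partially aggregated values buffered per group key.
    private final Map<String, List<Long>> buffered = new HashMap<>();
    // Assumed threshold: the largest list we tolerate for a single key.
    private final int maxListSize;

    PartialAggSketch(int maxListSize) {
        if (maxListSize < 1) {
            throw new IllegalArgumentException("maxListSize must be >= 1");
        }
        this.maxListSize = maxListSize;
    }

    // Buffer a value; if this key's list grows past the threshold,
    // collapse it into a single partial result right away.
    void add(String key, long value) {
        List<Long> values = buffered.computeIfAbsent(key, k -> new ArrayList<>());
        values.add(value);
        if (values.size() > maxListSize) {
            long partial = combine(values);
            values.clear();
            values.add(partial);
        }
    }

    // Stand-in for an algebraic combine step (here: a sum).
    static long combine(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Number of values currently buffered for a key (0 if the key is unseen).
    int bufferedCount(String key) {
        List<Long> values = buffered.get(key);
        return values == null ? 0 : values.size();
    }

    // Emit one partial result per key and reset the buffer.
    Map<String, Long> flush() {
        Map<String, Long> out = new HashMap<>();
        for (Map.Entry<String, List<Long>> e : buffered.entrySet()) {
            out.put(e.getKey(), combine(e.getValue()));
        }
        buffered.clear();
        return out;
    }
}
```

In this sketch the map bounds memory per key rather than per operator; a real
implementation would presumably also track total memory use and spill, which is
left out here.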

