hadoop-pig-dev mailing list archives

From "Sriranjan Manjunath (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1102) Collect number of spills per job
Date Thu, 17 Dec 2009 02:28:18 GMT

     [ https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-1102:

    Attachment: PIG_1102.patch

No test cases are included in the patch because it was difficult to trigger a spill consistently
in a unit test. I have tested the change manually. The easiest way to test it is to load
a huge data bag (1 GB or so) and watch the MapReduce UI. The UI will show new counters: SPILLABLE_MEMORY_MANAGER_SPILL_COUNT
or PROACTIVE_SPILL_COUNT, depending on the type of POPackage used.
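
For readers who want a picture of what these counters track, here is a minimal sketch of per-job spill counting. It is not the code in PIG_1102.patch: the class SpillCounterSketch, the enum SpillCounter, and the in-memory map are hypothetical stand-ins, and only the two counter names come from the message above. In a real job the increment would presumably go through Hadoop's counter API rather than a local map.

```java
import java.util.EnumMap;
import java.util.Map;

public class SpillCounterSketch {
    // Counter names are taken from the message; the enum itself is hypothetical.
    public enum SpillCounter {
        SPILLABLE_MEMORY_MANAGER_SPILL_COUNT,
        PROACTIVE_SPILL_COUNT
    }

    // In-memory stand-in for the job's counter store (assumption, not the Hadoop API).
    private final Map<SpillCounter, Long> counts = new EnumMap<>(SpillCounter.class);

    // Record one spill event against the given counter.
    public void recordSpill(SpillCounter counter) {
        counts.merge(counter, 1L, Long::sum);
    }

    // Return the number of spills recorded so far; 0 if none were recorded.
    public long get(SpillCounter counter) {
        return counts.getOrDefault(counter, 0L);
    }

    public static void main(String[] args) {
        SpillCounterSketch job = new SpillCounterSketch();
        // Simulate two memory-manager spills and one proactive spill.
        job.recordSpill(SpillCounter.SPILLABLE_MEMORY_MANAGER_SPILL_COUNT);
        job.recordSpill(SpillCounter.SPILLABLE_MEMORY_MANAGER_SPILL_COUNT);
        job.recordSpill(SpillCounter.PROACTIVE_SPILL_COUNT);
        for (SpillCounter c : SpillCounter.values()) {
            System.out.println(c + " = " + job.get(c));
        }
    }
}
```

Keeping one counter per spill path mirrors the message's note that which counter appears depends on the type of POPackage used.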

> Collect number of spills per job
> --------------------------------
>                 Key: PIG-1102
>                 URL: https://issues.apache.org/jira/browse/PIG-1102
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Sriranjan Manjunath
>             Fix For: 0.7.0
>         Attachments: PIG_1102.patch
> Memory shortage is one of the main performance issues in Pig. Knowing when we spill to
disk is useful for understanding query performance and for seeing how certain changes
in Pig affect it.
> Other interesting stats to collect would be average CPU usage and maximum memory usage, but I
am not sure if this information is easily retrievable.
> Using Hadoop counters for this would make sense.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
