pig-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1447) Tune memory usage of InternalCachedBag
Date Sat, 21 Aug 2010 00:14:20 GMT

    [ https://issues.apache.org/jira/browse/PIG-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900939#action_12900939 ]

Thejas M Nair commented on PIG-1447:

Some more reasons why a higher value would still be safe:
1. A lot of the memory attributed to the InternalDistinct/InternalSorted bags used within
a nested foreach is shared with the InternalCachedBag in the input tuple, because Pig
does not create a copy of the column objects.
2. In a nested foreach, only one inner plan at a time holds references to the Internal*
bags. The Internal* bags are eventually converted to DefaultDataBag by RelationToExpressionProject
in these plans. In the most common cases (say, multiple count-distincts or order-bys
on bags in a nested foreach), that means only one Internal* bag created within the nested
foreach is referenced at a time. I compared the memory footprint with different numbers
of distinct operations in a nested foreach and found them to be in the same range.
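
Point 1 above can be illustrated with a small Java sketch. This is not Pig code: the class
name, the helper, and the use of ArrayList/TreeSet as stand-ins for the input bag and a
distinct bag are all assumptions made for illustration. It only shows that a second
container built from the same element objects holds references to them rather than copies,
so the element payload is on the heap once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration, not part of Pig.
final class SharedColumnSketch {

    // Counts the elements of `derived` that are the very same object (==)
    // as some element of `input`, i.e. shared references rather than copies.
    static int countSharedReferences(List<String> input, Iterable<String> derived) {
        int shared = 0;
        for (String d : derived) {
            for (String s : input) {
                if (s == d) {
                    shared++;
                    break;
                }
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        // Stand-in for the bag column of the input tuple.
        List<String> inputBag = new ArrayList<>();
        inputBag.add(new String("a"));
        inputBag.add(new String("b"));
        inputBag.add(new String("a"));

        // Stand-in for a distinct bag built inside the nested foreach.
        // The set stores references to the existing objects; nothing is cloned.
        TreeSet<String> distinctBag = new TreeSet<>(inputBag);

        System.out.println("distinct elements: " + distinctBag.size());
        System.out.println("shared references: "
                + countSharedReferences(inputBag, distinctBag));
    }
}
```

In this sketch every element of the derived container is also an element of the input, so
the extra memory is only the container's own bookkeeping.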
I am planning to set the default to 20% for now. If we find memory limits being hit as
a result of this during the beta testing period, we can reduce the default.
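
As a rough illustration of what a 20% setting could mean in bytes, here is a minimal Java
sketch. It assumes the budget is a fraction of the JVM's maximum heap, split evenly among
the cached bags that are live at once. That even split, the class name, and the helper
name are assumptions made for this example and are not taken from the patch.

```java
// Hypothetical illustration, not part of Pig.
final class CachedBagBudgetSketch {

    // Bytes each bag may keep in memory before spilling, under the
    // assumption of an even split of (heap * fraction) across bagCount bags.
    static long perBagLimit(long maxHeapBytes, double memUsageFraction, int bagCount) {
        if (bagCount < 1) {
            throw new IllegalArgumentException("bagCount must be >= 1");
        }
        if (memUsageFraction <= 0.0 || memUsageFraction > 1.0) {
            throw new IllegalArgumentException("memUsageFraction must be in (0, 1]");
        }
        long totalBudget = (long) (maxHeapBytes * memUsageFraction);
        return totalBudget / bagCount;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        double fraction = 0.2; // the 20% default proposed in this comment

        // Print the per-bag limit for a few bag counts on the current JVM.
        for (int bags = 1; bags <= 4; bags *= 2) {
            System.out.println(bags + " bag(s): "
                    + perBagLimit(maxHeap, fraction, bags) + " bytes each");
        }
    }
}
```

Under this assumption, raising the fraction raises every bag's in-memory limit
proportionally, which is why it matters how many Internal* bags are referenced at once.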

> Tune memory usage of InternalCachedBag
> --------------------------------------
>                 Key: PIG-1447
>                 URL: https://issues.apache.org/jira/browse/PIG-1447
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>         Attachments: L15_modified.pig, L15_modified2.pig, PIG-1447.1.patch
> We need to find a better value for "pig.cachedbag.memusage".

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
