hadoop-pig-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1544) proactive-spill bags should share the memory alloted for it
Date Tue, 17 Aug 2010 19:03:16 GMT

    [ https://issues.apache.org/jira/browse/PIG-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899526#action_12899526 ]

Thejas M Nair commented on PIG-1544:

bq. We should not be using these bags for the cases like UDF for exactly the reason you are
The case I had in mind was not one where a UDF creates proactive-spill bags, but one where
the UDF's input contains bags that happen to be of the proactive-spill type, and the UDF
retains bags from previous rows.

Anyway, I have come up with a more realistic(?) use case where it is difficult to determine
the number of proactive-spill bags that will be present at run time -

L = load 'f1' as ( c1 : int, b1 : bag{ } );
F1 = foreach L { d = distinct b1; generate c1, d; }    -- InternalDistinctBag will be created
G = group F1 by c1 using 'merge'; -- This group-by could [1] accumulate several of these InternalDistinctBag objects
F2 = foreach G generate ...

[1] - This does not happen at present, because the query plan has a
PORelationToExpressionProject after the result from PODistinct, and it copies the bag. But it
looks like we could optimize the plan and get rid of that bag copy in this case.


> proactive-spill bags should share the memory alloted for it
> -----------------------------------------------------------
>                 Key: PIG-1544
>                 URL: https://issues.apache.org/jira/browse/PIG-1544
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Thejas M Nair
> Initially, proactive-spill bags were designed for use in (co)group (InternalCacheBag).
> They knew the total number of proactive bags that were present, and shared the memory
> limit specified using the property pig.cachedbag.memusage.
> But the two proactive bag implementations that were added later, InternalDistinctBag and
> InternalSortedBag, are not aware of the actual number of bags being used; their users
> always assume total-numbags = 3.
> This needs to be fixed, and all proactive-spill bags should share the memory limit.
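
A minimal sketch of the sharing idea in the description above, not Pig's actual code. The class SharedSpillBudget and its methods are hypothetical names introduced only for illustration, and treating the budget as a fraction of the maximum heap is an assumption. Each bag registers when it is created and releases when it is done, so the per-bag limit follows the real number of live bags instead of a fixed guess such as 3.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of Pig): tracks how many proactive-spill
 * bags are live and divides one memory budget evenly among them.
 */
final class SharedSpillBudget {
    private final long totalBytes;
    private final AtomicInteger liveBags = new AtomicInteger();

    SharedSpillBudget(long totalBytes) {
        if (totalBytes < 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0");
        }
        this.totalBytes = totalBytes;
    }

    /** Assumption: the budget is configured as a fraction of the max heap. */
    static SharedSpillBudget fromHeapFraction(double fraction) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        return new SharedSpillBudget((long) (maxHeap * fraction));
    }

    /** Called when a bag is created. */
    void register() {
        liveBags.incrementAndGet();
    }

    /** Called when a bag is no longer in use; never drops below zero. */
    void release() {
        liveBags.updateAndGet(n -> Math.max(0, n - 1));
    }

    /** Bytes each live bag may hold in memory before it should spill. */
    long perBagLimit() {
        int n = Math.max(1, liveBags.get());
        return totalBytes / n;
    }

    public static void main(String[] args) {
        SharedSpillBudget budget = new SharedSpillBudget(1200L);
        for (int i = 0; i < 6; i++) {
            budget.register();
        }
        // With a fixed guess of 3 bags, each bag would be allowed 1200 / 3 = 400
        // bytes, so 6 live bags could hold 2400 bytes, twice the budget.
        System.out.println("fixed guess: " + (1200L / 3));            // prints "fixed guess: 400"
        System.out.println("shared:      " + budget.perBagLimit());   // prints "shared:      200"
    }
}
```

In this sketch a bag would re-read perBagLimit() whenever it decides whether to spill, since the limit shrinks as more bags register. Whether that check belongs in the bag or in its caller is a design choice this sketch does not settle.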

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
