hadoop-pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1448) Detach tuple from inner plans of physical operator
Date Sat, 12 Jun 2010 17:58:13 GMT

    [ https://issues.apache.org/jira/browse/PIG-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878294#action_12878294 ]

Ashutosh Chauhan commented on PIG-1448:

The problem here is not as bad as it may sound. All physical operators already detach the
input tuple once they are done with it. In getNext(), a physical operator first calls
processInput(), which first attaches the input tuple and then detaches it at the end. So
physical operators contained within inner plans will also do that. The problem arises when
there is a Bin Cond: Pig short-circuits one of the branches of the inner plan, in which case
getNext() of the operator on that branch is never called and thus the tuple is never
detached. Note that in these cases the tuple was already attached to all the roots of the
plan by the operator that owns the inner plan. So in this particular use case the tuple was
attached but never detached, leaving a stray reference that prevents it from being GC'ed.

This still will not be a problem if there is only a single pipeline in the mapper or
reducer: the next time a new key/value pair is read and run through the pipeline, the
reference is overwritten, and the tuple that was not detached in the previous run can then
be GC'ed. Only with a Multi Query optimized script may a different pipeline be run when the
next key/value pair is read in map() or reduce(), in which case the stray reference is not
overwritten. If all of these conditions are met and the tuple itself is large or contains
large bags, we may end up with an OOME.
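
The scenario above can be sketched with a small stand-alone model. This is only an
illustration, not Pig's code: SketchOp, SketchInnerPlan and SketchParent are hypothetical
names, and the detachAfter flag is an assumption about what a fix could look like (the
owning operator detaching the tuple from every root of its inner plan once it is done).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a physical operator; not Pig's real class.
class SketchOp {
    private Object input; // reference to the currently attached tuple

    void attachInput(Object tuple) { input = tuple; }
    void detachInput() { input = null; }
    boolean hasAttachedInput() { return input != null; }

    // Follows the pattern described in the comment: use the input, then detach it.
    Object getNext() {
        Object result = input;
        detachInput();
        return result;
    }
}

// Hypothetical inner plan with two roots, standing in for the two branches of a bin cond.
class SketchInnerPlan {
    final SketchOp trueBranch = new SketchOp();
    final SketchOp falseBranch = new SketchOp();

    List<SketchOp> roots() { return Arrays.asList(trueBranch, falseBranch); }

    // The owning operator attaches the tuple to ALL roots of the plan.
    void attachInput(Object tuple) {
        for (SketchOp root : roots()) root.attachInput(tuple);
    }

    void detachInput() {
        for (SketchOp root : roots()) root.detachInput();
    }

    // Short circuit: getNext() is called on only one branch.
    Object evaluate(boolean condition) {
        return condition ? trueBranch.getNext() : falseBranch.getNext();
    }
}

// Hypothetical operator that owns an inner plan.
class SketchParent {
    final SketchInnerPlan plan = new SketchInnerPlan();

    Object process(Object tuple, boolean condition, boolean detachAfter) {
        plan.attachInput(tuple);
        Object out = plan.evaluate(condition);
        if (detachAfter) {
            plan.detachInput(); // assumed fix: clear every root, taken or not
        }
        return out;
    }
}
```

With detachAfter set to false, the branch that was skipped keeps its reference to the
tuple until something else overwrites it; with detachAfter set to true, no root holds the
tuple after process() returns.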

> Detach tuple from inner plans of physical operator 
> ---------------------------------------------------
>                 Key: PIG-1448
>                 URL: https://issues.apache.org/jira/browse/PIG-1448
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.1.0, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0
>            Reporter: Ashutosh Chauhan
>             Fix For: 0.8.0
> This is a follow-up on PIG-1446, which addresses this general problem only for a specific
> instance of For Each. In general, all physical operators that can have inner plans are
> vulnerable to this. A few of them are POLocalRearrange, POFilter, POCollectedGroup, etc.
> All of these need to be fixed.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
