hadoop-pig-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1516) finalize in bag implementations causes pig to run out of memory in reduce
Date Thu, 29 Jul 2010 11:46:17 GMT

    [ https://issues.apache.org/jira/browse/PIG-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893589#action_12893589 ]

Hadoop QA commented on PIG-1516:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12450778/PIG-1516.2.patch
  against trunk revision 980276.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 402 release audit warnings (more than the trunk's current 400 warnings).

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/364/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/364/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/364/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/364/console

This message is automatically generated.

> finalize in bag implementations causes pig to run out of memory in reduce 
> --------------------------------------------------------------------------
>
>                 Key: PIG-1516
>                 URL: https://issues.apache.org/jira/browse/PIG-1516
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: PIG-1516.2.patch, PIG-1516.patch
>
>
> *Problem:*
> Pig bag implementations that are subclasses of DefaultAbstractBag implement finalize methods. As a result, the garbage collector moves them to a finalization queue, and the memory they use is freed only after they have been finalized.
> If the bags are not finalized fast enough, a lot of memory is consumed by the finalization queue, and Pig runs out of memory. This can happen if a large number of small bags are being created.
> *Solution:*
> The finalize method exists to delete the spill files that are created when the bag is too large. But if the bags are small enough, no spill files are created, and the finalize method serves no purpose.
> A new class that holds a list of files will be introduced (FileList). This class will have a finalize method that deletes the files. The bags will no longer have finalize methods, and the bags will use FileList instead of ArrayList<File>.
> *Possible workaround for earlier releases:*
> Since the fix is going into 0.8, here is a workaround:
> Disabling the combiner will reduce the number of bags being created, as there will be no stage that combines intermediate merge results. But I would recommend disabling it only if you have this problem, as it is likely to slow down the query.
> To disable the combiner, set the property: -Dpig.exec.nocombiner=true

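The solution described in the issue can be sketched roughly as follows. This is a minimal illustration only, not the code from PIG-1516.2.patch: the name FileList comes from the issue text, while SmallBag, deleteAll, hasSpilled, the temp-file naming, and the choice to create the FileList lazily on the first spill are all assumptions made for the example. The point it shows is that the bag itself has no finalize method, so bags that never spill are never placed on the finalization queue; only the small FileList object is.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for spill files; the only class here with a finalizer. */
class FileList {
    private final List<File> files = new ArrayList<>();

    void add(File f) { files.add(f); }

    int size() { return files.size(); }

    /** Deletes every tracked file, forgets them, and returns how many were deleted. */
    int deleteAll() {
        int deleted = 0;
        for (File f : files) {
            if (f.delete()) {
                deleted++;
            }
        }
        files.clear();
        return deleted;
    }

    // finalize() is deprecated in recent JDKs; it is used here only because
    // the issue describes the fix in terms of a finalize method.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            deleteAll();
        } finally {
            super.finalize();
        }
    }
}

/** Hypothetical bag with no finalize method (stand-in for a DefaultAbstractBag subclass). */
class SmallBag {
    private final List<Object> tuples = new ArrayList<>();
    private FileList spillFiles; // assumption: stays null until the first spill

    void add(Object tuple) { tuples.add(tuple); }

    boolean hasSpilled() { return spillFiles != null; }

    /** Simulates a spill: registers a temp file and drops the in-memory tuples. */
    File spill() {
        try {
            if (spillFiles == null) {
                spillFiles = new FileList();
            }
            File f = File.createTempFile("pigbag", ".tmp");
            spillFiles.add(f);
            tuples.clear();
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a bag that stays small never allocates a FileList at all, so the garbage collector can reclaim it without waiting for a finalizer to run.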
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.