pig-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-5384) OOM while spilling large bag
Date Tue, 19 Mar 2019 20:13:00 GMT

    [ https://issues.apache.org/jira/browse/PIG-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796477#comment-16796477 ]

Koji Noguchi commented on PIG-5384:

The requirement that Pig keep the entire bag in memory until the corresponding spill is done
comes from the fact that Pig can continue to run when spilling to a file fails.  (It drops
the spill file and keeps on using the bag in memory.)

Spilling can fail when disks are full, but I'm guessing the task would eventually fail anyway
when that happens.  Spilling can also fail when a user passes a custom List instance that
doesn't support clear().  But in that case, the bag shouldn't be part of the spillables in
the first place.
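
To illustrate the clear() case, here is a minimal sketch in plain Java.  It is not Pig
code; tryClear is a hypothetical helper that only mimics the "empty the in-memory list"
step.  A fixed-size list such as the one returned by Arrays.asList rejects clear():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClearDemo {
    /**
     * Hypothetical stand-in for the step that empties the in-memory contents.
     * Returns false when the list implementation does not support clear().
     */
    static boolean tryClear(List<String> contents) {
        try {
            contents.clear();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> growable = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> fixedSize = Arrays.asList("a", "b"); // fixed-size view over an array

        System.out.println(tryClear(growable));  // prints "true"
        System.out.println(tryClear(fixedSize)); // prints "false"
    }
}
```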

So I'm wondering if we can provide an option to fail the task when spilling fails and let Pig
release each Tuple as soon as it is written to the spill file (before the file is closed).
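
A rough sketch of the two behaviors, in plain Java rather than Pig's actual classes.
Everything here is an assumption for illustration: SpillSketch, spillKeepingBag and
spillReleasingEagerly are hypothetical names, and a List<String> stands in for a bag of
Tuples.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpillSketch {
    /**
     * Behavior as described above: the whole bag stays in memory until the
     * spill is done. On failure the spill file is dropped and the bag is kept.
     */
    static boolean spillKeepingBag(List<String> bag, Path spillFile) {
        try (BufferedWriter out = Files.newBufferedWriter(spillFile)) {
            for (String tuple : bag) {
                out.write(tuple);
                out.newLine();
            }
        } catch (IOException e) {
            try {
                Files.deleteIfExists(spillFile);
            } catch (IOException ignored) {
                // best effort cleanup of the partial spill file
            }
            return false; // bag is untouched, caller keeps using it
        }
        bag.clear(); // only released after the file was written and closed
        return true;
    }

    /**
     * Proposed option: release each tuple right after it is written, before
     * the file is closed. A failure propagates so that the task can fail.
     */
    static void spillReleasingEagerly(List<String> bag, Path spillFile) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(spillFile)) {
            for (int i = 0; i < bag.size(); i++) {
                out.write(bag.get(i));
                out.newLine();
                bag.set(i, null); // drop the reference so it can be garbage collected
            }
        }
        bag.clear();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spill-sketch");
        List<String> bag = new ArrayList<>(Arrays.asList("t1", "t2", "t3"));
        spillReleasingEagerly(bag, dir.resolve("bag.txt"));
        System.out.println(bag.isEmpty()); // prints "true"
    }
}
```

With the second variant a failure partway through leaves the in-memory bag partially
released, which is why it only makes sense together with failing the task.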

> OOM while spilling large bag 
> -----------------------------
>                 Key: PIG-5384
>                 URL: https://issues.apache.org/jira/browse/PIG-5384
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
> One of the common OOM issues in Pig is hitting OOM while trying to spill a large
> bag. The current solution is to give a larger heap size or tweak
> {noformat}
> pig.spill.memory.usage.threshold.fraction
> pig.spill.collection.threshold.fraction
> pig.spill.unused.memory.threshold.size
> {noformat}
> and make sure spilling starts early enough.  These params are still critical, but I'm wondering
> if any improvement can be made to increase the chances of avoiding OOM while spilling a single
> large bag.

This message was sent by Atlassian JIRA
