hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1037) better memory layout and spill for sorted and distinct bags
Date Tue, 03 Nov 2009 00:00:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772772#action_12772772 ]

Alan Gates commented on PIG-1037:

The difference is much more than switching from dumping one tuple at a time to dumping
multiple tuples. It is about how spilling is triggered. In the past, spilling was passive: it
was done when the JVM informed us that memory was getting low. This did not work well because
the JVM only checks memory usage when it garbage collects, so by the time Pig was notified of
a low-memory condition it was often too late, and we often ran out of memory while trying to
spill. Now spilling is active. Pig sets aside a buffer for a bag to put its tuples in. For
default bags, once this buffer is full, any additional tuples are written to disk. For sorted
or distinct bags, once the buffer is full it is sorted and dumped to disk, and new records go
into the buffer.
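
To make the sorted-bag case concrete, here is a rough sketch of the active-spill idea. It is
not Pig's code (Pig is written in Java, and the patch itself is the authority); it is a toy
in Python under stated assumptions. The class name SortedSpillBag, the buffer_limit parameter,
and the choice to size the buffer by tuple count rather than by memory are all made up for
illustration. The merge on read is likewise an assumption about how sorted runs would be
consumed, not something the comment above describes.

```python
import heapq
import pickle
import tempfile


class SortedSpillBag:
    """Toy sorted bag that spills on its own once its buffer is full.

    Hypothetical illustration only; names and sizing policy are assumptions.
    """

    def __init__(self, buffer_limit):
        self.buffer_limit = buffer_limit  # assumed cap, counted in tuples
        self.buffer = []                  # in-memory tuples, unsorted until spilled
        self.spill_files = []             # one temp file per sorted run on disk

    def add(self, item):
        # Active spilling: the bag checks its own buffer on every add,
        # instead of waiting for a low-memory notification.
        if len(self.buffer) >= self.buffer_limit:
            self._spill()
        self.buffer.append(item)

    def _spill(self):
        # Sort the full buffer, dump it to disk as one run, then start a fresh buffer.
        self.buffer.sort()
        run = tempfile.TemporaryFile()
        for item in self.buffer:
            pickle.dump(item, run)
        self.spill_files.append(run)
        self.buffer = []

    @staticmethod
    def _read_run(run):
        # Stream one sorted run back from disk, one tuple at a time.
        run.seek(0)
        while True:
            try:
                yield pickle.load(run)
            except EOFError:
                return

    def __iter__(self):
        # Merge every sorted run on disk with the sorted in-memory remainder.
        runs = [self._read_run(run) for run in self.spill_files]
        runs.append(iter(sorted(self.buffer)))
        return heapq.merge(*runs)


bag = SortedSpillBag(buffer_limit=3)
for value in [5, 1, 4, 2, 9, 7, 3, 8]:
    bag.add((value,))
result = list(bag)
```

A distinct bag could follow the same shape by dropping duplicates when a run is written and
again during the merge; that variation is not shown here.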

This particular patch only adds the change for sorted and distinct bags.  PIG-975 contains
the original patch for default bags.

> better memory layout and spill for sorted and distinct bags
> -----------------------------------------------------------
>                 Key: PIG-1037
>                 URL: https://issues.apache.org/jira/browse/PIG-1037
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Ying He
>             Fix For: 0.6.0
>         Attachments: PIG-1037.patch, PIG-1037.patch2, PIG-1037.patch3

