hadoop-pig-dev mailing list archives

From "Utkarsh Srivastava (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-44) Problem with spilling BigBags
Date Fri, 07 Dec 2007 06:07:43 GMT

     [ https://issues.apache.org/jira/browse/PIG-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Utkarsh Srivastava updated PIG-44:

    Attachment: spilling1.patch

I now *adaptively* choose the number of records to hold in memory. I start by holding
1000 records in memory. Once a low-memory condition is hit, I aim for the bag to occupy no
more than 1% of the JVM heap size (TARGET_IN_MEMORY_SIZE). When the bag spills to disk,
I measure how many bytes were actually written. If the bytes written were < TARGET_IN_MEMORY_SIZE,
I increase the number of records to hold in memory accordingly; otherwise, I decrease it accordingly.
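The adaptive adjustment described above could be sketched roughly as follows. This is an illustrative standalone class, not the actual patch: the class name, constructor, and `onSpill` method are hypothetical, and the real patch lives inside Pig's bag implementation.

```java
// Illustrative sketch of the adaptive spill threshold described above.
// The class and method names are hypothetical, not Pig's actual API.
public class AdaptiveSpillThreshold {

    // Start by holding 1000 records in memory, as in the description.
    private long recordsToHold = 1000;

    // Target in-memory size for the bag once a low-memory condition is hit.
    private final long targetInMemorySize;

    public AdaptiveSpillThreshold() {
        // Default target: 1% of the JVM heap (TARGET_IN_MEMORY_SIZE).
        this(Runtime.getRuntime().maxMemory() / 100);
    }

    public AdaptiveSpillThreshold(long targetInMemorySize) {
        this.targetInMemorySize = targetInMemorySize;
    }

    public long recordsToHold() {
        return recordsToHold;
    }

    /**
     * After a spill, adjust the record count from the bytes actually written:
     * if the spilled batch was smaller than the target, hold more records next
     * time; if it was larger, hold fewer.
     */
    public void onSpill(long recordsSpilled, long bytesWritten) {
        if (recordsSpilled == 0 || bytesWritten == 0) {
            return; // nothing measured; keep the current threshold
        }
        long avgRecordSize = bytesWritten / recordsSpilled;
        // Scale so the next in-memory batch lands close to the target size.
        recordsToHold = Math.max(1, targetInMemorySize / avgRecordSize);
    }
}
```

For example, with a 1 MB target, spilling 1000 records that turn out to occupy only 500 KB would roughly double the threshold to 2000 records, while a batch averaging 2000 bytes per record would shrink it to 500.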

Most of the patch is some refactoring that I did in the big data bag unit test. I tested with
up to 10 million records, and it seems to work great. My heap size was the default 64M.

> Problem with spilling BigBags
> -----------------------------
>                 Key: PIG-44
>                 URL: https://issues.apache.org/jira/browse/PIG-44
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Olga Natkovich
>         Attachments: spilling.patch, spilling1.patch
> Currently, once we spill the bag, if no additional memory becomes available, we would
> be spilling 1 record at a time because of the problem with the logic. Short term, we will
> make a change to spill 100 records at a time. Longer term, we need to try and drain the
> memory before doing so.
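The short-term fix in the quoted description (spill a fixed batch of 100 records per low-memory notification, rather than 1 at a time) could look something like this minimal sketch. The class and the in-memory/on-disk queues are stand-ins for illustration only; the real bag writes spilled records to a spill file.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of batched spilling: on each low-memory condition,
// move up to a fixed batch of records out of memory instead of just one.
class BatchSpillingBag {
    private static final int SPILL_BATCH_SIZE = 100; // short-term fix from the issue

    private final Deque<Object> inMemory = new ArrayDeque<>();
    private final Deque<Object> onDisk = new ArrayDeque<>(); // stand-in for a spill file

    void add(Object record) {
        inMemory.addLast(record);
    }

    /** Called on a low-memory condition; returns how many records were spilled. */
    int spill() {
        int spilled = 0;
        while (spilled < SPILL_BATCH_SIZE && !inMemory.isEmpty()) {
            onDisk.addLast(inMemory.removeFirst());
            spilled++;
        }
        return spilled;
    }

    int inMemoryCount() {
        return inMemory.size();
    }
}
```

Spilling in batches amortizes the per-spill overhead that made one-record-at-a-time spilling so slow when memory stayed scarce.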

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
