pig-dev mailing list archives

From "Pi Song (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-176) pig creates many small files when it spills
Date Wed, 09 Apr 2008 14:30:28 GMT

     [ https://issues.apache.org/jira/browse/PIG-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pi Song updated PIG-176:
------------------------

    Attachment: pig_176_smallbags_v1.patch

This patch implements (1) a spill file size threshold and (2) my idea from the last comment.

"spill.size.threshold" and "spill.gc.activation.size" must be set as JVM parameters or in .pigrc
to use this new feature. Their default values are 0 and Long.MAX_VALUE, respectively.
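As an illustration of the configuration step above, here is a minimal sketch, assuming the two settings are read as longs from a java.util.Properties object (JVM parameters passed with -D land in System.getProperties(); a .pigrc could be loaded into a Properties object the same way). The class SpillConfig and its method are hypothetical and are not the code in pig_176_smallbags_v1.patch; only the property names and defaults come from this message.

```java
import java.util.Properties;

/**
 * Hypothetical helper (not from the patch): reads the two spill settings
 * from a Properties object, falling back to the defaults stated above.
 */
public class SpillConfig {
    static final String SIZE_THRESHOLD_KEY = "spill.size.threshold";
    static final String GC_ACTIVATION_KEY = "spill.gc.activation.size";

    final long sizeThreshold;
    final long gcActivationSize;

    SpillConfig(Properties props) {
        // Defaults: 0 for the threshold, Long.MAX_VALUE for the GC activation size.
        this.sizeThreshold = readLong(props, SIZE_THRESHOLD_KEY, 0L);
        this.gcActivationSize = readLong(props, GC_ACTIVATION_KEY, Long.MAX_VALUE);
    }

    // Returns the parsed value, or the default when the key is absent.
    private static long readLong(Properties props, String key, long defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        // JVM parameters such as -Dspill.size.threshold=... appear in System.getProperties().
        SpillConfig config = new SpillConfig(System.getProperties());
        System.out.println("spill.size.threshold = " + config.sizeThreshold);
        System.out.println("spill.gc.activation.size = " + config.gcActivationSize);
    }
}
```

For example, `java -Dspill.size.threshold=1048576 SpillConfig` (an arbitrary example value) would report a 1 MB threshold while leaving the GC activation size at its default.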

There is a small problem with (1): Bag.getMemorySize() sometimes doesn't return an accurate
value, so even when the threshold is set, files smaller than the threshold can still be
created.
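To make the caveat above concrete, here is a minimal sketch, assuming the threshold is applied by comparing each bag's *estimated* size against spill.size.threshold before spilling. SpillableBag and chooseBagsToSpill are hypothetical names, not Pig APIs, and estimatedSize merely stands in for Bag.getMemorySize(). Because the decision relies on an estimate, a bag whose estimate is too high still passes the check, and its spill file can end up smaller than the threshold.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a spill-size threshold (not the patch's code):
 * only bags whose estimated size meets the threshold are spilled.
 */
public class SpillThresholdSketch {

    // Stand-in for a spillable bag; estimatedSize plays the role of Bag.getMemorySize().
    record SpillableBag(String name, long estimatedSize) {}

    // Returns the bags that are large enough, by estimate, to be worth spilling.
    // With the default threshold of 0, every bag qualifies.
    static List<SpillableBag> chooseBagsToSpill(List<SpillableBag> bags, long sizeThreshold) {
        List<SpillableBag> selected = new ArrayList<>();
        for (SpillableBag bag : bags) {
            if (bag.estimatedSize() >= sizeThreshold) {
                selected.add(bag);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<SpillableBag> bags = List.of(
            new SpillableBag("tiny", 10_000L),
            new SpillableBag("large", 4_000_000L));
        // With a 128K threshold only the "large" bag is selected.
        System.out.println(chooseBagsToSpill(bags, 128 * 1024L));
    }
}
```

A threshold-based filter keeps the check cheap, but its accuracy is bounded by the size estimate it uses.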

The configuration code in MapReduceLauncher is still messy. It needs cleaning up once the
configuration patch gets in.

> pig creates many small files when it spills
> -------------------------------------------
>
>                 Key: PIG-176
>                 URL: https://issues.apache.org/jira/browse/PIG-176
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>         Attachments: pig_176_smallbags_v1.patch
>
>
> Currently, on spill pig can generate millions of small (under 128K) files. Partially
> this is due to PIG-170, but even with that patch, you can still try to spill small bags.
> The proposal is to not spill small files. Alan told me that the logic is already there
> but we just need to bump the size limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

