hadoop-pig-dev mailing list archives

From "Pi Song (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-176) pig creates many small files when it spills
Date Thu, 03 Apr 2008 12:00:24 GMT

    [ https://issues.apache.org/jira/browse/PIG-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585069#action_12585069 ]

Pi Song commented on PIG-176:
-----------------------------

So, if a bag's size is below some threshold, we just don't spill it, right? This is very easy
to fix, but we will be able to reclaim a bit less memory than before, causing some tasks
to fail more often in exchange for others running faster. Is this acceptable?

Probably the best way to go is to make the threshold configurable, but PIG-111 isn't in yet. Sigh...
I wish I had more time.
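A minimal Java sketch of the idea, assuming a hypothetical `Spillable` interface and a hypothetical property key `spill.size.threshold`. Neither is taken from Pig's code; they only illustrate skipping bags below a configurable size threshold. Bags are spilled largest first until enough memory is freed, and anything under the threshold is left in memory. This is why a spill can reclaim less than was requested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class SpillThresholdSketch {

    /** Hypothetical stand-in for a bag that can write itself to disk. */
    interface Spillable {
        long getMemorySize(); // estimated in-memory size in bytes
        long spill();         // writes contents to a file, returns bytes freed
    }

    /** Toy bag used only for this illustration. */
    static final class FakeBag implements Spillable {
        private long size;
        FakeBag(long size) { this.size = size; }
        public long getMemorySize() { return size; }
        public long spill() { long freed = size; size = 0; return freed; }
    }

    /** Hypothetical property key; not confirmed to exist in Pig. */
    static final String THRESHOLD_KEY = "spill.size.threshold";
    /** Illustrative default: the 128K "small file" size mentioned in the issue. */
    static final long DEFAULT_THRESHOLD = 128 * 1024;

    private final long threshold;

    SpillThresholdSketch(Properties props) {
        this.threshold = Long.parseLong(
                props.getProperty(THRESHOLD_KEY, Long.toString(DEFAULT_THRESHOLD)));
    }

    /**
     * Spills bags, largest first, until at least toFree bytes are reclaimed,
     * never spilling a bag smaller than the threshold. Returns the bytes
     * actually freed, which may be less than requested.
     */
    long spillUntil(List<Spillable> bags, long toFree) {
        List<Spillable> sorted = new ArrayList<>(bags);
        sorted.sort((a, b) -> Long.compare(b.getMemorySize(), a.getMemorySize()));
        long freed = 0;
        for (Spillable bag : sorted) {
            if (freed >= toFree) break;
            // Sorted descending, so every remaining bag is also below the threshold.
            if (bag.getMemorySize() < threshold) break;
            freed += bag.spill();
        }
        return freed;
    }

    public static void main(String[] args) {
        List<Spillable> bags = List.of(
                new FakeBag(1024 * 1024), new FakeBag(200 * 1024),
                new FakeBag(50 * 1024), new FakeBag(10 * 1024));
        SpillThresholdSketch mgr = new SpillThresholdSketch(new Properties());
        long freed = mgr.spillUntil(bags, 2L * 1024 * 1024);
        System.out.println("freed " + freed + " bytes"); // prints "freed 1253376 bytes"
    }
}
```

Here 2 MB was requested but only the 1 MB and 200 KB bags were spilled; the 50 KB and 10 KB bags stayed in memory instead of becoming tiny files. Setting the (hypothetical) property to `0` restores the old behaviour of spilling everything.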

> pig creates many small files when it spills
> -------------------------------------------
>
>                 Key: PIG-176
>                 URL: https://issues.apache.org/jira/browse/PIG-176
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>
> Currently, on spill, Pig can generate millions of small (under 128K) files. This is partially
> due to PIG-170, but even with that patch, you can still try to spill small bags.
> The proposal is to not spill small files. Alan told me that the logic is already there,
> but we just need to bump the size limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

