hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-176) pig creates many small files when it spills
Date Thu, 03 Apr 2008 16:10:24 GMT

    [ https://issues.apache.org/jira/browse/PIG-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585163#action_12585163 ]

Olga Natkovich commented on PIG-176:
------------------------------------

Pi,

Running faster is part of it. The other part is not filling up disks with tiny files, which
causes disk fragmentation and also takes forever to clean up at the end of processing, though
your suggestion of cleaning up as we go might help with that a bit.

> pig creates many small files when it spills
> -------------------------------------------
>
>                 Key: PIG-176
>                 URL: https://issues.apache.org/jira/browse/PIG-176
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>
> Currently, on spill pig can generate millions of small (under 128K) files. Partially
> this is due to PIG-170 but even with that patch, you can still try and spill small bags.
> The proposal is to not spill small files. Alan told me that the logic is already there
> but we just need to bump the size limit.
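
To make the proposal above concrete, here is a minimal sketch of a size-limit check that skips spilling small bags. It is not Pig's actual spill code: the class name, method names, and the 128K limit (borrowed from the "under 128K" figure in the issue description) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a spill size limit; not taken from the Pig source. */
public class SpillThresholdSketch {
    // Assumed limit, based on the "under 128K" figure in the issue description.
    public static final long MIN_SPILL_BYTES = 128L * 1024;

    /** Returns true only when the bag is large enough to justify a spill file. */
    public static boolean shouldSpill(long estimatedBagBytes, long minSpillBytes) {
        return estimatedBagBytes >= minSpillBytes;
    }

    /** Returns the bag sizes that would be spilled under the given limit. */
    public static List<Long> selectForSpill(List<Long> bagSizes, long minSpillBytes) {
        List<Long> selected = new ArrayList<>();
        for (long size : bagSizes) {
            if (shouldSpill(size, minSpillBytes)) {
                selected.add(size);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Two bags below the limit, two above it.
        List<Long> bagSizes = Arrays.asList(4_096L, 64_000L, 200_000L, 5_000_000L);
        System.out.println(selectForSpill(bagSizes, MIN_SPILL_BYTES));
        // prints [200000, 5000000]
    }
}
```

Raising the limit in a check like this means fewer, larger spill files, at the cost of keeping more small bags in memory.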

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

