pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-166) Disk Full
Date Mon, 31 Mar 2008 15:28:24 GMT

    [ https://issues.apache.org/jira/browse/PIG-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583719#action_12583719 ]

Alan Gates commented on PIG-166:


This is in response to your comments about garbage-collected temp files.  Currently, we mark
temp files from bag spills as deleteOnExit, so unless the JVM crashes or is killed via kill
-9 or a similar mechanism, we do clean up our tmp files.  The issue Amir is addressing
here is a single Pig job filling up the disks.
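The deleteOnExit behavior described above is the standard JDK facility. A minimal sketch of how a spill file would be registered for cleanup (illustrative only, not Pig's actual spill code; the file-name prefix is an assumption):

```java
import java.io.File;
import java.io.IOException;

public class SpillCleanupDemo {
    public static void main(String[] args) throws IOException {
        // Create a temp file in java.io.tmpdir, as a bag spill might.
        File spill = File.createTempFile("pigbag", ".tmp");

        // Register it for deletion when the JVM shuts down normally.
        // This hook does NOT run if the JVM crashes or is killed with
        // kill -9, which is why orphaned spill files can still appear.
        spill.deleteOnExit();

        System.out.println("spill file exists: " + spill.exists());
    }
}
```

Note that deleteOnExit only defers cleanup to JVM shutdown; it does nothing to bound how much disk a running job consumes, which is the gap this bug is about.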

AFAIK, if people's jobs generate so much data that we can't even contain it on disk, then
our only hope is to find a way to better parallelize the problem.  That will not always be
possible.  But as Amir points out, when we do fail we shouldn't take down the node with us.
I think that's really the focus of this bug.

> Disk Full
> ---------
>                 Key: PIG-166
>                 URL: https://issues.apache.org/jira/browse/PIG-166
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Amir Youssefi
> Occasionally spilling fills up (all) hard drives on a Data Node and crashes the Task Tracker
> (and other processes) on that node. We need a safety net that fails the task before the
> crash happens (and more).
> In a Pig + Hadoop setting, Task Trackers get blacklisted, and the Pig console gets stuck at
> a percentage without returning nodes to the cluster. I talked to the Hadoop team to explore
> the Max Percentage idea. Nodes running into this problem end up in a permanent bad state, and
> manual cleanup by an administrator is necessary.
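The safety net the issue asks for amounts to checking free space before spilling and failing the task cleanly if too little remains. A hedged sketch of such a guard, using the JDK's `File.getUsableSpace` (the class name, method name `safeToSpill`, and the 1 GB threshold are all hypothetical, not Pig's implementation):

```java
import java.io.File;

public class SpillGuard {
    // Hypothetical threshold: refuse to spill if fewer bytes remain free.
    static final long MIN_FREE_BYTES = 1L * 1024 * 1024 * 1024; // 1 GB

    // Returns true if spilling to the given directory is considered safe.
    static boolean safeToSpill(File spillDir) {
        return spillDir.getUsableSpace() > MIN_FREE_BYTES;
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        if (!safeToSpill(tmp)) {
            // Fail this task cleanly instead of filling the disk and
            // taking down the TaskTracker (and other processes) with it.
            throw new RuntimeException("Refusing to spill: disk nearly full at " + tmp);
        }
        System.out.println("ok to spill to " + tmp);
    }
}
```

Failing one task this way lets the node stay healthy and rejoin the cluster, instead of being blacklisted and needing manual cleanup.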

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
