hadoop-common-dev mailing list archives

From "Bryan Pendleton (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-370) TaskTracker startup fails if any mapred.local.dir entries don't exist
Date Wed, 19 Jul 2006 17:47:15 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-370?page=comments#action_12422206 ] 
Bryan Pendleton commented on HADOOP-370:

Yeah, that sounds like a better approach. I'd be happy to implement that in the patch instead,
modulo a dangling issue:

Should the "good dirs" (i.e., the new return value of checkLocalDirs) be cached? The implication
is that, after initialization, directories are no longer checked for writability, and the
directory list can only get smaller during the lifetime of a daemon instance. The alternative,
as I'm seeing with my current patch, is a lot of extraneous log output that isn't really valuable.
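
To make the caching option concrete, here is a minimal sketch of what it could look like.
It is not the patch itself: the class GoodDirCache and its methods are hypothetical names
invented for illustration, and only java.io.File from the standard library is used. Each
configured directory is checked and logged once at construction, and afterwards the list
can only shrink.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: caches the usable local dirs.
public class GoodDirCache {
    private final List<File> goodDirs = new ArrayList<>();

    // Check every configured directory once; each bad entry is logged a single time.
    public GoodDirCache(List<String> configuredDirs) {
        for (String path : configuredDirs) {
            File dir = new File(path);
            if (dir.isDirectory() && dir.canWrite()) {
                goodDirs.add(dir);
            } else {
                System.err.println("Ignoring unusable local dir: " + path);
            }
        }
    }

    // Return a copy so callers cannot modify the cached list.
    public synchronized List<File> getGoodDirs() {
        return new ArrayList<>(goodDirs);
    }

    // The list only ever gets smaller: a dir that fails later is dropped for good.
    public synchronized void markBad(File dir) {
        goodDirs.remove(dir);
    }
}
```

The trade-off is the one described above: a directory that becomes writable again is not
picked up until the daemon restarts, but the log is not flooded with repeated warnings.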

> TaskTracker startup fails if any mapred.local.dir entries don't exist
> ---------------------------------------------------------------------
>                 Key: HADOOP-370
>                 URL: http://issues.apache.org/jira/browse/HADOOP-370
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>         Environment: ~30 node cluster, various size/number of disks, CPUs, memory
>            Reporter: Bryan Pendleton
>         Attachments: fix-freespace-tasktracker-failure.txt
> This appears to have been introduced with the "check for enough free space" before startup.
> It's debatable how best to fix this bug. I will submit a patch which ignores directories
> for which the DF utility fails. This is letting me continue operation on my cluster (where
> the number of drives varies, so there are entries in mapred.local.dir for drives that aren't
> on all cluster nodes), but a cleaner solution is probably better. I'd lean towards "check
> for existence", and ignore the dir if it doesn't exist - but don't depend on DF to fail, since
> DF could fail for other reasons without meaning you're out of disk space. I argue that a TaskTracker
> should start up if *all* directories that *can be written to* in the list have enough space.
> Otherwise, a failed drive per cluster machine means no work ever gets done.
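
The startup rule proposed in the description can be sketched as follows. This is an
illustration under assumptions, not the attached patch: LocalDirStartupCheck, isUsable and
enoughSpace are hypothetical names, free space is read with java.io.File.getUsableSpace()
rather than the DF utility, and refusing to start when no directory is writable at all is
an added assumption that the description does not state.

```java
import java.io.File;
import java.util.List;

// Hypothetical sketch, not part of Hadoop.
public class LocalDirStartupCheck {

    // A directory counts only if it exists and can be written to.
    static boolean isUsable(File dir) {
        return dir.isDirectory() && dir.canWrite();
    }

    // True if every usable directory has at least minBytes free.
    // Missing or unwritable entries are skipped instead of failing startup.
    static boolean enoughSpace(List<String> configuredDirs, long minBytes) {
        int usable = 0;
        for (String path : configuredDirs) {
            File dir = new File(path);
            if (!isUsable(dir)) {
                continue; // e.g. a drive that is absent on this node
            }
            usable++;
            if (dir.getUsableSpace() < minBytes) {
                return false;
            }
        }
        // Assumption: with no usable directory there is nowhere to write, so do not start.
        return usable > 0;
    }
}
```

With this rule, a node that lacks one of the drives listed in mapred.local.dir still starts
as long as its remaining writable directories have enough space.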

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

