hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
Date Fri, 13 May 2011 22:18:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033346#comment-13033346 ]

Todd Lipcon commented on HDFS-1332:
-----------------------------------

Hey Nicholas. How do you feel about the following compromise:
- For the simple case where there are no datanodes in the cluster, we include some additional detail in the exception message indicating as much. This will help the common case of a new user whose datanodes failed to start and who is confused about why he can't write blocks. This should be in the IOException itself so that it propagates to the client.
- If debug is enabled, we construct the HashMap as above, and log the "failure to allocate block" type messages at WARN level.
- If debug is not enabled, then we log a message that says something like "failure to allocate block ... For more information, please enable DEBUG level logging on the o.a.h.BlockPlacementPolicyDefault logger."

This should avoid any performance impact in the normal (non-debug) case, but also point users down the right path to solving the issues.
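
To make the three cases concrete, here is a rough sketch of the control flow. It is not the attached patch and not the real BlockPlacementPolicyDefault code: the class name, method names, message wording, and the map of node name to rejection reason are all made up for illustration, and it uses java.util.logging (where FINE plays the role of DEBUG) only so that it runs without any Hadoop dependencies.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the placement policy; not actual Hadoop code.
class PlacementLoggingSketch {
    // Assumed logger name, chosen only to mirror the class named in the comment.
    static final Logger LOG = Logger.getLogger("BlockPlacementPolicyDefault");

    /**
     * Builds the failure message. The per-node map of reasons is only
     * assembled when debug is enabled, so the normal path stays cheap.
     * candidates maps a node name to its rejection reason (null = acceptable).
     */
    static String describeFailure(Map<String, String> candidates, boolean debugEnabled) {
        if (!debugEnabled) {
            return "Failed to allocate block. For more information, please enable DEBUG"
                + " level logging on the " + LOG.getName() + " logger.";
        }
        Map<String, String> excluded = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : candidates.entrySet()) {
            if (e.getValue() != null) {
                excluded.put(e.getKey(), e.getValue());
            }
        }
        return "Failed to allocate block. Excluded nodes: " + excluded;
    }

    /** Picks up to `replicas` acceptable nodes, reporting problems as proposed. */
    static List<String> chooseTargets(Map<String, String> candidates, int replicas)
            throws IOException {
        // Case 1: no datanodes at all. Put the detail in the exception itself
        // so that it reaches the client.
        if (candidates.isEmpty()) {
            throw new IOException(
                "Could not place replicas: there are no datanodes in the cluster");
        }
        List<String> chosen = new ArrayList<>();
        for (Map.Entry<String, String> e : candidates.entrySet()) {
            if (e.getValue() == null && chosen.size() < replicas) {
                chosen.add(e.getKey());
            }
        }
        // Cases 2 and 3: warn either way; the detail depends on the debug setting.
        if (chosen.size() < replicas) {
            LOG.warning(describeFailure(candidates, LOG.isLoggable(Level.FINE)));
        }
        return chosen;
    }
}
```

In the real code the reasons would presumably be collected while the policy evaluates each node rather than passed in up front; the sketch only shows where the debug check sits relative to the map construction and the WARN message.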

> When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1332
>                 URL: https://issues.apache.org/jira/browse/HDFS-1332
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Todd Lipcon
>            Assignee: Ted Yu
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.23.0
>
>         Attachments: HDFS-1332.patch
>
>
> Whenever the block placement policy determines that a node is not a "good target" it
> could add the reason for exclusion to a list, and then when we log "Not able to place enough
> replicas" we could say why each node was refused. This would help new users who are having
> issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right
> now it's very difficult to figure out the issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
