hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
Date Thu, 12 May 2011 20:10:49 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032636#comment-13032636
] 

Todd Lipcon commented on HDFS-1332:
-----------------------------------

Hey Nicholas. I thought about the performance impact as well, but I came to the conclusion
that the node-selection code is not a hot code path. In my experience, the NN spends far
more time on read operations than on block allocation. For example, one production NN whose
metrics I have access to has performed 3.6M addBlock operations vs 105M FileInfoOps,
30M GetListing ops, and 27M GetBlockLocations ops.

Additionally, the new code will only get run for nodes which are decommissioning, out of space,
or highly loaded. Thus it's not likely that it will add any appreciable overhead to most chooseTarget
operations.

Looking at the existing code, it's hardly optimized at all. For example, each invocation of
chooseRandom() invokes countNumOfAvailableNodes which takes and releases locks, computes String
substrings, etc.
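
To make the idea concrete, here is a minimal sketch of the approach the issue proposes — collect a reason each time a candidate node fails a placement check, then include those reasons in the "Not able to place enough replicas" log line. The class and method names here are hypothetical, not the actual BlockPlacementPolicyDefault code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: instead of silently rejecting a node, record why it
// was excluded, and append the reasons to the placement-failure log message.
class ExclusionTracker {
    private final List<String> reasons = new ArrayList<>();

    // Called wherever a candidate node fails a placement check
    // (decommissioning, out of space, too highly loaded, ...).
    void exclude(String node, String reason) {
        reasons.add(node + ": " + reason);
    }

    // Builds the detail appended to "Not able to place enough replicas".
    String summary() {
        return reasons.isEmpty()
                ? "no nodes were excluded"
                : String.join("; ", reasons);
    }
}

public class Demo {
    public static void main(String[] args) {
        ExclusionTracker tracker = new ExclusionTracker();
        tracker.exclude("127.0.0.1:50010", "node is being decommissioned");
        tracker.exclude("127.0.0.2:50010", "not enough space remaining");
        System.out.println("Not able to place enough replicas: "
                + tracker.summary());
    }
}
```

Since the tracker is only populated on the exclusion path, it adds no work to the common case where chooseTarget succeeds on the first candidates — consistent with the performance argument above.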



> When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1332
>                 URL: https://issues.apache.org/jira/browse/HDFS-1332
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Todd Lipcon
>            Assignee: Ted Yu
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.23.0
>
>         Attachments: HDFS-1332.patch
>
>
> Whenever the block placement policy determines that a node is not a "good target" it
could add the reason for exclusion to a list, and then when we log "Not able to place enough
replicas" we could say why each node was refused. This would help new users who are having
issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right
now it's very difficult to figure out the issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
