hadoop-hdfs-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-9023) When NN is not able to identify DN for replication, reason behind it can be logged
Date Wed, 27 Dec 2017 18:09:03 GMT

     [ https://issues.apache.org/jira/browse/HDFS-9023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Chen updated HDFS-9023:
----------------------------
    Attachment: HDFS-9023.03.patch

Thanks a lot Surendra.

1 and 2 done. Good idea about making the code simpler.

For #3, the purpose of these logs is to show the reason behind the placements; by themselves
they do not really 'warn' that there is anything wrong with HDFS. A failed placement also does not
necessarily mean things are wrong - an example would be {{TestDefaultBlockPlacementPolicy#testPlacementWithLocalRackNodesDecommissioned}}.
IMO info logs solve the problem of "when a placement failed, we don't know why from the logs",
without scaring admins.

So kept this as-is in patch 3. Please let me know if you feel otherwise.
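
To illustrate the idea (this is a minimal sketch, not the code in HDFS-9023.03.patch): the reasons
why datanodes were rejected can be counted during target selection and emitted once at INFO level.
The class name, the reason names and the message text below are all hypothetical, and
java.util.logging is used only to keep the example self-contained.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical sketch: collect why nodes were not chosen, then log once at INFO. */
public class PlacementReasonSketch {

    /** Illustrative reasons only; these names are assumptions, not taken from the patch. */
    public enum NotChosenReason { NOT_ENOUGH_SPACE, TOO_BUSY, DECOMMISSIONED }

    private static final Logger LOG = Logger.getLogger(PlacementReasonSketch.class.getName());

    // Reason -> number of datanodes rejected for that reason.
    private final Map<NotChosenReason, Integer> counts = new EnumMap<>(NotChosenReason.class);

    /** Record that one datanode was rejected for the given reason. */
    public void record(NotChosenReason reason) {
        counts.merge(reason, 1, Integer::sum);
    }

    /** Build a one-line summary of all recorded reasons. */
    public String summary() {
        return "Not enough replicas were chosen. Reasons: " + counts;
    }

    /** Log at INFO, not WARN: the message explains a placement, it does not signal a fault. */
    public void logIfAny() {
        if (!counts.isEmpty()) {
            LOG.info(summary());
        }
    }

    public static void main(String[] args) {
        PlacementReasonSketch sketch = new PlacementReasonSketch();
        sketch.record(NotChosenReason.TOO_BUSY);
        sketch.record(NotChosenReason.TOO_BUSY);
        sketch.record(NotChosenReason.NOT_ENOUGH_SPACE);
        sketch.logIfAny();
    }
}
```

Aggregating into a single INFO line keeps the log quiet when every requested replica is placed,
while still answering "why was no node chosen" without enabling debug logging.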

> When NN is not able to identify DN for replication, reason behind it can be logged
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-9023
>                 URL: https://issues.apache.org/jira/browse/HDFS-9023
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.7.1
>            Reporter: Surendra Singh Lilhore
>            Assignee: Xiao Chen
>            Priority: Critical
>         Attachments: HDFS-9023.01.patch, HDFS-9023.02.patch, HDFS-9023.03.patch
>
>
> When NN is not able to identify DN for replication, reason behind it can be logged (at
> least critical information why DNs not chosen like disk is full). At present it is expected
> to enable debug log.
> For example the reason for below error looks like all 7 DNs are busy for data writes.
> But at client or NN side no hint is given in the log message.
> {noformat}
> File /tmp/logs/spark/logs/application_1437051383180_0610/xyz-195_26009.tmp could only
> be replicated to 0 nodes instead of minReplication (=1).  There are 7 datanode(s) running
> and no node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1553)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

