hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10269) Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the datanode exit
Date Sat, 09 Apr 2016 07:39:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233428#comment-15233428
] 

Andrew Wang commented on HDFS-10269:
------------------------------------

Like Chris, I don't like the idea of falling back to a default value in the case of misconfiguration,
since it leads to ambiguity. The admin has explicitly set it to this value; why should we
ignore it and use some other value? It's much clearer for misconfiguration to be treated
as a fatal error.
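For reference, the fail-fast check that produces the {{DiskErrorException}} in the log below works roughly like this. This is a standalone sketch, not Hadoop's actual code: the method name is hypothetical and a plain IOException stands in for DiskChecker.DiskErrorException, but the condition mirrors the one FsDatasetImpl enforces (the tolerated count must be non-negative and strictly less than the number of configured volumes).

```java
import java.io.IOException;

public class VolumeToleranceCheck {

    /**
     * Fail fast on a bad dfs.datanode.failed.volumes.tolerated setting.
     * With volsConfigured data dirs, tolerating volsConfigured (or more)
     * failures would let the datanode run with zero healthy volumes, so
     * any value outside [0, volsConfigured) is rejected at startup.
     */
    static void validateFailedVolumesTolerated(int tolerated, int volsConfigured)
            throws IOException {
        if (tolerated < 0 || tolerated >= volsConfigured) {
            throw new IOException(
                "Invalid volume failure config value: " + tolerated);
        }
    }
}
```

With one data dir and the value 5 reused from the other cluster, the check throws immediately, which is exactly the fatal-error behavior being argued for here: the admin sees the bad value at startup instead of the datanode silently running with a different one.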

> Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the datanode
exit
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10269
>                 URL: https://issues.apache.org/jira/browse/HDFS-10269
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Lin Yiqun
>            Assignee: Lin Yiqun
>         Attachments: HDFS-10269.001.patch
>
>
> The datanode failed to start and exited when I reused the setting dfs.datanode.failed.volumes.tolerated=5
from another cluster, but the new cluster has only one data dir path. This led to an invalid
volume failure config value, a {{DiskErrorException}} was thrown, and the datanode shut
down. The log is below:
> {code}
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to
add storage for block pool: BP-1239160341-xx.xx.xx.xx-1459929303126 : BlockPoolSliceStorage.recoverTransitionRead:
attempt to load an used block storage: /home/data/hdfs/data/current/BP-1239160341-xx.xx.xx.xx-1459929303126
> 2016-04-07 09:34:45,358 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization
failed for Block pool <registering> (Datanode Uuid unassigned) service to /xx.xx.xx.xx:9000.
Exiting.
> java.io.IOException: All specified directories are failed to load.
>         at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization
failed for Block pool <registering> (Datanode Uuid unassigned) service to /xx.xx.xx.xx:9000.
Exiting.
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure  config
value: 5
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:281)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1374)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending
block pool service for: Block pool <registering> (Datanode Uuid unassigned) service
to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending
block pool service for: Block pool <registering> (Datanode Uuid unassigned) service
to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,460 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed
Block pool <registering> (Datanode Uuid unassigned)
> 2016-04-07 09:34:47,460 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting
Datanode
> 2016-04-07 09:34:47,462 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
> 2016-04-07 09:34:47,463 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> {code}
> IMO, this makes for a bad user experience when only a single value is misconfigured. Instead,
we could log a warning and reset the value to the default. That would be a better way to
handle this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
