hadoop-hdfs-issues mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4442) Initialization failed for block (...) Invalid volume failure config value: 1
Date Fri, 25 Jan 2013 19:15:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562920#comment-13562920 ]

Harsh J commented on HDFS-4442:
-------------------------------

The error message can certainly be improved, but what it's telling you is that you have only
1 disk configured while {{dfs.datanode.failed.volumes.tolerated}} is also set to 1. The number
of tolerated disk failures must be strictly less than the total number of configured
{{dfs.datanode.data.dir}} values, for a DN cannot run disk-less.
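
To illustrate the constraint described above, a working pairing keeps the tolerated value strictly below the number of data directories; with the reporter's original single-volume setup, the failure toleration would therefore need to be 0 (a hedged example in the same hdfs-site.xml style as the config quoted below):

```xml
<!-- One configured data dir, so tolerated volume failures must be 0
     (tolerated must be strictly less than the number of data dirs). -->
<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:///opt/hadoop/dn1/data</value>
</property>
<property>
 <name>dfs.datanode.failed.volumes.tolerated</name>
 <value>0</value>
</property>
```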

This isn't a bug, but we'd gladly accept a patch to improve the error message. Would you be
interested in submitting one yourself? We have steps outlined for Apache Hadoop contributions
at http://wiki.apache.org/hadoop/HowToContribute, and you can ping the dev lists if you have
further questions in the process.

As Suresh has already pointed out to you, in future please file JIRAs on Apache JIRA only
if you are sure the issue also affects an Apache Hadoop release and is not a CDH-specific
issue. For CDH-specific issues, file a JIRA on Cloudera's own bug tracker (link: https://issues.cloudera.org/browse/DISTRO)
or ping its lists for clarification.
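
For anyone picking up the suggested patch, the validation that rejects the reporter's config can be sketched roughly as follows. This is an illustrative sketch only, not the actual Hadoop source; the class and method names here are invented, and only the rule itself (tolerated failures strictly less than configured volumes) comes from the comment above:

```java
// Hypothetical sketch of the startup check behind
// "Invalid volume failure config value". Names are illustrative.
public class VolumeFailureCheck {
    /**
     * A config is valid only if at least one volume would remain usable
     * after the tolerated number of failures, i.e. the toleration value
     * is strictly less than the number of configured data directories.
     */
    static boolean isValid(int volsConfigured, int volFailuresTolerated) {
        return volsConfigured >= 1
                && volFailuresTolerated >= 0
                && volFailuresTolerated < volsConfigured;
    }

    public static void main(String[] args) {
        // The reporter's original setup: 1 data dir, tolerated = 1 -> rejected.
        System.out.println(isValid(1, 1)); // false
        // After adding a second volume: 2 data dirs, tolerated = 1 -> accepted.
        System.out.println(isValid(2, 1)); // true
    }
}
```

An improved error message could print both numbers, e.g. "volume failure toleration (1) must be less than the number of configured volumes (1)", which would have made the misconfiguration obvious.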
                
> Initialization failed for block (...) Invalid volume failure  config value: 1
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-4442
>                 URL: https://issues.apache.org/jira/browse/HDFS-4442
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>         Environment: Amazon Linux (Centos 6), Cloudera nightly RPMs
>            Reporter: Alexandre Fouché
>              Labels: datanode, hdfs
>
> (Note: Some of the messages are similar to HDFS-4201)
> Just after I created a new HDFS cluster, this time using the Cloudera nightly RPM hadoop-hdfs-datanode-2.0.0+898-1.cdh4.2.0.p0.939.el6.x86_64,
the HDFS datanodes were unable to initialize or store anything. The datanode stays alive, but keeps logging
exceptions every few seconds.
> The messages were "Initialization failed for block pool Block pool (...)", then "org.apache.hadoop.util.DiskChecker$DiskErrorException:
Invalid volume failure  config value: 1", and then repeatedly "Exception in BPOfferService
for Block pool (...)".
> My config was:
> <property>
>  <name>dfs.datanode.data.dir</name>
>  <value>file:///opt/hadoop/dn1/data</value>
> </property>
> After a bit of tweaking, it worked once I added a second EBS volume to the node. Yet
that does not explain the initial error. A bug?
> <property>
>  <name>dfs.datanode.data.dir</name>
>  <value>file:///opt/hadoop/dn1/data,file:///opt/hadoop/dn2/data</value>
> </property>
> Original exceptions:
> (...)
> 2013-01-25 15:04:28,573 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting
block pool BP-1342054845-10.118.50.25-1359125000145 directory /opt/hadoop/dn1/data/current/BP-1342054845-10.118.50.25-1359125000145/current
> 2013-01-25 15:04:28,581 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting
up storage: nsid=1786416716;bpid=BP-1342054845-10.118.50.25-1359125000145;lv=-40;nsInfo=lv=-40;cid=CID-3c2cfe5f-da56-4115-90db-81e06c14bc50;nsid=1786416716;c=0;bpid=BP-1342054845-10.118.50.25-1359125000145
> 2013-01-25 15:04:28,601 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221)
service to namenode2.somedomain.com/10.2.118.169:8020 beginning handshake with NN
> 2013-01-25 15:04:28,605 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization
failed for block pool Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221)
service to namenode1.somedomain.com/10.118.50.25:8020
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure  config
value: 1
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:182)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:910)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:872)
>     at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:308)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:218)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
>     at java.lang.Thread.run(Unknown Source)
> 2013-01-25 15:04:28,702 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221)
service to namenode2.somedomain.com/10.2.118.169:8020 successfully registered with NN
> 2013-01-25 15:04:28,863 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode
namenode2.somedomain.com/10.2.118.169:8020 using DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL
of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
> 2013-01-25 15:04:28,864 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception
in BPOfferService for Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221)
service to namenode2.somedomain.com/10.2.118.169:8020
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:435)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:521)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
>     at java.lang.Thread.run(Unknown Source)
> 2013-01-25 15:04:28,864 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending
block pool service for: Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221)
service to namenode1.somedomain.com/10.118.50.25:8020
> 2013-01-25 15:04:33,864 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode
namenode2.somedomain.com/10.2.118.169:8020 using DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL
of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
> 2013-01-25 15:04:33,864 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception
in BPOfferService for Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221)
service to namenode2.somedomain.com/10.2.118.169:8020
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:435)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:521)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
>     at java.lang.Thread.run(Unknown Source)
> (...)
> After adding the second volume (and deleting the first one too):
> (...)
> 2013-01-25 17:29:18,394 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /opt/hadoop/dn1/data/in_use.lock
acquired by nodename 22999@datanode1.somedomain.com
> 2013-01-25 17:29:18,396 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/opt/hadoop/dn1/data is not formatted
> 2013-01-25 17:29:18,396 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting
...
> 2013-01-25 17:29:18,646 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /opt/hadoop/dn2/data/in_use.lock
acquired by nodename 22999@datanode1.somedomain.com
> 2013-01-25 17:29:18,646 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/opt/hadoop/dn2/data is not formatted
> 2013-01-25 17:29:18,646 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting
...
> 2013-01-25 17:29:19,509 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is
disabled
> 2013-01-25 17:29:19,510 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/opt/hadoop/dn1/data/current/BP-1342054845-10.118.50.25-1359125000145 is not formatted.
> 2013-01-25 17:29:19,510 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting
...
> 2013-01-25 17:29:19,510 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting
block pool BP-1342054845-10.118.50.25-1359125000145 directory /opt/hadoop/dn1/data/current/BP-1342054845-10.118.50.25-1359125000145/current
> 2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is
disabled
> 2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/opt/hadoop/dn2/data/current/BP-1342054845-10.118.50.25-1359125000145 is not formatted.
> 2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting
...
> 2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting
block pool BP-1342054845-10.118.50.25-1359125000145 directory /opt/hadoop/dn2/data/current/BP-1342054845-10.118.50.25-1359125000145/current
> 2013-01-25 17:29:19,527 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting
up storage: nsid=1786416716;bpid=BP-1342054845-10.118.50.25-1359125000145;lv=-40;nsInfo=lv=-40;cid=CID-3c2cfe5f-da56-4115-90db-81e06c14bc50;nsid=1786416716;c=0;bpid=BP-1342054845-10.118.50.25-1359125000145
> 2013-01-25 17:29:19,548 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Added volume - /opt/hadoop/dn1/data/current
> 2013-01-25 17:29:19,640 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Added volume - /opt/hadoop/dn2/data/current
> 2013-01-25 17:29:19,645 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Registered FSDatasetState MBean
> 2013-01-25 17:29:19,655 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner:
Periodic Directory Tree Verification scan starting at 1359139242655 with interval 21600000
> 2013-01-25 17:29:19,656 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Adding block pool BP-1342054845-10.118.50.25-1359125000145
> 2013-01-25 17:29:19,674 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661)
service to namenode1.somedomain.com/10.118.50.25:8020 beginning handshake with NN
> 2013-01-25 17:29:19,678 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661)
service to namenode2.somedomain.com/10.2.118.169:8020 beginning handshake with NN
> 2013-01-25 17:29:19,688 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661)
service to namenode1.somedomain.com/10.118.50.25:8020 successfully registered with NN
> 2013-01-25 17:29:19,690 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode
namenode1.somedomain.com/10.118.50.25:8020 using DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL
of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
> 2013-01-25 17:29:19,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661)
service to namenode2.somedomain.com/10.2.118.169:8020 successfully registered with NN
> 2013-01-25 17:29:19,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode
namenode2.somedomain.com/10.2.118.169:8020 using DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL
of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
> 2013-01-25 17:29:19,888 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Namenode
Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661)
service to namenode1.somedomain.com/10.118.50.25:8020 trying to claim ACTIVE state with txid=165
> 2013-01-25 17:29:19,888 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging
ACTIVE Namenode Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661)
service to namenode1.somedomain.com/10.118.50.25:8020
> 2013-01-25 17:29:19,932 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport
of 0 blocks took 0 msec to generate and 39 msecs for RPC and NN processing
> 2013-01-25 17:29:19,932 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block
report, processed command:null
> 2013-01-25 17:29:19,935 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner:
Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-1342054845-10.118.50.25-1359125000145
> 2013-01-25 17:29:19,939 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport
of 0 blocks took 1 msec to generate and 50 msecs for RPC and NN processing
> 2013-01-25 17:29:19,939 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block
report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@11082823
> 2013-01-25 17:29:20,033 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
Added bpid=BP-1342054845-10.118.50.25-1359125000145 to blockPoolScannerMap, new size=1
> (...)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
