hadoop-hdfs-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexandre Fouché (JIRA) <j...@apache.org>
Subject [jira] [Created] (HDFS-4442) Initialization failed for block (...) Invalid volume failure config value: 1
Date Fri, 25 Jan 2013 17:33:13 GMT
Alexandre Fouché created HDFS-4442:
--------------------------------------

             Summary: Initialization failed for block (...) Invalid volume failure  config
value: 1
                 Key: HDFS-4442
                 URL: https://issues.apache.org/jira/browse/HDFS-4442
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
         Environment: Amazon Linux (Centos 6), Cloudera nightly RPMs
            Reporter: Alexandre Fouché


(Note: Some of the message are similar to HDFS-4201)

Just after i created a new HDFS cluster, and this time using Cloudera nightly RPM hadoop-hdfs-datanode-2.0.0+898-1.cdh4.2.0.p0.939.el6.x86_64,
HDFS datanodes were unable to initialize or store anything. It stays alive, but keeps logging
exceptions every few seconds.

It was "Initialization failed for block pool Block pool (...)" "org.apache.hadoop.util.DiskChecker$DiskErrorException:
Invalid volume failure  config value: 1" and then repeatedly "Exception in BPOfferService
for Block pool (...)"

My config was :

<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:///opt/hadoop/dn1/data</value>
</property>

After a bit of tweaking, it worked once i added a second EBS volume to the node. Yet it does
not explain the initial error. A bug ?

<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:///opt/hadoop/dn1/data,file:///opt/hadoop/dn2/data</value>
</property>


Original exceptions:

(...)
2013-01-25 15:04:28,573 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block
pool BP-1342054845-10.118.50.25-1359125000145 directory /opt/hadoop/dn1/data/current/BP-1342054845-10.118.50.25-1359125000145/current
2013-01-25 15:04:28,581 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage:
nsid=1786416716;bpid=BP-1342054845-10.118.50.25-1359125000145;lv=-40;nsInfo=lv=-40;cid=CID-3c2cfe5f-da56-4115-90db-81e06c14bc50;nsid=1786416716;c=0;bpid=BP-1342054845-10.118.50.25-1359125000145
2013-01-25 15:04:28,601 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1342054845-10.118.50.25-1359125000145
(storage id DS-404982471-10.194.189.193-50010-1359126268221) service to namenode2.somedomain.com/10.2.118.169:8020
beginning handshake with NN
2013-01-25 15:04:28,605 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization
failed for block pool Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221)
service to namenode1.somedomain.com/10.118.50.25:8020
org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure  config value:
1
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:182)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:910)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:872)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:308)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:218)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
    at java.lang.Thread.run(Unknown Source)
2013-01-25 15:04:28,702 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block
pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221)
service to namenode2.somedomain.com/10.2.118.169:8020 successfully registered with NN
2013-01-25 15:04:28,863 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode
namenode2.somedomain.com/10.2.118.169:8020 using DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL
of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
2013-01-25 15:04:28,864 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
BPOfferService for Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221)
service to namenode2.somedomain.com/10.2.118.169:8020
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:435)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:521)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
    at java.lang.Thread.run(Unknown Source)
2013-01-25 15:04:28,864 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block
pool service for: Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221)
service to namenode1.somedomain.com/10.118.50.25:8020
2013-01-25 15:04:33,864 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode
namenode2.somedomain.com/10.2.118.169:8020 using DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL
of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
2013-01-25 15:04:33,864 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
BPOfferService for Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221)
service to namenode2.somedomain.com/10.2.118.169:8020
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:435)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:521)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
    at java.lang.Thread.run(Unknown Source)
(...)


After adding the second volume (and deleted the first one too):

(...)
2013-01-25 17:29:18,394 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /opt/hadoop/dn1/data/in_use.lock
acquired by nodename 22999@datanode1.somedomain.com
2013-01-25 17:29:18,396 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/opt/hadoop/dn1/data is not formatted
2013-01-25 17:29:18,396 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2013-01-25 17:29:18,646 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /opt/hadoop/dn2/data/in_use.lock
acquired by nodename 22999@datanode1.somedomain.com
2013-01-25 17:29:18,646 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/opt/hadoop/dn2/data is not formatted
2013-01-25 17:29:18,646 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2013-01-25 17:29:19,509 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled
2013-01-25 17:29:19,510 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/opt/hadoop/dn1/data/current/BP-1342054845-10.118.50.25-1359125000145 is not formatted.
2013-01-25 17:29:19,510 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2013-01-25 17:29:19,510 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block
pool BP-1342054845-10.118.50.25-1359125000145 directory /opt/hadoop/dn1/data/current/BP-1342054845-10.118.50.25-1359125000145/current
2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled
2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/opt/hadoop/dn2/data/current/BP-1342054845-10.118.50.25-1359125000145 is not formatted.
2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block
pool BP-1342054845-10.118.50.25-1359125000145 directory /opt/hadoop/dn2/data/current/BP-1342054845-10.118.50.25-1359125000145/current
2013-01-25 17:29:19,527 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage:
nsid=1786416716;bpid=BP-1342054845-10.118.50.25-1359125000145;lv=-40;nsInfo=lv=-40;cid=CID-3c2cfe5f-da56-4115-90db-81e06c14bc50;nsid=1786416716;c=0;bpid=BP-1342054845-10.118.50.25-1359125000145
2013-01-25 17:29:19,548 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Added volume - /opt/hadoop/dn1/data/current
2013-01-25 17:29:19,640 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Added volume - /opt/hadoop/dn2/data/current
2013-01-25 17:29:19,645 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Registered FSDatasetState MBean
2013-01-25 17:29:19,655 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Periodic
Directory Tree Verification scan starting at 1359139242655 with interval 21600000
2013-01-25 17:29:19,656 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Adding block pool BP-1342054845-10.118.50.25-1359125000145
2013-01-25 17:29:19,674 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1342054845-10.118.50.25-1359125000145
(storage id DS-26684760-10.194.189.193-50010-1359134958661) service to namenode1.somedomain.com/10.118.50.25:8020
beginning handshake with NN
2013-01-25 17:29:19,678 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1342054845-10.118.50.25-1359125000145
(storage id DS-26684760-10.194.189.193-50010-1359134958661) service to namenode2.somedomain.com/10.2.118.169:8020
beginning handshake with NN
2013-01-25 17:29:19,688 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block
pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661)
service to namenode1.somedomain.com/10.118.50.25:8020 successfully registered with NN
2013-01-25 17:29:19,690 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode
namenode1.somedomain.com/10.118.50.25:8020 using DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL
of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
2013-01-25 17:29:19,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block
pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661)
service to namenode2.somedomain.com/10.2.118.169:8020 successfully registered with NN
2013-01-25 17:29:19,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode
namenode2.somedomain.com/10.2.118.169:8020 using DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL
of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
2013-01-25 17:29:19,888 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Namenode Block
pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661)
service to namenode1.somedomain.com/10.118.50.25:8020 trying to claim ACTIVE state with txid=165
2013-01-25 17:29:19,888 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging
ACTIVE Namenode Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661)
service to namenode1.somedomain.com/10.118.50.25:8020
2013-01-25 17:29:19,932 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport
of 0 blocks took 0 msec to generate and 39 msecs for RPC and NN processing
2013-01-25 17:29:19,932 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report,
processed command:null
2013-01-25 17:29:19,935 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner:
Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-1342054845-10.118.50.25-1359125000145
2013-01-25 17:29:19,939 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport
of 0 blocks took 1 msec to generate and 50 msecs for RPC and NN processing
2013-01-25 17:29:19,939 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report,
processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@11082823
2013-01-25 17:29:20,033 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added
bpid=BP-1342054845-10.118.50.25-1359125000145 to blockPoolScannerMap, new size=1
(...)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message