hadoop-hdfs-issues mailing list archives

From "Bharath Mundlapudi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
Date Thu, 19 May 2011 05:24:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035990#comment-13035990 ]

Bharath Mundlapudi commented on HDFS-1592:
------------------------------------------

First, thank you for identifying this issue, Eli. Great job!

A couple of comments:
1. We did test a couple of things, such as masking permissions up to the dfs level. That didn't
catch this issue. Your pointer about setting permissions on the specific directory helped us
reproduce this case. Thanks again.
2. We also tested by unmounting disks.
3. Then we tested by injecting failures at the kernel level.

Regarding test cases:
I agree with you that we need more tests, but I think we should do that in another JIRA,
since we have already spent a lot of effort on manual testing. Can we file another JIRA to
track this?

With this new patch, I have tested the following new cases. Can you please review and provide
your feedback?

case 1: All four good volumes, Vol Tolerated=1, expected outcome = BPService should start

11/05/19 04:57:51 INFO datanode.DataNode: FSDataset added volume - /grid/0/testing/hadoop-logs/dfs/data/current
11/05/19 04:57:51 INFO datanode.DataNode: FSDataset added volume - /grid/1/testing/hadoop-logs/dfs/data/current
11/05/19 04:57:51 INFO datanode.DataNode: FSDataset added volume - /grid/2/testing/hadoop-logs/dfs/data/current
11/05/19 04:57:51 INFO datanode.DataNode: FSDataset added volume - /grid/3/testing/hadoop-logs/dfs/data/current
11/05/19 04:57:51 INFO datanode.DataNode: Registered FSDatasetState MBean
11/05/19 04:57:51 INFO datanode.DataNode: Adding block pool BP-1694914230-10.72.86.55-1305704227822
11/05/19 04:57:51 INFO datanode.DirectoryScanner: Periodic Directory Tree Verification scan
starting at 1305782678947 with interval 21600000
11/05/19 04:57:51 INFO datanode.DataNode: in register: sid=DS-340618566-10.72.86.55-50010-1305704313207;SI=lv=-35;cid=test;nsid=413952175;c=0
11/05/19 04:57:51 INFO datanode.DataNode: bpReg after =lv=-35;cid=test;nsid=413952175;c=0;sid=DS-340618566-10.72.86.55-50010-1305704313207;name=127.0.0.1:50010
11/05/19 04:57:51 INFO datanode.DataNode: in register:;bpDNR=lv=-35;cid=test;nsid=413952175;c=0
11/05/19 04:57:51 INFO datanode.DataNode: For namenode localhost/127.0.0.1:8020 using BLOCKREPORT_INTERVAL
of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
11/05/19 04:57:51 INFO datanode.DataNode: BlockReport of 0 blocks got processed in 3 msecs
11/05/19 04:57:51 INFO datanode.DataNode: sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.DatanodeCommand$Finalize@3e5a91
11/05/19 04:57:51 INFO datanode.BlockPoolSliceScanner: Periodic Block Verification scan initialized
with interval 1814400000.
11/05/19 04:57:51 INFO datanode.DataBlockScanner: Added bpid=BP-1694914230-10.72.86.55-1305704227822
to blockPoolScannerMap, new size=1
11/05/19 04:57:56 INFO datanode.BlockPoolSliceScanner: Starting a new period : work left in
prev period : 0.00%

case 2: One failed volume (/grid/2), three good volumes, Vol Tolerated=1, expected outcome = BPService should start

11/05/19 05:01:27 INFO common.Storage: Storage directory /grid/2/testing/hadoop-logs/dfs/data
is not formatted.
11/05/19 05:01:27 INFO common.Storage: Formatting ...
11/05/19 05:01:27 WARN common.Storage: Invalid directory in: /grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822:
File file:/grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822
does not exist.
11/05/19 05:01:27 INFO common.Storage: Locking is disabled
11/05/19 05:01:27 INFO common.Storage: Locking is disabled
11/05/19 05:01:27 INFO common.Storage: Storage directory /grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822
does not exist.
11/05/19 05:01:27 INFO common.Storage: Storage directory /grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822
does not exist.
11/05/19 05:01:27 INFO common.Storage: Locking is disabled
11/05/19 05:01:27 INFO datanode.DataNode: setting up storage: nsid=0;bpid=BP-1694914230-10.72.86.55-1305704227822;lv=-35;nsInfo=lv=-35;cid=test;nsid=413952175;c=0;bpid=BP-1694914230-10.72.86.55-1305704227822
11/05/19 05:01:27 INFO datanode.DataNode: FSDataset added volume - /grid/0/testing/hadoop-logs/dfs/data/current
11/05/19 05:01:27 INFO datanode.DataNode: FSDataset added volume - /grid/1/testing/hadoop-logs/dfs/data/current
11/05/19 05:01:27 INFO datanode.DataNode: FSDataset added volume - /grid/3/testing/hadoop-logs/dfs/data/current
11/05/19 05:01:27 INFO datanode.DataNode: Registered FSDatasetState MBean
11/05/19 05:01:27 INFO datanode.DataNode: Adding block pool BP-1694914230-10.72.86.55-1305704227822
11/05/19 05:01:27 INFO datanode.DirectoryScanner: Periodic Directory Tree Verification scan
starting at 1305789604425 with interval 21600000
11/05/19 05:01:27 INFO datanode.DataNode: in register: sid=DS-340618566-10.72.86.55-50010-1305704313207;SI=lv=-35;cid=test;nsid=413952175;c=0
11/05/19 05:01:27 INFO datanode.DataNode: bpReg after =lv=-35;cid=test;nsid=413952175;c=0;sid=DS-340618566-10.72.86.55-50010-1305704313207;name=127.0.0.1:50010
11/05/19 05:01:27 INFO datanode.DataNode: in register:;bpDNR=lv=-35;cid=test;nsid=413952175;c=0
11/05/19 05:01:27 INFO datanode.DataNode: For namenode localhost/127.0.0.1:8020 using BLOCKREPORT_INTERVAL
of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
11/05/19 05:01:27 INFO datanode.DataNode: BlockReport of 0 blocks got processed in 4 msecs
11/05/19 05:01:27 INFO datanode.DataNode: sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.DatanodeCommand$Finalize@1adb7b8
11/05/19 05:01:27 INFO datanode.BlockPoolSliceScanner: Periodic Block Verification scan initialized
with interval 1814400000.
11/05/19 05:01:27 INFO datanode.DataBlockScanner: Added bpid=BP-1694914230-10.72.86.55-1305704227822
to blockPoolScannerMap, new size=1
11/05/19 05:01:32 INFO datanode.BlockPoolSliceScanner: Starting a new period : work left in
prev period : 0.00%

case 3: Two failed volumes (/grid/1, /grid/2), two good volumes, Vol Tolerated=1, expected outcome = BPService should NOT start

11/05/19 05:04:06 INFO common.Storage: Storage directory /grid/1/testing/hadoop-logs/dfs/data
is not formatted.
11/05/19 05:04:06 INFO common.Storage: Formatting ...
11/05/19 05:04:06 INFO common.Storage: Storage directory /grid/2/testing/hadoop-logs/dfs/data
is not formatted.
11/05/19 05:04:06 INFO common.Storage: Formatting ...
11/05/19 05:04:06 WARN common.Storage: Invalid directory in: /grid/1/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822:
File file:/grid/1/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822
does not exist.
11/05/19 05:04:06 WARN common.Storage: Invalid directory in: /grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822:
File file:/grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822
does not exist.
11/05/19 05:04:06 INFO common.Storage: Locking is disabled
11/05/19 05:04:06 INFO common.Storage: Storage directory /grid/1/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822
does not exist.
11/05/19 05:04:06 INFO common.Storage: Storage directory /grid/1/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822
does not exist.
11/05/19 05:04:06 INFO common.Storage: Storage directory /grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822
does not exist.
11/05/19 05:04:06 INFO common.Storage: Storage directory /grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822
does not exist.
11/05/19 05:04:06 INFO common.Storage: Locking is disabled
11/05/19 05:04:06 INFO datanode.DataNode: setting up storage: nsid=0;bpid=BP-1694914230-10.72.86.55-1305704227822;lv=-35;nsInfo=lv=-35;cid=test;nsid=413952175;c=0;bpid=BP-1694914230-10.72.86.55-1305704227822
11/05/19 05:04:06 FATAL datanode.DataNode: DatanodeRegistration(hadooplab40.yst.corp.yahoo.com:50010,
storageID=DS-340618566-10.72.86.55-50010-1305704313207, infoPort=50075, ipcPort=50020, storageInfo=lv=-35;cid=test;nsid=413952175;c=0)
initialization failed for block pool BP-1694914230-10.72.86.55-1305704227822
org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid value for volumes required - validVolsRequired: 3, Current valid volumes: 2, volsConfigured: 4, volFailuresTolerated: 1
	at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:1160)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.initFsDataSet(DataNode.java:1420)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.access$1100(DataNode.java:169)
	at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.setupBPStorage(DataNode.java:804)
	at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.setupBP(DataNode.java:774)
	at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.run(DataNode.java:1191)
	at java.lang.Thread.run(Thread.java:619)
11/05/19 05:04:06 WARN datanode.DataNode: DatanodeRegistration(hadooplab40.yst.corp.yahoo.com:50010,
storageID=DS-340618566-10.72.86.55-50010-1305704313207, infoPort=50075, ipcPort=50020, storageInfo=lv=-35;cid=test;nsid=413952175;c=0)
ending block pool service for: BP-1694914230-10.72.86.55-1305704227822

case 4: All failed volumes, Vol Tolerated=1, expected outcome = BPService should NOT start

11/05/19 05:07:51 INFO common.Storage: Storage directory /grid/0/testing/hadoop-logs/dfs/data
is not formatted.
11/05/19 05:07:51 INFO common.Storage: Formatting ...
11/05/19 05:07:51 INFO common.Storage: Storage directory /grid/1/testing/hadoop-logs/dfs/data
is not formatted.
11/05/19 05:07:51 INFO common.Storage: Formatting ...
11/05/19 05:07:51 INFO common.Storage: Storage directory /grid/2/testing/hadoop-logs/dfs/data
is not formatted.
11/05/19 05:07:51 INFO common.Storage: Formatting ...
11/05/19 05:07:51 INFO common.Storage: Storage directory /grid/3/testing/hadoop-logs/dfs/data
is not formatted.
11/05/19 05:07:51 INFO common.Storage: Formatting ...
11/05/19 05:07:51 FATAL datanode.DataNode: DatanodeRegistration(hadooplab40.yst.corp.yahoo.com:50010,
storageID=, infoPort=50075, ipcPort=50020, storageInfo=lv=0;cid=;nsid=0;c=0) initialization
failed for block pool BP-1694914230-10.72.86.55-1305704227822
java.io.IOException: All specified directories are not accessible or do not exist.
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:182)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:217)
	at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.setupBPStorage(DataNode.java:797)
	at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.setupBP(DataNode.java:774)
	at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.run(DataNode.java:1191)
	at java.lang.Thread.run(Thread.java:619)
11/05/19 05:07:51 WARN datanode.DataNode: DatanodeRegistration(hadooplab40.yst.corp.yahoo.com:50010,
storageID=, infoPort=50075, ipcPort=50020, storageInfo=lv=0;cid=;nsid=0;c=0) ending block
pool service for: BP-1694914230-10.72.86.55-1305704227822
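
The arithmetic behind the four cases can be sketched as follows. This is only an illustration
inferred from the exception message in case 3 (validVolsRequired: 3 = volsConfigured: 4 minus
volFailuresTolerated: 1), not the code in the patch. The class name, method names, and the
nested DiskErrorException stand-in are all hypothetical. Note also that in the case 4 log the
failure is actually raised earlier, by DataStorage.recoverTransitionRead, so this check only
approximates that outcome.

```java
import java.io.IOException;

public class VolumeToleranceSketch {

    /** Hypothetical stand-in for org.apache.hadoop.util.DiskChecker.DiskErrorException. */
    public static class DiskErrorException extends IOException {
        public DiskErrorException(String msg) {
            super(msg);
        }
    }

    /** Minimum number of healthy volumes, as implied by the case 3 message. */
    public static int validVolsRequired(int volsConfigured, int volFailuresTolerated) {
        return volsConfigured - volFailuresTolerated;
    }

    /** Throws if fewer healthy volumes are present than the tolerance allows. */
    public static void checkVolumes(int volsConfigured, int validVolumes,
                                    int volFailuresTolerated) throws DiskErrorException {
        int required = validVolsRequired(volsConfigured, volFailuresTolerated);
        if (validVolumes < required) {
            throw new DiskErrorException("Invalid value for volumes required"
                + " - validVolsRequired: " + required
                + ", Current valid volumes: " + validVolumes
                + ", volsConfigured: " + volsConfigured
                + ", volFailuresTolerated: " + volFailuresTolerated);
        }
    }

    /** Convenience wrapper: true if the check passes, false if it throws. */
    public static boolean wouldStart(int volsConfigured, int validVolumes,
                                     int volFailuresTolerated) {
        try {
            checkVolumes(volsConfigured, validVolumes, volFailuresTolerated);
            return true;
        } catch (DiskErrorException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Healthy-volume counts for cases 1 to 4 above: 4 configured, 1 tolerated.
        int[] validCounts = {4, 3, 2, 0};
        for (int i = 0; i < validCounts.length; i++) {
            System.out.println("case " + (i + 1) + ": valid=" + validCounts[i]
                + ", wouldStart=" + wouldStart(4, validCounts[i], 1));
        }
    }
}
```

Under these assumptions, cases 1 and 2 pass the check (4 and 3 valid volumes against 3 required),
while cases 3 and 4 fail it, which matches the expected outcomes listed above.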


> Datanode startup doesn't honor volumes.tolerated 
> -------------------------------------------------
>
>                 Key: HDFS-1592
>                 URL: https://issues.apache.org/jira/browse/HDFS-1592
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.204.0
>            Reporter: Bharath Mundlapudi
>            Assignee: Bharath Mundlapudi
>             Fix For: 0.20.204.0, 0.23.0
>
>         Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-3.patch, HDFS-1592-rel20.patch
>
>
> Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
