hadoop-hdfs-issues mailing list archives

From "Vinay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2882) DN continues to start up, even if block pool fails to initialize
Date Thu, 21 Nov 2013 23:15:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829452#comment-13829452 ]

Vinay commented on HDFS-2882:
-----------------------------

Hi Colin,
Thanks for taking a look at the patch, and sorry for the confusion.

Yes, it is easy to reproduce, but only in an HA installation:
1. Make one of the data directories unwritable.
2. Restart the DataNode.

Block pool initialization fails for the first NameNode the DN connects to, and that BPSA (BPServiceActor) exits.
The actor for the second NameNode, however, does not try to initialize the block pool, because the namespace info
is already non-null.
It then tries to send heartbeats and throws NPEs continuously.
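The sequence above can be modeled in a few lines of Java. This is a simplified, hypothetical sketch: SharedBlockPool, Actor, connect(), heartbeat() and the guardOnInitialized flag are illustrative names, not the DataNode's real classes, and the model assumes the first actor records the namespace info before its storage initialization fails. It is not meant to reproduce the attached patch. It only shows why "namespace info is non-null" is the wrong signal for "block pool is initialized".

```java
/**
 * Simplified, hypothetical model of the HDFS-2882 failure in an HA setup.
 * SharedBlockPool, Actor, connect() and guardOnInitialized are illustrative
 * names, not the DataNode's real classes or methods.
 */
public class BlockPoolInitSketch {

    /** State shared by the two actors that serve the same block pool. */
    static final class SharedBlockPool {
        String namespaceInfo; // recorded from the first NameNode handshake
        Object storage;       // stays null if storage initialization fails
        boolean initialized;  // true only once storage exists
    }

    /** One actor per NameNode (e.g. active and standby). */
    static final class Actor {
        private final SharedBlockPool pool;
        private final boolean guardOnInitialized;

        Actor(SharedBlockPool pool, boolean guardOnInitialized) {
            this.pool = pool;
            this.guardOnInitialized = guardOnInitialized;
        }

        /** Handshake plus optional init; returns false if this actor exits. */
        boolean connect(String nsInfoFromNameNode, boolean diskWritable) {
            boolean needsInit = guardOnInitialized
                    ? !pool.initialized           // gate on successful init
                    : pool.namespaceInfo == null; // gate on namespace info (the bug)
            if (needsInit) {
                // Assumption of this model: namespace info is stored before
                // storage initialization is attempted.
                pool.namespaceInfo = nsInfoFromNameNode;
                if (!diskWritable) {
                    return false; // initialization failed: this actor exits
                }
                pool.storage = new Object();
                pool.initialized = true;
            }
            return true;
        }

        /** Stand-in for a heartbeat or block report that touches storage. */
        int heartbeat() {
            return pool.storage.hashCode(); // NPE when storage was never created
        }
    }

    /** Connects two actors against an unwritable disk and reports the outcome. */
    static String simulate(boolean guardOnInitialized) {
        SharedBlockPool pool = new SharedBlockPool();
        boolean firstRuns = new Actor(pool, guardOnInitialized).connect("ns", false);
        Actor second = new Actor(pool, guardOnInitialized);
        boolean secondRuns = second.connect("ns", false);
        String heartbeat = "not sent";
        if (secondRuns) {
            try {
                second.heartbeat();
                heartbeat = "ok";
            } catch (NullPointerException e) {
                heartbeat = "NPE";
            }
        }
        return "first=" + (firstRuns ? "running" : "exited")
                + " second=" + (secondRuns ? "running" : "exited")
                + " heartbeat=" + heartbeat;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // namespace-info check
        System.out.println(simulate(true));  // initialized-flag check
    }
}
```

With the namespace-info check, main prints "first=exited second=running heartbeat=NPE": the second actor skips initialization and keeps running against null storage. With the initialized-flag check, it prints "first=exited second=exited heartbeat=not sent": the second actor retries, hits the same unwritable disk, and exits instead of running half-initialized.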

Todd suggested three scenarios to handle in this case and proposed an initial patch.
I have just continued with that approach.

> DN continues to start up, even if block pool fails to initialize
> ----------------------------------------------------------------
>
>                 Key: HDFS-2882
>                 URL: https://issues.apache.org/jira/browse/HDFS-2882
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Vinay
>         Attachments: HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, hdfs-2882.txt
>
>
> I started a DN on a machine that was completely out of space on one of its drives. I saw the following:
> 2012-02-02 09:56:50,499 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id DS-507718931-172.29.5.194-11072-1297842002148) service to styx01.sf.cloudera.com/172.29.5.192:8021
> java.io.IOException: Mkdirs failed to create /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp
>         at org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.<init>(FSDataset.java:335)
> but the DN continued to run, spewing NPEs when it tried to do block reports, etc. This was on the HDFS-1623 branch but may affect trunk as well.
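The report implies that a DN whose block pool cannot be initialized should not keep running. Below is a minimal sketch of a fail-fast startup check, assuming a hypothetical startUp() that initializes each block pool's directories and refuses to continue when none succeeded. Whether a single failed pool should also be fatal is a policy choice this sketch does not settle. The <volume>/current/<bpid>/tmp layout and the "Mkdirs failed to create" message follow the log above. FailFastStartupSketch, initBlockPool() and startUp() are illustrative names, not Hadoop's API.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical fail-fast startup check. The directory layout
 * (<volume>/current/<bpid>/tmp) follows the path in the log above;
 * the class and method names are illustrative only.
 */
public class FailFastStartupSketch {

    /** Creates the block pool's tmp directory, failing the way the log shows. */
    static void initBlockPool(File volume, String bpid) throws IOException {
        File tmp = new File(new File(new File(volume, "current"), bpid), "tmp");
        if (!tmp.isDirectory() && !tmp.mkdirs()) {
            throw new IOException("Mkdirs failed to create " + tmp);
        }
    }

    /**
     * Initializes every block pool and refuses to continue if none succeeded,
     * instead of leaving the process up with uninitialized state.
     * Returns the ids of the pools that were initialized.
     */
    static List<String> startUp(File volume, List<String> bpids) throws IOException {
        List<String> ready = new ArrayList<>();
        for (String bpid : bpids) {
            try {
                initBlockPool(volume, bpid);
                ready.add(bpid);
            } catch (IOException e) {
                System.err.println("Initialization failed for block pool " + bpid + ": " + e.getMessage());
            }
        }
        if (ready.isEmpty()) {
            throw new IOException("No block pool initialized; shutting down");
        }
        return ready;
    }

    public static void main(String[] args) throws IOException {
        // A regular file stands in for an unusable data directory: mkdirs()
        // beneath a regular file fails even when running as root.
        File broken = File.createTempFile("datadir", ".broken");
        broken.deleteOnExit();
        try {
            startUp(broken, List.of("BP-1"));
            System.out.println("started");
        } catch (IOException e) {
            System.out.println("refused to start: " + e.getMessage());
        }
    }
}
```

Using a regular file as the "volume" makes the mkdirs failure reproducible without relying on permissions, which root would bypass. On a writable directory, startUp() creates current/BP-1/tmp and returns the ready pool ids.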



--
This message was sent by Atlassian JIRA
(v6.1#6144)
