hadoop-hdfs-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1222) NameNode fail stop in spite of multiple metadata directories
Date Tue, 07 Sep 2010 04:38:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906669#action_12906669 ]

dhruba borthakur commented on HDFS-1222:
----------------------------------------

I think this bug says that if the NN is configured with multiple fs.name.dir directories and
the fsimage/edits in one of them is bad while the fsimage/edits in the others are not
corrupted, the NN still fails to load the image.

This behavior is somewhat by design. On the other hand, I think there is still a bug in this
"feature". Suppose there are two directories in fs.name.dir, say d1 and d2. The edits file in
d2 is corrupted but is the same size as the edits file in d1. Now suppose d1 is listed
first in the fs.name.dir configuration parameter. In this case, the NN will try reading the
fsimage/edits from d1 and will succeed with the load.
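
To make the d1/d2 scenario concrete, here is a minimal sketch in Java of a loader that walks
the configured directories in order, uses the first replica that passes a check, and records
the ones that fail. Everything in it is an assumption made for illustration: ReplicaLoader,
Replica, Result and loadFirstGood are hypothetical names, and the CRC32 comparison is only a
stand-in for "the load succeeded". It is not the NameNode's actual loading code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical illustration only; this is not the NameNode's FSImage code.
class ReplicaLoader {

    /** Stand-in for one on-disk copy: its bytes plus the checksum recorded when it was written. */
    static final class Replica {
        final byte[] data;
        final long expectedCrc;

        Replica(byte[] data, long expectedCrc) {
            this.data = data;
            this.expectedCrc = expectedCrc;
        }
    }

    /** Which directory was used, and which directories were found faulty before it. */
    static final class Result {
        final String loadedFrom;
        final List<String> faulty;

        Result(String loadedFrom, List<String> faulty) {
            this.loadedFrom = loadedFrom;
            this.faulty = faulty;
        }
    }

    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    /**
     * Tries directories in the map's iteration order (pass a LinkedHashMap to
     * mirror the order of a configuration list). Stops at the first replica
     * whose checksum matches; later directories are never read.
     */
    static Result loadFirstGood(Map<String, Replica> dirs) {
        List<String> faulty = new ArrayList<>();
        for (Map.Entry<String, Replica> e : dirs.entrySet()) {
            Replica r = e.getValue();
            if (crc(r.data) == r.expectedCrc) {
                return new Result(e.getKey(), faulty);
            }
            faulty.add(e.getKey()); // mark the bad copy and fall through to the next one
        }
        throw new IllegalStateException("no usable replica among " + dirs.keySet());
    }
}
```

In this sketch, with d1 listed first and intact, the load returns from d1 and the same-size
corruption in d2 is never examined. With the order reversed, d2 is marked faulty and d1 is
used instead of failing outright.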

> NameNode fail stop in spite of multiple metadata directories
> ------------------------------------------------------------
>
>                 Key: HDFS-1222
>                 URL: https://issues.apache.org/jira/browse/HDFS-1222
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1
>            Reporter: Thanh Do
>
> Despite the ability to configure multiple name directories
> (to store fsimage) and edits directories, the NameNode will fail-stop
> most of the time when it hits an exception while accessing these directories.
>  
> The NameNode fail-stops if an exception happens when loading fsimage,
> reading fstime, loading the edits log, writing fsimage.ckpt ..., even though there
> are still good replicas. The NameNode could instead have tried to work with the other replicas
> and marked the faulty one.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and 
> Haryadi Gunawi (haryadi@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

