hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1
Date Thu, 06 Sep 2012 23:53:08 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450182#comment-13450182 ]

Colin Patrick McCabe commented on HDFS-3540:
--------------------------------------------

bq. Hi Colin, you keep mentioning HDFS-3004 or the recovery mode feature in trunk. However,
we are talking about branch-1 recovery mode here.

I mentioned HDFS-3004 because the original design doc contains a good explanation
of why recovery mode should not be enabled in normal operation:

{code}
Why can't we simply do recovery as part of normal NameNode operation?  Well,
recovery may involve destructive changes to the NameNode metadata.  Since the
metadata is corrupt, we will have to use guesswork to get back to a valid
state.
{code}

This issue is the same in both branch-1 and later branches: if you have to guess, you shouldn't
make the process automatic.

bq. The branch-1 recovery mode feature is not yet released. If the new feature has problems,
we should remove it. It is not a point if people already know how to use it. If there are
people using development code, they have to get prepared that the un-released new feature
may be changed or removed.

It would be inconvenient for us to remove recovery mode (RM) from branch-1.  I am willing to
consider it, but I don't think the arguments presented here so far have been convincing.

I think the first question we need to answer is: what is the use case for edit log toleration?
What are your guidelines for when edit log toleration should be turned on?  This has never
been clear to me.  It seems to me that if you wanted higher availability, you would be better
off implementing edit log failover in branch-1.

At the very least, it would be nice to have a document explaining who the intended users of
edit log toleration are, why they would use it rather than something else, and what the risks
are.  At that point we could start to consider what the best resolution is, whatever
that may be.
                
> Further improvement on recovery mode and edit log toleration in branch-1
> ------------------------------------------------------------------------
>
>                 Key: HDFS-3540
>                 URL: https://issues.apache.org/jira/browse/HDFS-3540
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 1.2.0
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1.  However, the recovery mode
> feature in branch-1 is dramatically different from the recovery mode in trunk since the edit
> log implementations in these two branches are different.  For example, there is UNCHECKED_REGION_LENGTH
> in branch-1 but not in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy UNCHECKED_REGION_LENGTH
> and to tolerate edit log corruption.
> There are overlaps between these two features.  We study potential further improvement
> in this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
