hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
Date Thu, 30 Jan 2014 00:10:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886053#comment-13886053 ]

Aaron T. Myers commented on HDFS-5399:
--------------------------------------

I see, so it sounds like the bug is that the NN is not leaving safemode (after startup?) automatically
while it's in the standby state even though it's received sufficient block reports to cause
it to leave safemode. It will then automatically enter the extension period and subsequently
leave safemode only on transition to the active state. Is that correct?

bq. Is it possible that the SBN keeps tailing the editlog while holding the FSN lock, thus the
SafeModeMonitor thread could not get the lock to leave the safemode?

I don't think this is possible. The EditLogTailer only takes the FSN lock when it wakes up
periodically to tail edits.

> Revisit SafeModeException and corresponding retry policies
> ----------------------------------------------------------
>
>                 Key: HDFS-5399
>                 URL: https://issues.apache.org/jira/browse/HDFS-5399
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>
> Currently for NN SafeMode, we have the following corresponding retry policies:
> # In non-HA setup, for certain API call ("create"), the client will retry if the NN is
> in SafeMode. Specifically, the client side's RPC adopts MultipleLinearRandomRetry policy for
> a wrapped SafeModeException when retry is enabled.
> # In HA setup, the client will retry if the NN is Active and in SafeMode. Specifically,
> the SafeModeException is wrapped as a RetriableException in the server side. Client side's
> RPC uses FailoverOnNetworkExceptionRetry policy which recognizes RetriableException (see HDFS-5291).
> There are several possible issues in the current implementation:
> # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator through
> CLI), and the clients may not want to retry on this type of SafeMode.
> # Client may want to retry on other API calls in non-HA setup.
> # We should have a single generic strategy to address the mapping between SafeMode and
> retry policy for both HA and non-HA setup. A possible straightforward solution is to always
> wrap the SafeModeException in the RetriableException to indicate that the clients should retry.
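
As a rough illustration of the wrapping idea in the description above, here is a minimal Java sketch. The classes are local stand-ins defined only for this example and are not the Hadoop classes of the same names; the "manual" flag and the method names toClientException and shouldRetry are hypothetical. The sketch also assumes that a manual safemode should not be wrapped, which is one possible reading of the first listed issue rather than something the description settles.

```java
import java.io.IOException;

// Local stand-ins for illustration only. These are NOT the Hadoop classes
// of the same names, and the "manual" flag is a hypothetical addition.
final class SafeModeRetrySketch {

    static class SafeModeException extends IOException {
        // Hypothetical: true if an administrator entered safemode via the CLI.
        final boolean manual;

        SafeModeException(String msg, boolean manual) {
            super(msg);
            this.manual = manual;
        }
    }

    static class RetriableException extends IOException {
        RetriableException(Throwable cause) {
            super(cause.getMessage(), cause);
        }
    }

    // Server side (assumed behavior): wrap automatic safemode so that the
    // client retries, and let manual safemode propagate unwrapped.
    static IOException toClientException(SafeModeException e) {
        return e.manual ? e : new RetriableException(e);
    }

    // Client side: a single rule that does not depend on HA vs. non-HA.
    static boolean shouldRetry(IOException e) {
        return e instanceof RetriableException;
    }
}
```

With this arrangement the client-side decision reduces to one check on the exception type, and any policy about which safemode states are retriable stays on the server side.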



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
