hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6089) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
Date Tue, 18 Mar 2014 21:46:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939848#comment-13939848 ]

Jing Zhao commented on HDFS-6089:
---------------------------------

Thanks for the response, Andrew. 

bq. If we add a time threshold (like the tailer), we want to avoid the reverse problem: a
lot of small segments accumulating in the absence of a standby.
Could you please explain how we avoid this issue with the current strategy?
For the autoroller in the ANN, I guess it should still decide whether to roll based on the
number of edits. However, we should change its sleep interval from 5min to a smaller value
(e.g., 2min), so that it checks the number of edits every 2min and rolls the edit log if necessary.
Can this address your concern? Or am I missing something here?
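
To make the proposal concrete, here is a rough sketch of what I mean. It is only an illustration, not the actual NameNode code: the class name, method names, and the threshold value are all made up for this example. The point is that the roll decision depends only on the number of edits, while the shorter interval only changes how often that number is checked.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of an edit-count-based autoroller.
 * None of these names come from the HDFS code base.
 */
class AutoRollerSketch {
    private final long editThreshold;    // roll once more than this many edits have accumulated
    private final long checkIntervalMs;  // how long the checker sleeps between checks

    AutoRollerSketch(long editThreshold, long checkIntervalMs) {
        this.editThreshold = editThreshold;
        this.checkIntervalMs = checkIntervalMs;
    }

    /** The decision looks only at the edit count, never at elapsed time. */
    boolean shouldRoll(long editsInCurrentSegment) {
        return editsInCurrentSegment > editThreshold;
    }

    /**
     * Simulates a sequence of wake-ups. editsPerInterval[i] is the number of
     * edits that arrive before the i-th check. Returns the simulated times
     * (in ms) at which a roll would be triggered.
     */
    List<Long> simulate(long[] editsPerInterval) {
        List<Long> rollTimes = new ArrayList<>();
        long pending = 0;
        for (int i = 0; i < editsPerInterval.length; i++) {
            pending += editsPerInterval[i];
            long now = (i + 1) * checkIntervalMs;
            if (shouldRoll(pending)) {
                rollTimes.add(now);
                pending = 0; // a new segment starts after the roll
            }
        }
        return rollTimes;
    }
}
```

With this shape, an idle ANN never rolls no matter how short the interval is, so a 2min interval by itself should not produce a lot of small segments.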

> Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6089
>                 URL: https://issues.apache.org/jira/browse/HDFS-6089
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Jing Zhao
>         Attachments: HDFS-6089.000.patch, HDFS-6089.001.patch
>
>
> The following scenario was tested:
> * Determine Active NN and suspend the process (kill -19)
> * Wait about 60s to let the standby transition to active
> * Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active.
> What was noticed is that sometimes the call to get the service state of nn2 got a socket
> timeout exception.



