hadoop-hdfs-issues mailing list archives

From "Jesse Yates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9787) SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
Date Wed, 10 Feb 2016 01:32:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140175#comment-15140175 ]

Jesse Yates commented on HDFS-9787:
-----------------------------------

Taking a quick look, this would imply that the non-primary SNN never sends a checkpoint after
the first time? A good test to confirm this is to start the NNs, wait until the primary SNN
is selected, and then remove it from the cluster. Are any more checkpoints sent to the ANN?

My inclination is that you are correct and no more checkpoints are sent (unless building the
checkpoint takes a long time), but I'd like to hear whether that's actually the case. I think
the fix is simply to set lastCheckpointTime in doCheckpoint() rather than after each loop iteration.
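A rough way to reason about the question above without a cluster is to simulate the loop. The Java sketch below is a simplified, hypothetical model, not Hadoop code: the class, method, and parameter names are invented, the primary flag is assumed to stay false, and the timings (1 h checkpoint period, 60 s check interval, 30 s to build an image) are illustrative. It compares the logic quoted in the report with one possible variant in which the quiet-period check measures time since the last upload rather than since the last checkpoint; that variant is an assumption about what a fix might look like, not the attached patch.

```java
// Hypothetical, simplified model of the StandbyCheckpointer loop for a
// non-primary SNN (isPrimaryCheckPointer stays false). Not Hadoop source code;
// all names and timings here are illustrative assumptions.
class QuietPeriodSim {

    /**
     * Counts image uploads a non-primary SNN makes during runSecs.
     * trackUploadSeparately == false: the quiet-period check reuses the timer that
     *   is reset after every checkpoint (the logic quoted in HDFS-9787).
     * trackUploadSeparately == true: hypothetical variant where the quiet-period
     *   check measures time since the last upload instead.
     */
    static int countUploads(long periodSecs, long checkPeriodSecs, double quietMultiplier,
                            long buildSecs, long runSecs, boolean trackUploadSeparately) {
        long quietPeriod = (long) (periodSecs * quietMultiplier);
        long now = 0;
        long lastCheckpointTime = 0; // decides whether a checkpoint is due
        long lastUploadTime = 0;     // only consulted by the hypothetical variant
        int uploads = 0;
        while (now < runSecs) {
            now += checkPeriodSecs;  // stands in for sleeping one check interval
            long start = now;
            long secsSinceLast = start - lastCheckpointTime;
            if (secsSinceLast < periodSecs) {
                continue;            // no time-triggered checkpoint due yet
            }
            long quietClock = trackUploadSeparately ? start - lastUploadTime : secsSinceLast;
            boolean sendRequest = quietClock >= quietPeriod; // primary flag is false here
            now += buildSecs;        // building the checkpoint takes time
            if (sendRequest) {
                uploads++;
                lastUploadTime = start;
            }
            lastCheckpointTime = start; // reset after every checkpoint, uploaded or not
        }
        return uploads;
    }

    public static void main(String[] args) {
        long day = 24 * 3600;
        int quoted = countUploads(3600, 60, 1.5, 30, day, false);
        int variant = countUploads(3600, 60, 1.5, 30, day, true);
        System.out.println("uploads with quoted logic:      " + quoted);
        System.out.println("uploads with upload-time timer: " + variant);
    }
}
```

With the quoted logic the simulated non-primary SNN never uploads, because each time-triggered checkpoint starts with secsSinceLast only slightly above one period. In the variant, the time since the last upload keeps growing across checkpoints that were built but not sent, so it eventually exceeds the quiet period.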

> SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-9787
>                 URL: https://issues.apache.org/jira/browse/HDFS-9787
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 3.0.0
>            Reporter: Guocui Mi
>            Assignee: Guocui Mi
>         Attachments: HDFS-9786-v000.patch
>
>
> SNNs stop uploading the FSImage to the ANN once isPrimaryCheckPointer becomes false.
> Here is the logic that decides whether to upload the FSImage.
> In StandbyCheckpointer.java:
>     boolean sendRequest = isPrimaryCheckPointer || secsSinceLast >= checkpointConf.getQuietPeriod();
>     doCheckpoint(sendRequest);
> sendRequest is always false when isPrimaryCheckPointer is false, because
> secsSinceLast (~checkpointPeriod) >= checkpointConf.getQuietPeriod()
> (checkpointPeriod * this.quietMultiplier, default value 1.5) always evaluates to false.
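The condition described in the report can be isolated to make the arithmetic concrete. This is a minimal sketch, assuming a 3600 s checkpoint period and the default multiplier of 1.5 mentioned above (so the quiet period is 5400 s); the class and method names are hypothetical and only mirror the quoted expression. With the primary flag false, values of secsSinceLast close to one period never pass the check.

```java
// Hypothetical helper mirroring the quoted expression from StandbyCheckpointer.java:
//   sendRequest = isPrimaryCheckPointer || secsSinceLast >= quietPeriod
// where quietPeriod = checkpointPeriod * quietMultiplier (default 1.5 per the report).
class SendRequestCheck {

    static boolean sendRequest(boolean isPrimaryCheckPointer, long secsSinceLast,
                               long checkpointPeriodSecs, double quietMultiplier) {
        double quietPeriod = checkpointPeriodSecs * quietMultiplier;
        return isPrimaryCheckPointer || secsSinceLast >= quietPeriod;
    }

    public static void main(String[] args) {
        long period = 3600; // illustrative checkpoint period (1 h)
        // A time-triggered checkpoint fires once secsSinceLast reaches about one period,
        // so only values slightly above `period` are realistic here.
        for (long secsSinceLast = period; secsSinceLast <= period + 60; secsSinceLast += 30) {
            System.out.println(secsSinceLast + " s -> sendRequest = "
                    + sendRequest(false, secsSinceLast, period, 1.5));
        }
    }
}
```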



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
