hadoop-hdfs-issues mailing list archives

From "Guocui Mi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9787) SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
Date Wed, 10 Feb 2016 05:42:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140367#comment-15140367 ]

Guocui Mi commented on HDFS-9787:

>>> this would imply that the non-primary SNN never sends a checkpoint after the
first time?
Yes, that matches what I have observed.
I am trying to add a unit test to cover this scenario. Another two scenarios triggered in our
1) The primary checkpointer fails to upload the fsimage because the ANN is temporarily unavailable.
2) All NNs are restarted at the same time.

I'm afraid the proposal you shared won't work.
1) Setting lastCheckpointTime before the following code in doCheckpoint(): this is no different
from setting it after each loop iteration.
2) Setting it after the following code in doCheckpoint(): non-primary SNNs will checkpoint one
after another continuously, since lastCheckpointTime never gets updated.
if (!sendCheckpoint) { return; }

> SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
> -------------------------------------------------------------------------------
>                 Key: HDFS-9787
>                 URL: https://issues.apache.org/jira/browse/HDFS-9787
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 3.0.0
>            Reporter: Guocui Mi
>            Assignee: Guocui Mi
>         Attachments: HDFS-9786-v000.patch
> SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer becomes false.
> Here is the logic that decides whether to upload the FSImage.
> In StandbyCheckpointer.java:
> boolean sendRequest = isPrimaryCheckPointer || secsSinceLast >= checkpointConf.getQuietPeriod();
>             doCheckpoint(sendRequest);
> sendRequest is always false when isPrimaryCheckPointer is false, because secsSinceLast
> (~checkpointPeriod) >= checkpointConf.getQuietPeriod() (checkpointPeriod * this.quietMultiplier,
> default value 1.5) always evaluates to false.
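
The arithmetic behind the description above can be checked with a small standalone sketch. This is not the StandbyCheckpointer code: the class name, the helper methods, and the 3600-second checkpoint period are assumptions made for illustration. Only the sendRequest expression and the 1.5 multiplier come from the description.

```java
// Hypothetical standalone model of the sendRequest condition; not Hadoop code.
class QuietPeriodSketch {

    // Assumed stand-in for checkpointConf.getQuietPeriod():
    // checkpointPeriod * quietMultiplier, as stated in the description.
    static long quietPeriod(long checkpointPeriodSecs, double quietMultiplier) {
        return (long) (checkpointPeriodSecs * quietMultiplier);
    }

    // Same shape as the expression quoted from StandbyCheckpointer.java.
    static boolean sendRequest(boolean isPrimaryCheckPointer, long secsSinceLast,
                               long checkpointPeriodSecs, double quietMultiplier) {
        return isPrimaryCheckPointer
                || secsSinceLast >= quietPeriod(checkpointPeriodSecs, quietMultiplier);
    }

    public static void main(String[] args) {
        long period = 3600;      // assumed checkpoint period, in seconds
        double multiplier = 1.5; // default multiplier per the description

        // A non-primary SNN checkpoints roughly once per period, so
        // secsSinceLast is about equal to the period when the check runs.
        System.out.println("non-primary, secsSinceLast ~ period: "
                + sendRequest(false, period, period, multiplier));

        // The primary checkpointer sends regardless of the quiet period.
        System.out.println("primary: "
                + sendRequest(true, period, period, multiplier));

        // Only a gap of at least the quiet period would let a non-primary send.
        System.out.println("non-primary, secsSinceLast = quiet period: "
                + sendRequest(false, quietPeriod(period, multiplier), period, multiplier));
    }
}
```

Under these assumed values the quiet period is 5400 seconds, so a non-primary SNN that checkpoints about every 3600 seconds never satisfies the second half of the condition.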

This message was sent by Atlassian JIRA
