hadoop-hdfs-issues mailing list archives

From "Guocui Mi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9787) SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
Date Wed, 10 Feb 2016 05:42:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140367#comment-15140367 ]

Guocui Mi commented on HDFS-9787:
---------------------------------

>>> this would imply that the non-primary SNN never sends a checkpoint after the
first time?
Yes, that matches what I have observed.
I am adding a unit test to cover this scenario. Two other scenarios have triggered the
problem in our cluster:
1) The primary checkpointer fails to upload the fsimage because the ANN is temporarily unavailable.
2) All NNs are restarted at the same time.

I'm afraid the proposal you shared won't work.
1) Setting lastCheckpointTime before the following code in doCheckpoint(): this is no different
from setting it after each loop iteration.
2) Setting it after the following code in doCheckpoint(): the non-primary SNNs will checkpoint
one after another continuously, because lastCheckpointTime never gets updated.
if (!sendCheckpoint) { return; }

> SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-9787
>                 URL: https://issues.apache.org/jira/browse/HDFS-9787
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 3.0.0
>            Reporter: Guocui Mi
>            Assignee: Guocui Mi
>         Attachments: HDFS-9786-v000.patch
>
>
> SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer becomes false.
> Here is the logic that decides whether to upload the FSImage,
> in StandbyCheckpointer.java:
> boolean sendRequest = isPrimaryCheckPointer || secsSinceLast >= checkpointConf.getQuietPeriod();
>             doCheckpoint(sendRequest);
> sendRequest is always false when isPrimaryCheckPointer is false, because
> secsSinceLast (~checkpointPeriod) >= checkpointConf.getQuietPeriod()
> (checkpointPeriod * this.quietMultiplier, default value 1.5) always evaluates to false.
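
The arithmetic behind the quoted condition can be sketched as follows. This is a minimal illustration, not the actual StandbyCheckpointer code: the class name, the helper methods, and the 3600-second checkpoint period are assumptions made for the example. It also assumes, as the description does, that a checkpoint is attempted when secsSinceLast is roughly equal to checkpointPeriod.

```java
// Illustrative sketch only. SendRequestSketch, sendRequest() and quietPeriod()
// are hypothetical names; only the boolean condition comes from the report.
class SendRequestSketch {

    // Mirrors the quoted condition:
    // isPrimaryCheckPointer || secsSinceLast >= quietPeriod
    static boolean sendRequest(boolean isPrimaryCheckPointer,
                               long secsSinceLast,
                               long quietPeriodSecs) {
        return isPrimaryCheckPointer || secsSinceLast >= quietPeriodSecs;
    }

    // Quiet period as described in the report: checkpointPeriod * quietMultiplier.
    static long quietPeriod(long checkpointPeriodSecs, double quietMultiplier) {
        return (long) (checkpointPeriodSecs * quietMultiplier);
    }

    public static void main(String[] args) {
        long checkpointPeriod = 3600;                    // assumed value, in seconds
        long quiet = quietPeriod(checkpointPeriod, 1.5); // 1.5 is the default cited above

        // Primary checkpointer: the first operand short-circuits to true.
        System.out.println("primary:     " + sendRequest(true, checkpointPeriod, quiet));

        // Non-primary: secsSinceLast (~checkpointPeriod) is below the quiet
        // period, so the condition is false on every attempt.
        System.out.println("non-primary: " + sendRequest(false, checkpointPeriod, quiet));
    }
}
```

Under these assumptions the non-primary case stays false for any multiplier greater than 1, since secsSinceLast is compared against a threshold it has not yet reached when the checkpoint is attempted.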



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
