hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9787) SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
Date Thu, 11 Feb 2016 04:39:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142240#comment-15142240 ]

Vinayakumar B commented on HDFS-9787:

{quote}The original solution was an attempt to catch the case where we don't flood the
NN with checkpoint requests. Instead, maybe the better solution would be to do a small RPC
to see when the latest image was uploaded. If it was uploaded the quietMultiplier beyond the
checkpoint period, then we attempt to upload the checkpoint.
It's a bit more work, but I think this more clearly lays out the intentions in the code, rather
than obtaining the same effect, but without the overhead of actually sending the checkpoint
along each time we want to find out if it's behind.{quote}
Yes, that's required to optimize the current approach, but I feel it could be done in a follow-up.

First, let's fix the current bug. Agree?

I see that the patch fixes the issue mentioned in this Jira.
+1 for the patch.

> SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
> -------------------------------------------------------------------------------
>                 Key: HDFS-9787
>                 URL: https://issues.apache.org/jira/browse/HDFS-9787
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 3.0.0
>            Reporter: Guocui Mi
>            Assignee: Guocui Mi
>         Attachments: HDFS-9786-v000.patch
> SNNs stop uploading the FSImage to the ANN once isPrimaryCheckPointer becomes false.
> Here is the logic that decides whether to upload the FSImage.
> In StandbyCheckpointer.java:
> boolean sendRequest = isPrimaryCheckPointer || secsSinceLast >= checkpointConf.getQuietPeriod();
> doCheckpoint(sendRequest);
> sendRequest is always false when isPrimaryCheckPointer is false, because secsSinceLast
> (~checkpointPeriod) >= checkpointConf.getQuietPeriod() (checkpointPeriod * this.quietMultiplier,
> default value 1.5) always evaluates to false.
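
The arithmetic behind the report can be checked with a minimal standalone sketch. It is not the actual StandbyCheckpointer code: the class QuietPeriodSketch, the helper shouldSendRequest, and the 3600-second checkpoint period are hypothetical names and values chosen for illustration. Only the boolean condition and the 1.5 multiplier come from the report.

```java
public class QuietPeriodSketch {

    // Hypothetical helper that mirrors the condition quoted in the report.
    // Assumption taken from the report: quiet period = checkpointPeriod * quietMultiplier.
    static boolean shouldSendRequest(boolean isPrimaryCheckPointer,
                                     long secsSinceLast,
                                     long checkpointPeriodSecs,
                                     double quietMultiplier) {
        long quietPeriodSecs = (long) (checkpointPeriodSecs * quietMultiplier);
        return isPrimaryCheckPointer || secsSinceLast >= quietPeriodSecs;
    }

    public static void main(String[] args) {
        long checkpointPeriodSecs = 3600;  // illustrative value, not read from any config
        double quietMultiplier = 1.5;      // default value stated in the report
        // Per the report, secsSinceLast is roughly one checkpoint period.
        long secsSinceLast = checkpointPeriodSecs;

        // Primary checkpointer: the first operand short-circuits the check.
        System.out.println(shouldSendRequest(true, secsSinceLast,
                checkpointPeriodSecs, quietMultiplier));
        // Non-primary checkpointer: 3600 >= 5400 is false, so no upload is requested.
        System.out.println(shouldSendRequest(false, secsSinceLast,
                checkpointPeriodSecs, quietMultiplier));
    }
}
```

Under these assumptions, a non-primary checkpointer only sends a request if secsSinceLast reaches 1.5 times the checkpoint period. The report says this never happens, because secsSinceLast stays close to one period.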

This message was sent by Atlassian JIRA
