hadoop-hdfs-issues mailing list archives

From "Lin Yiqun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9904) testCheckpointCancellationDuringUpload occasionally fails
Date Tue, 08 Mar 2016 03:37:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184333#comment-15184333 ]

Lin Yiqun commented on HDFS-9904:

Thanks [~kihwal] for the concrete analysis. I had overlooked that.
Also, it should be set before the namenode is started and should be reset for other test cases.
The method {{testCheckpointCancellationDuringUpload}} already restarts all namenodes
afterwards, so I think resetting the configuration here is ok.
    // don't compress, we want a big image
    for (int i = 0; i < NUM_NNS; i++) {
          DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, false);

    // Throttle SBN upload to make it hang during upload to ANN
    for (int i = 1; i < NUM_NNS; i++) {
          DFSConfigKeys.DFS_IMAGE_TRANSFER_RATE_KEY, 100);
    for (int i = 0; i < NUM_NNS; i++) {
There seems to be a similar problem in {{testNonPrimarySBNUploadFSImage}}. If the first
namenode changes to standby, it will also do a checkpoint, because 10 edits is more than the
configured value of 5. But the checkpoint should actually be uploaded by one of the standby nodes.
doEdits(0, 10);
Is my understanding right? If so, we can solve both problems in this jira. Finally, I will update
the patch to address your comments.
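
To make the concern about {{testNonPrimarySBNUploadFSImage}} concrete, here is a minimal sketch of the kind of trigger condition I have in mind. It is a simplified model, not the actual HDFS checkpointer code: the class name, method name and the period value are hypothetical, and only the threshold 5 and the edit count 10 come from the test discussed above.

```java
/** Simplified model of a checkpoint trigger; NOT the real HDFS implementation. */
class CheckpointTriggerSketch {

  /**
   * Returns true when a checkpoint is due: either enough transactions have
   * accumulated since the last checkpoint, or enough time has passed.
   * (Assumed rule for illustration only.)
   */
  static boolean needsCheckpoint(long uncheckpointedTxns, long txnThreshold,
      long secsSinceLast, long periodSecs) {
    return uncheckpointedTxns >= txnThreshold || secsSinceLast >= periodSecs;
  }

  public static void main(String[] args) {
    long txnThreshold = 5;   // the "set value" from the test
    long periodSecs = 3600;  // hypothetical period, large enough not to matter

    // 10 edits pending on a node that has just become standby
    System.out.println(needsCheckpoint(10, txnThreshold, 0, periodSecs));
    // fewer pending edits than the threshold
    System.out.println(needsCheckpoint(3, txnThreshold, 0, periodSecs));
  }
}
```

Under this assumed rule, any node in standby state with 10 pending edits would qualify for a checkpoint, including a former active that has just transitioned, which is the situation described above.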

> testCheckpointCancellationDuringUpload occasionally fails 
> ----------------------------------------------------------
>                 Key: HDFS-9904
>                 URL: https://issues.apache.org/jira/browse/HDFS-9904
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.7.3
>            Reporter: Kihwal Lee
>         Attachments: HDFS-9904.001.patch
> The failure was at the end of the test case where the txid of the standby (former active)
is checked. Since the checkpoint/uploading was canceled, it is not supposed to have the new
checkpoint. Looking at the test log, that was still the case, but the standby then did a checkpoint
on its own and bumped up the txid, right before the check was performed.

This message was sent by Atlassian JIRA
