hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11965) [SPS] Fix TestPersistentStoragePolicySatisfier#testWithCheckpoint failure
Date Fri, 30 Jun 2017 11:43:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069958#comment-16069958 ]

Rakesh R commented on HDFS-11965:

Thank you [~surendrasingh] for the patch; the test cases look good. Adding a few comments, please
take care of them.

# I can see that all the functions operating on this map are synchronized, so we don't need
{{AtomicInteger}}; a plain {{Integer}} is enough.
{{private final Map<Long, Integer> lowRedundantFileRetryCount}}
# Instead of the {{under-replicated blocks}} terminology, could you please use {{low redundant}}?
{{isUnderReplicated}} can be changed to {{hasLowRedundancyBlocks()}}. Similarly, please change
the other occurrences as well.
# {{int MAX_RETRY_FOR_UNER_REPLICATED_FILE = 10;}}: not sure whether this is too small. How
about increasing the retries to 50 to give more chances?
# Logging has to be changed to reflect the retry case. Presently, it will say SUCCESS, which
will mislead, right?
} else {
  // Check if file is under-replicated or some blocks are not
  // satisfy the policy. If file is under-replicate, SPS will
  // retry for some interval and wait for DN to report the block.
# It would be great if you could add the following unit test cases:
	(a) EC unit tests in {{TestStoragePolicySatisfierWithStripedFile}} to cover the low redundant
striped block logic.
	(b) A file whose blocks have extra redundant replicas. Here, the verification point is that SPS
should consider only the needed replica count for satisfying the storage policy. For example,
replication is 3, but the file has extra redundant replicas (2 additional replicas, 3 + 2 = 5 in
total). After satisfying 3 replicas, SPS can mark the file as SUCCESS and remove the xattr.
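
To illustrate comment 1 together with the retry limit from comment 3, here is a minimal sketch of a retry counter that keeps plain {{Integer}} values behind synchronized methods. The class {{LowRedundancyRetryTracker}} and its method names are hypothetical and are not taken from the patch; only the map declaration comes from the comment above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the class from the patch.
class LowRedundancyRetryTracker {
  private final int maxRetries;

  // Plain Integer values suffice because every accessor below is synchronized.
  private final Map<Long, Integer> lowRedundantFileRetryCount = new HashMap<>();

  LowRedundancyRetryTracker(int maxRetries) {
    this.maxRetries = maxRetries;
  }

  /**
   * Records one more attempt for the file. Returns true while the file is
   * still within its retry budget; once the budget is exceeded the entry is
   * dropped and false is returned.
   */
  synchronized boolean recordRetry(long trackId) {
    int attempts = lowRedundantFileRetryCount.merge(trackId, 1, Integer::sum);
    if (attempts > maxRetries) {
      lowRedundantFileRetryCount.remove(trackId);
      return false;
    }
    return true;
  }

  /** Forgets the file, for example after its policy has been satisfied. */
  synchronized void clear(long trackId) {
    lowRedundantFileRetryCount.remove(trackId);
  }

  /** Number of files that currently have a retry count recorded. */
  synchronized int pendingFiles() {
    return lowRedundantFileRetryCount.size();
  }

  public static void main(String[] args) {
    LowRedundancyRetryTracker tracker = new LowRedundancyRetryTracker(3);
    long trackId = 1001L; // illustrative inode id
    for (int i = 1; i <= 4; i++) {
      System.out.println("attempt " + i + " may retry: "
          + tracker.recordRetry(trackId));
    }
  }
}
```

With a budget of 3, the first three calls allow a retry and the fourth does not. The logging could follow the same return value, so that a file still being retried is not reported as SUCCESS.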

> [SPS] Fix TestPersistentStoragePolicySatisfier#testWithCheckpoint failure
> -------------------------------------------------------------------------
>                 Key: HDFS-11965
>                 URL: https://issues.apache.org/jira/browse/HDFS-11965
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-10285
>            Reporter: Surendra Singh Lilhore
>            Assignee: Surendra Singh Lilhore
>         Attachments: HDFS-11965-HDFS-10285.001.patch, HDFS-11965-HDFS-10285.002.patch
> The test case fails because not all the required replicas are moved to the expected
storage. This happens because of a delay in datanode registration after the cluster restart.
> Scenario :
> 1. Start cluster with 3 DataNodes.
> 2. Create a file and set its storage policy to WARM.
> 3. Restart the cluster.
> 4. The NameNode and two DataNodes start first, and these two DataNodes register with the
NameNode (one DataNode is not yet registered).
> 5. SPS schedules block movement based on the available DataNodes (it moves one replica
to ARCHIVE based on the policy).
> 6. The block movement succeeds and the xattr is removed from the file because
{{itemInfo.isAllBlockLocsAttemptedToSatisfy()}} is true.
> {code}
> if (itemInfo != null
>                 && !itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
>               blockStorageMovementNeeded
>                   .add(storageMovementAttemptedResult.getTrackId());
>             ....................
>             ......................
>             } else {
>             ....................
>             ......................
>               this.sps.postBlkStorageMovementCleanup(
>                   storageMovementAttemptedResult.getTrackId());
>             }
> {code}
> 7. The third DN then registers with the NameNode and reports one more DISK replica. Now the
NameNode has two DISK replicas and one ARCHIVE replica.
> In the test case we have a condition that checks the number of DISK replicas:
> {code} DFSTestUtil.waitExpectedStorageType(testFileName, StorageType.DISK, 1, timeout,
> This condition never becomes true, so the test case times out.
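
The replica accounting in the scenario above can be sketched as follows. This is a simplified model and not SPS code: the class {{WarmPolicyCheckSketch}}, its enums and the {{evaluate}} method are hypothetical names introduced for illustration. It assumes WARM keeps one replica on DISK and the remaining replicas on ARCHIVE, and it treats a file with fewer reported replicas than its replication factor as a retry case instead of SUCCESS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the scenario; none of these names come from the patch.
class WarmPolicyCheckSketch {
  enum StorageType { DISK, ARCHIVE }  // simplified stand-in for the HDFS enum
  enum Result { SUCCESS, RETRY_LOW_REDUNDANCY, NEEDS_MOVEMENT }

  /** WARM: one replica on DISK, the remaining replicas on ARCHIVE. */
  static List<StorageType> expectedForWarm(int replication) {
    List<StorageType> expected = new ArrayList<>();
    for (int i = 0; i < replication; i++) {
      expected.add(i == 0 ? StorageType.DISK : StorageType.ARCHIVE);
    }
    return expected;
  }

  /** Decides the outcome from the replicas the NameNode currently knows about. */
  static Result evaluate(List<StorageType> reported, int replication) {
    if (reported.size() < replication) {
      // Some DataNode has not reported yet: retry later, do not mark SUCCESS.
      return Result.RETRY_LOW_REDUNDANCY;
    }
    // Only the needed replica count matters; extra replicas are ignored.
    List<StorageType> remaining = new ArrayList<>(reported);
    for (StorageType needed : expectedForWarm(replication)) {
      if (!remaining.remove(needed)) {
        return Result.NEEDS_MOVEMENT;
      }
    }
    return Result.SUCCESS;
  }

  public static void main(String[] args) {
    // Step 6: only two DataNodes registered, one replica already on ARCHIVE.
    System.out.println(evaluate(
        Arrays.asList(StorageType.DISK, StorageType.ARCHIVE), 3));
    // Step 7: the third DataNode reports one more DISK replica.
    System.out.println(evaluate(
        Arrays.asList(StorageType.DISK, StorageType.ARCHIVE, StorageType.DISK), 3));
    // Extra redundant replicas: the policy is satisfied by the needed three.
    System.out.println(evaluate(
        Arrays.asList(StorageType.DISK, StorageType.ARCHIVE, StorageType.ARCHIVE,
            StorageType.DISK, StorageType.DISK), 3));
  }
}
```

Under this model the state at step 6 is a retry case, the state at step 7 still needs one DISK replica moved to ARCHIVE, and a file with extra redundant replicas is satisfied once the needed three are in place.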

