hadoop-hdfs-issues mailing list archives

From "Elek, Marton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDDS-245) Handle ContainerReports in the SCM
Date Fri, 27 Jul 2018 08:09:00 GMT

    [ https://issues.apache.org/jira/browse/HDDS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559403#comment-16559403 ]

Elek, Marton commented on HDDS-245:

Thank you very much [~xyao] for the review:

Line 93-95: should we consolidate this inside node2ContainerMap.processReport()?
On second thought, I think a better place might be updating the DN2ContainerMap after
we emit the replication request at line 108.

emitReplicationRequestEvent is an async operation. I would like to be sure that during the event
processing the node2ContainerMap is up-to-date (as the ReplicationManager may use information
from the node2ContainerMap). That's the reason why I moved this line before emitting the event.
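The ordering concern can be sketched like this (a simplified, hypothetical stand-in for the real SCM classes; the map and queue names are illustrative, not the actual hdds types):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: the datanode -> container map is updated *before* the async
// replication event is published, so a consumer running asynchronously
// (like the ReplicationManager) always sees the current state.
public class ReportOrderingSketch {

  // hypothetical stand-in for node2ContainerMap
  static final Map<String, Set<Long>> node2Containers = new HashMap<>();
  // hypothetical stand-in for the async event queue
  static final BlockingQueue<Long> replicationEvents = new LinkedBlockingQueue<>();

  static void processReport(String datanode, Set<Long> reportedContainers) {
    // 1. update the map synchronously ...
    node2Containers.put(datanode, new HashSet<>(reportedContainers));
    // 2. ... only then emit the async replication requests
    for (Long containerId : reportedContainers) {
      replicationEvents.offer(containerId);
    }
  }
}
```

If step 2 ran before step 1, an event consumer could read a stale map; doing the map update first makes the event handler's view consistent.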

updateDatanodeMap and processReport are two different methods with different responsibilities,
and they can be tested at this level (processReport has a lot of low-level unit tests).
If you think we need one method, I can create a third one which calls both of them.

Line 134/140: we need to check if currentSet is null before invoking removeAll to avoid an NPE,
or use computeIfPresent?

Correct me if I am wrong, but newContainers and missingContainers can never be null, as they are
initialized here. currentSet cannot be null because the existence of the key is checked at L124
(isKnownDatanode), and containers cannot be null because we have a precondition check at L122.
I would be happy to add more checks, but I don't know which ones are reasonable...
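The null-safety argument can be illustrated with a simplified sketch (hypothetical names, not the real Node2ContainerMap; the precondition and the known-datanode guard together make the removeAll calls safe):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Sketch of the diff logic under discussion: compute which containers
// are new on a datanode and which went missing, without any null risk.
public class DiffSketch {

  static final Map<String, Set<Long>> dn2Containers = new HashMap<>();

  static boolean isKnownDatanode(String dn) {
    return dn2Containers.containsKey(dn);
  }

  /** Returns [newContainers, missingContainers] for a known datanode. */
  static List<Set<Long>> diff(String dn, Set<Long> containers) {
    // precondition: the reported container set is never null
    Objects.requireNonNull(containers, "container set cannot be null");
    // guard: only known datanodes reach the diff, so get() below is non-null
    if (!isKnownDatanode(dn)) {
      throw new IllegalStateException("unknown datanode " + dn);
    }
    Set<Long> currentSet = dn2Containers.get(dn);

    Set<Long> newContainers = new HashSet<>(containers);
    newContainers.removeAll(currentSet);      // safe: both sets non-null

    Set<Long> missingContainers = new HashSet<>(currentSet);
    missingContainers.removeAll(containers);  // safe: both sets non-null
    return List.of(newContainers, missingContainers);
  }
}
```

An alternative would be `dn2Containers.computeIfPresent(...)`, which skips the explicit key check, but the explicit guard keeps the "unknown datanode" error path visible.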

Node2ContainerMapTest.java should be renamed to TestNode2ContainerMap.java

It should be included in the current patch. (I renamed the tests and realized that some of
them were failing; those are fixed in the patch.) But I had patches in the wrong format earlier,
so let me know if it can't be applied with the rename.

diff --git a/hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/Node2ContainerMapTest.java
similarity index 91%
rename from hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/Node2ContainerMapTest.java
rename to hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/TestNode2ContainerMap.java
index 79f1b40db03..4796f4db85a 100644
--- a/hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/Node2ContainerMapTest.java
+++ b/hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/TestNode2ContainerMap.java
@@ -38,7 +38,7 @@

Line 127:130: NIT: unused can be removed.

Sure, removed.

Line 234: this can be removed, the handler has already been added on line 228.

Good catch, removed.

Line 61: I did not follow this. Can you elaborate on this or add some TODO to fix it later?

The idea was to enable the replication with an async event. But you are right, it's strange
that we can turn it on by event but can't turn it off. I modified the EventHandler<Void>
to EventHandler<Boolean>, so now it can be turned on and off with an event whose payload is
just a boolean.
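A minimal sketch of the Void-to-Boolean payload change (the interface here is a hypothetical simplification, not the actual hdds EventHandler):

```java
// Sketch: with a Boolean payload, the same event type can both start
// and stop replication, instead of a Void event that can only start it.
public class ReplicationToggleSketch {

  interface EventHandler<P> {
    void onMessage(P payload);
  }

  static class ReplicationActivator implements EventHandler<Boolean> {
    private volatile boolean running;

    @Override
    public void onMessage(Boolean start) {
      // the boolean payload toggles replication on or off
      running = start;
    }

    boolean isRunning() {
      return running;
    }
  }
}
```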

[~ljain]: thank you for your review as well.

ReportResult:58,59 - we can keep the missingContainers and newContainers as null.

Sometimes (e.g. Node2ContainerMap L152) only the newContainers are added. I would like to be
sure that the sets are not null. Do you think it's a problem because of GC pressure? I think
(hope?) it's negligible, as we create a lot of hash sets in Node2ContainerMap anyway and very
short-lived objects are handled very well (AFAIK).
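The non-null-sets choice can be sketched like this (simplified, hypothetical field names rather than the real ReportResult): callers can then add to either set without null checks, at the cost of a couple of short-lived empty HashSets.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: never store null sets, so callers such as the map update
// code can mutate newContainers/missingContainers unconditionally.
public class ReportResultSketch {
  final Set<Long> newContainers;
  final Set<Long> missingContainers;

  ReportResultSketch(Set<Long> newContainers, Set<Long> missingContainers) {
    // fall back to an empty set instead of keeping null
    this.newContainers =
        newContainers == null ? new HashSet<>() : newContainers;
    this.missingContainers =
        missingContainers == null ? new HashSet<>() : missingContainers;
  }
}
```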

2. ContainerMapping#getContainerWithPipeline needs to be updated for closed container case.
For closed containers we need to fetch the datanodes from ContainerStateMap and return the
appropriate pipeline information.

Correct me if I am wrong; let me try to rephrase: you say that SCMClientProtocolServer.getContainerWithPipeline
won't work until we return the pipeline information based on ContainerStateMap.contReplicaMap.
And (I guess) without that we can't read closed containers.
[~anu]: Do we have a separate issue for that?

3. START_REPLICATION is currently not fired by any publisher. I guess it will be part of another

Yes, the event should be sent when we are leaving the chill mode. [~anu]: Do I need to create
a separate jira, or will it be handled by HDDS-2?

4. We are currently processing the report as soon as it is received. Are we handling the case
when a container is added on one DN and has been removed from another DN? In such a case we
might be sending out a false replicate event as the replication count would still match the replication

Could you please give more info about this example? If a container is added _and_ removed (and
both operations are finished), the replication factor should be ok. In case of in-progress
replications, the ReplicationManager (which listens to these events) will recalculate the replica
count, where the in-progress copies are also considered. So the calculation here is just a draft
to sort the replication requests by priority; the ReplicationManager will recalculate the
required replicas.
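The recalculation described above can be sketched as a tiny helper (a hypothetical method, not the real ReplicationManager API): the report handler's number is only a priority hint, and the manager re-derives the deficit including in-flight copies before scheduling anything.

```java
// Sketch: how many additional replicas are still needed once the
// in-flight (in-progress) copies are counted alongside healthy ones.
public class ReplicaCountSketch {

  static int additionalReplicasNeeded(
      int replicationFactor, int healthyReplicas, int inFlightCopies) {
    int deficit = replicationFactor - (healthyReplicas + inFlightCopies);
    // never schedule a negative number of copies
    return Math.max(0, deficit);
  }
}
```

For example, with factor 3, one healthy replica, and one copy already in flight, only one more replication needs to be scheduled.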

> Handle ContainerReports in the SCM
> ----------------------------------
>                 Key: HDDS-245
>                 URL: https://issues.apache.org/jira/browse/HDDS-245
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: SCM
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>             Fix For: 0.2.1
>         Attachments: HDDS-245.001.patch, HDDS-245.002.patch, HDDS-245.003.patch
> HDDS-242 provides a new class ContainerReportHandler which could handle the ContainerReports
from the SCMHeartbeatDispatcher.
> HDDS-228 introduces a new map to store the container -> datanode[] mapping
> HDDS-199 implements the ReplicationManager which could send commands to the datanodes
to copy the containers.
> To wire all these components, we need to put implementation to the ContainerReportHandler
(created in HDDS-242).
> The ContainerReportHandler should process the new ContainerReportForDatanode events,
update the containerStateMap and node2ContainerMap, calculate the missing/duplicate containers,
and send the ReplicateCommand to the ReplicationManager.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
