hadoop-hdfs-issues mailing list archives

From "Elek, Marton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDDS-199) Implement ReplicationManager to replicate ClosedContainers
Date Mon, 02 Jul 2018 08:00:07 GMT

    [ https://issues.apache.org/jira/browse/HDDS-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529504#comment-16529504 ]

Elek, Marton commented on HDDS-199:

Thanks [~nandakumar131] for checking it and for the early feedback.

1. The compilation problem is fixed and the patch is rebased.

2. LeaseManager: thanks for pointing me to the interrupt; now I understand it. I had a problem
where the lease manager was not interrupted, but I can't reproduce it any more. Most probably
it was caused by another error. I reverted the LeaseManager to its original state.

3. excludedNodes in ContainerPlacementPolicy: my idea is that when a container is closed
we need to adjust its datanode assignments. Even if all 3 replicas exist, we can ask
the ReplicationManager to choose 3 datanodes for the closed container. This would also help
with the balancing problems, because on close the container can be copied to datanodes
with more free space.

In this particular case, the List<DatanodeDetails> is not an exclusion list but just a report
of the current placement, which may or may not be changed by the replication manager.

But this part is not implemented yet, and I am not sure what the signature of the method
will be. For now, I renamed the parameter as you suggested.
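To make point 3 concrete, here is a minimal Java sketch of a placement call in which the current replica locations are a status report rather than an exclusion list: nodes that already hold a replica stay eligible, and every healthy node competes on free space. `Node`, `SpaceAwarePlacement` and `chooseDatanodes` are hypothetical names invented for illustration (the real method signature was still open at this point), and "most free space first" is an assumed, deliberately simple ranking.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for DatanodeDetails: only the fields this sketch needs.
record Node(String id, long freeBytes) {}

// Hypothetical placement helper; not the Ozone ContainerPlacementPolicy API.
final class SpaceAwarePlacement {
  private SpaceAwarePlacement() {}

  /**
   * Chooses {@code required} target nodes for a closed container.
   *
   * @param currentReplicas nodes holding a replica today; a status report, not an exclusion list
   * @param healthyNodes    every node that may hold a replica
   * @param required        number of replicas wanted, e.g. 3
   * @param containerSize   bytes a node needs free to receive a new copy
   */
  static List<Node> chooseDatanodes(List<Node> currentReplicas, List<Node> healthyNodes,
      int required, long containerSize) {
    List<Node> candidates = new ArrayList<>();
    for (Node n : healthyNodes) {
      // A node that already holds the replica needs no extra space to keep it.
      if (currentReplicas.contains(n) || n.freeBytes() >= containerSize) {
        candidates.add(n);
      }
    }
    // Most free space first; on a tie prefer a node that already holds the
    // replica, because keeping it there costs no copy.
    Comparator<Node> byFreeSpace = Comparator.comparingLong(Node::freeBytes).reversed();
    Comparator<Node> preferCurrent =
        Comparator.comparing((Node n) -> !currentReplicas.contains(n));
    candidates.sort(byFreeSpace.thenComparing(preferCurrent));
    if (candidates.size() < required) {
      throw new IllegalStateException("only " + candidates.size() + " eligible nodes");
    }
    return List.copyOf(candidates.subList(0, required));
  }
}
```

For example, with nodes a (100 free, holds a replica), b (500 free, empty), c (300 free, holds a replica) and d (50 free, holds a replica), asking for 3 replicas of an 80-byte container returns b, c and a: the replica on the fullest node d would be moved to b, which is the balancing effect described above.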

> Implement ReplicationManager to replicate ClosedContainers
> ----------------------------------------------------------
>                 Key: HDDS-199
>                 URL: https://issues.apache.org/jira/browse/HDDS-199
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: SCM
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>             Fix For: 0.2.1
>         Attachments: HDDS-199.001.patch, HDDS-199.002.patch
> HDDS/Ozone supports Open and Closed containers. Under specific conditions (the container is full, a node has failed) the container will be closed and replicated in a different way. The replication of Open containers is handled by Ratis and the PipelineManager.
> The ReplicationManager should handle the replication of the ClosedContainers. The replication
information will be sent as an event (UnderReplicated/OverReplicated). 
> The ReplicationManager will collect all of the events in a priority queue (to replicate first the containers where more replicas are missing), calculate the destination datanode (first with a very simple algorithm, later by calculating the scatter-width) and send the Copy/Delete container command to the datanode (CommandQueue).
> A CopyCommandWatcher/DeleteCommandWatcher is also included to retry the copy/delete in case of failure. This is an in-memory structure (based on HDDS-195) which can requeue the under-replicated/over-replicated events to the priority queue if no confirmation of the copy/delete command arrives.
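The queueing and retry flow described in the issue can be sketched as a small in-memory Java class. This is an illustration under stated assumptions, not the HDDS-199 patch: `ReplicationRequest`, `ReplicationQueueSketch` and the deadline-based requeue are hypothetical stand-ins for the UnderReplicated/OverReplicated events, the priority queue and the CopyCommandWatcher/DeleteCommandWatcher, and the sketch does not use the HDDS-195 watcher or any real SCM class.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical payload of an UnderReplicated/OverReplicated event.
record ReplicationRequest(long containerId, int expectedReplicas, int actualReplicas) {
  // Positive when under-replicated, negative when over-replicated.
  int missing() {
    return expectedReplicas - actualReplicas;
  }
}

// Hypothetical manager: priority queue plus a timeout-based command watcher.
final class ReplicationQueueSketch {
  // A command that was sent but not yet confirmed by the datanode.
  private record Pending(ReplicationRequest request, long deadlineMillis) {}

  // Most missing replicas first; over-replicated requests (negative) sort last.
  private final PriorityQueue<ReplicationRequest> queue =
      new PriorityQueue<>((x, y) -> Integer.compare(y.missing(), x.missing()));
  private final Map<Long, Pending> inFlight = new HashMap<>();

  void onEvent(ReplicationRequest request) {
    queue.add(request);
  }

  // Takes the most urgent request and starts watching it.
  ReplicationRequest sendNext(long nowMillis, long timeoutMillis) {
    ReplicationRequest next = queue.poll();
    if (next != null) {
      // A real manager would pick a target datanode here and put a copy or
      // delete command on the command queue; this sketch only tracks it.
      inFlight.put(next.containerId(), new Pending(next, nowMillis + timeoutMillis));
    }
    return next;
  }

  // The datanode confirmed the copy/delete: stop watching the container.
  void onConfirmation(long containerId) {
    inFlight.remove(containerId);
  }

  // Puts every unconfirmed request whose deadline has passed back on the queue.
  int requeueExpired(long nowMillis) {
    int requeued = 0;
    Iterator<Pending> it = inFlight.values().iterator();
    while (it.hasNext()) {
      Pending p = it.next();
      if (nowMillis >= p.deadlineMillis()) {
        queue.add(p.request());
        it.remove();
        requeued++;
      }
    }
    return requeued;
  }

  int queuedCount() {
    return queue.size();
  }
}
```

A container with no replicas left is sent before one missing a single replica, and a request that was sent but never confirmed returns to the queue once its deadline passes. Whether the real watcher is driven by timeouts in this way is an assumption of the sketch.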

This message was sent by Atlassian JIRA

