Date: Mon, 2 Jul 2018 08:00:07 +0000 (UTC)
From: "Elek, Marton (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDDS-199) Implement ReplicationManager to replicate ClosedContainers

[ https://issues.apache.org/jira/browse/HDDS-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529504#comment-16529504 ]

Elek, Marton commented on HDDS-199:
-----------------------------------

Thanks [~nandakumar131] for checking it and for the early feedback.

1. The compilation problem is fixed and the patch is rebased.

2. LeaseManager: thanks for pointing me to the interrupt; now I understand it. I had a problem when the lease manager was not interrupted, but I can't reproduce it any more. Most probably it was caused by a different error. I reverted the LeaseManager to its original state.

3. excludedNodes in ContainerPlacementPolicy: My idea is that when a container is closed we may need to adjust the datanode assignments. Even if we have all 3 replicas, we can ask the ReplicationManager to choose 3 datanodes for the closed containers. This would solve the balancing problems, since on close the containers may be copied to datanodes with more free space. In this particular case the list is not an exclusion list but a status report of the current situation, which may or may not be changed by the replication manager. This part is not yet implemented, though, and I am not sure what the signature of the method will be. For now I renamed it as you suggested.
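To illustrate point 3, here is a rough sketch of a placement call where the second argument carries the current replica locations as a status report rather than an exclusion list. All names here (Datanode, choosePlacement) are hypothetical stand-ins; the actual ContainerPlacementPolicy signature in the patch is still undecided, as noted above.

```java
import java.util.ArrayList;
import java.util.List;

public class PlacementSketch {

    // Hypothetical stand-in for a datanode; the real code uses DatanodeDetails.
    record Datanode(String id, long freeSpaceBytes) {}

    // The currentReplicas argument is not an exclusion list: it describes the
    // existing assignment, which the policy is free to keep or replace (e.g.
    // to move replicas of a closed container to nodes with more free space).
    static List<Datanode> choosePlacement(List<Datanode> allNodes,
                                          List<Datanode> currentReplicas,
                                          int required) {
        // This sketch ignores currentReplicas; a real policy could bias
        // toward keeping the existing nodes to avoid unnecessary copies.
        List<Datanode> chosen = new ArrayList<>(allNodes);
        // Very simple policy: prefer the nodes with the most free space.
        chosen.sort((a, b) -> Long.compare(b.freeSpaceBytes(), a.freeSpaceBytes()));
        return chosen.subList(0, Math.min(required, chosen.size()));
    }

    public static void main(String[] args) {
        List<Datanode> cluster = List.of(
                new Datanode("dn1", 10), new Datanode("dn2", 50),
                new Datanode("dn3", 30), new Datanode("dn4", 90));
        List<Datanode> current = List.of(cluster.get(0), cluster.get(1), cluster.get(2));
        // Even though 3 replicas already exist, the policy may pick better nodes.
        System.out.println(choosePlacement(cluster, current, 3));
    }
}
```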
> Implement ReplicationManager to replicate ClosedContainers
> ----------------------------------------------------------
>
> Key: HDDS-199
> URL: https://issues.apache.org/jira/browse/HDDS-199
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: SCM
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-199.001.patch, HDDS-199.002.patch
>
>
> HDDS/Ozone supports Open and Closed containers. Under specific conditions (the container is full, or the node has failed) a container is closed and replicated in a different way. The replication of Open containers is handled by Ratis and the PipelineManager.
> The ReplicationManager should handle the replication of the ClosedContainers. The replication information will be sent as an event (UnderReplicated/OverReplicated).
> The ReplicationManager will collect all of the events in a priority queue (to replicate first the containers with the most missing replicas), calculate the destination datanode (first with a very simple algorithm, later by calculating scatter-width), and send the Copy/Delete container command to the datanode (CommandQueue).
> A CopyCommandWatcher/DeleteCommandWatcher is also included to retry the copy/delete in case of failure. This is an in-memory structure (based on HDDS-195) which can requeue the underreplicated/overreplicated events to the priority queue until the confirmation of the copy/delete command arrives.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
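The priority-queue ordering described in the issue can be sketched as follows. The ReplicationEvent class and its fields are hypothetical illustrations, not the actual types from the patch; the real implementation builds on the SCM event queue and the HDDS-195 watcher infrastructure.

```java
import java.util.PriorityQueue;

public class ReplicationQueueSketch {

    // Hypothetical replication event: a container id plus how many replicas
    // are missing (under-replicated containers have missingReplicas > 0).
    record ReplicationEvent(long containerId, int missingReplicas) {}

    public static void main(String[] args) {
        // Order the queue so the containers with the most missing replicas
        // come first, matching "replicate first the containers with the most
        // missing replicas" from the description above.
        PriorityQueue<ReplicationEvent> queue = new PriorityQueue<>(
                (a, b) -> Integer.compare(b.missingReplicas(), a.missingReplicas()));

        queue.add(new ReplicationEvent(1L, 1));
        queue.add(new ReplicationEvent(2L, 2));
        queue.add(new ReplicationEvent(3L, 1));

        // Container 2 is missing the most replicas, so it is handled first.
        ReplicationEvent next = queue.poll();
        System.out.println(next.containerId()); // prints 2
    }
}
```

On failure, a watcher would put the event back into this queue, so the comparator re-ranks it against whatever arrived in the meantime.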