Date: Sun, 4 Jun 2017 09:44:04 +0000 (UTC)
From: "Ted Yu (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-18132) Low replication should be checked in period in case of datanode rolling upgrade

    [ https://issues.apache.org/jira/browse/HBASE-18132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036230#comment-16036230 ]

Ted Yu commented on HBASE-18132:
--------------------------------

Lgtm

> Low replication should be checked in period in case of datanode rolling upgrade
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-18132
>                 URL: https://issues.apache.org/jira/browse/HBASE-18132
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.4.0, 1.1.10
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>         Attachments: HBASE-18132-branch-1.patch, HBASE-18132-branch-1.v2.patch, HBASE-18132-branch-1.v3.patch, HBASE-18132-branch-1.v4.patch, HBASE-18132.patch
>
>
> Currently, low replication of WALs is checked only when there is a sync operation (HBASE-2234), and the log is rolled if the number of replicas of the WAL falls below the configured value. But if the WAL receives very few writes or none at all, low replication is never detected and the log is never rolled.
> This is a problem during a rolling upgrade of datanodes: every datanode holding a replica of a WAL that receives no writes is eventually restarted, which leaves the WAL file in an abnormal state. Any later attempt to open this file will fail.
> I propose a patch that checks for low replication of WALs at a configured interval. During a rolling upgrade of datanodes, as long as the interval between restarting two datanodes is longer than the low-replication check period, the WAL will be closed and rolled normally. A unit test in the patch demonstrates the behavior.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
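The periodic check proposed in the issue can be illustrated with a minimal Java sketch. It is not taken from the attached patches and does not use HBase's actual WAL classes: `PeriodicLowReplicationChecker`, the `IntSupplier` replica probe, the `requestRoll` hook and the check period are all hypothetical stand-ins. What it shows is that the check is driven by a timer instead of by sync calls, so a WAL that receives no writes is still examined.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch of a timer-driven low-replication check for a WAL.
 * Not HBase code: the replica probe and the roll hook stand in for WAL internals.
 */
class PeriodicLowReplicationChecker {
    private final IntSupplier currentReplicas;  // hypothetical: live replicas of the current WAL block
    private final int minReplicas;              // hypothetical: configured minimum replication
    private final Runnable requestRoll;         // hypothetical: asks the WAL to close and roll
    private ScheduledExecutorService scheduler; // created by start(), so an unused checker owns no thread

    PeriodicLowReplicationChecker(IntSupplier currentReplicas, int minReplicas, Runnable requestRoll) {
        if (minReplicas < 1) {
            throw new IllegalArgumentException("minReplicas must be >= 1");
        }
        this.currentReplicas = currentReplicas;
        this.minReplicas = minReplicas;
        this.requestRoll = requestRoll;
    }

    /** A single check; returns true if a roll was requested. No sync is needed to trigger it. */
    boolean checkOnce() {
        if (currentReplicas.getAsInt() < minReplicas) {
            requestRoll.run();
            return true;
        }
        return false;
    }

    /** Runs checkOnce() every periodMillis, whether or not the WAL is being written to. */
    synchronized void start(long periodMillis) {
        if (scheduler != null) {
            throw new IllegalStateException("already started");
        }
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // An exception escaping the task would suppress all later runs.
                System.err.println("low-replication check failed: " + e);
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Cancels the periodic check, if it was started. */
    synchronized void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
            scheduler = null;
        }
    }

    /** The rolling-upgrade condition stated in the issue; ignores how long the roll itself takes. */
    static boolean isSafeRestartInterval(long restartIntervalMillis, long checkPeriodMillis) {
        return restartIntervalMillis > checkPeriodMillis;
    }

    public static void main(String[] args) {
        AtomicInteger replicas = new AtomicInteger(3);
        AtomicInteger rolls = new AtomicInteger();
        PeriodicLowReplicationChecker checker =
                new PeriodicLowReplicationChecker(replicas::get, 3, rolls::incrementAndGet);

        // Idle WAL, fully replicated: nothing to do.
        System.out.println(checker.checkOnce());               // false
        // A datanode in the WAL's pipeline is restarted during the rolling upgrade.
        replicas.set(2);
        System.out.println(checker.checkOnce());               // true
        System.out.println("rolls requested: " + rolls.get()); // rolls requested: 1
    }
}
```

In this sketch `start(periodMillis)` schedules the check and `stop()` cancels it. `isSafeRestartInterval` only encodes the rule from the description (restart interval longer than the check period); a real deployment would likely also need to leave time for the roll itself, which the sketch does not model.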