Date: Fri, 14 Jul 2017 21:36:00 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-18366) Fix flaky test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta

[ https://issues.apache.org/jira/browse/HBASE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088121#comment-16088121 ]

stack commented on HBASE-18366:
-------------------------------

bq. Initially all RS are at the same version i.e. 3.0.0-SNAPSHOT. HMaster.getRegionServerVersion() returns version 0.0.0 for dead RS (carrying meta)....MoveRegionProcedure to move meta region from RS with version 0.0.0 to one of other RS with latest version.

This is good. We have double the procedures working on reassign now.

bq. I found that server can be online and dead at the same time!

Good one [~uagashe] This is a 'hole'.

On the change, it looks good to me. I wonder though how something went into the serverdead list w/o being pulled from the online list. That seems like a backdoor we want to close. I can disable for now but will not resolve this issue.

I like pulling the checkIfShouldMoveSystemRegionAsync logic handling into your new procedure, HBASE-18261.

Would be good to figure why addition to dead list does not get a server purged from the online list? Because it has not been processed yet by crash procedure? How did it get into dead list then?

Thanks [~uagashe]

> Fix flaky test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
> ---------------------------------------------------------------------------------------------------------
>
> Key: HBASE-18366
> URL: https://issues.apache.org/jira/browse/HBASE-18366
> Project: HBase
> Issue Type: Bug
> Reporter: Umesh Agashe
> Assignee: Umesh Agashe
>
> It worked for a few days after enabling it with HBASE-18278. But started failing after commits:
> 6786b2b
> 68436c9
> 75d2eca
> 50bb045
> df93c13
> It works with one commit before: c5abb6c. Need to see what changed with those commits.
> Currently it fails with TableNotFoundException.