Date: Thu, 1 Jun 2017 10:10:04 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-18143) [AMv2] Backoff on failed report of region transition quickly goes to astronomical time scale

    [ https://issues.apache.org/jira/browse/HBASE-18143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032735#comment-16032735 ]

Hadoop QA commented on HBASE-18143:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 38s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 8m 22s | master passed |
| +1 | compile | 1m 44s | master passed |
| +1 | checkstyle | 1m 47s | master passed |
| +1 | mvneclipse | 0m 35s | master passed |
| +1 | findbugs | 5m 2s | master passed |
| +1 | javadoc | 1m 9s | master passed |
| +1 | mvninstall | 1m 54s | the patch passed |
| +1 | compile | 1m 35s | the patch passed |
| +1 | javac | 1m 35s | the patch passed |
| +1 | checkstyle | 1m 39s | the patch passed |
| +1 | mvneclipse | 0m 32s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 69m 24s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. |
| +1 | findbugs | 4m 18s | the patch passed |
| +1 | javadoc | 0m 52s | the patch passed |
| -1 | unit | 190m 55s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 291m 33s | |

|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd |

|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12870732/HBASE-18143.master.002.patch |
| JIRA Issue | HBASE-18143 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 1ebf78e2da84 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / db8ce05 |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/7034/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/7034/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7034/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7034/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.

> [AMv2] Backoff on failed report of region transition quickly goes to astronomical time scale
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18143
>                 URL: https://issues.apache.org/jira/browse/HBASE-18143
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.0.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-18143.master.001.patch, HBASE-18143.master.002.patch
>
>
> Testing on a cluster w/ aggressive killing: if the Master is killed serially a few times, such that it is offline for a good while, regionservers that want to report a region transition pause too long between retries.
> Here is the regionserver reporting failures:
> {code}
> 2017-05-31 20:50:53,840 INFO [RS_CLOSE_REGION-ve0542:16020-2] regionserver.HRegionServer: Failed report of region transition server { host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 } transition { transition_code: CLOSED region_info { region_id: 1496284931226 table_name { namespace: "default" qualifier: "IntegrationTestBigLinkedList" } start_key: "\337\377\377\377\377\377\377\362" end_key: "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 } }; retry (#0) after 1008ms delay (Master is coming online...).
> 2017-05-31 20:50:54,853 INFO [RS_CLOSE_REGION-ve0542:16020-2] regionserver.HRegionServer: Failed report of region transition server { host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 } transition { transition_code: CLOSED region_info { region_id: 1496284931226 table_name { namespace: "default" qualifier: "IntegrationTestBigLinkedList" } start_key: "\337\377\377\377\377\377\377\362" end_key: "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 } }; retry (#1) after 2026ms delay (Master is coming online...).
> 2017-05-31 20:50:56,886 INFO [RS_CLOSE_REGION-ve0542:16020-2] regionserver.HRegionServer: Failed report of region transition server { host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 } transition { transition_code: CLOSED region_info { region_id: 1496284931226 table_name { namespace: "default" qualifier: "IntegrationTestBigLinkedList" } start_key: "\337\377\377\377\377\377\377\362" end_key: "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 } }; retry (#2) after 6084ms delay (Master is coming online...).
> 2017-05-31 20:51:02,976 INFO [RS_CLOSE_REGION-ve0542:16020-2] regionserver.HRegionServer: Failed report of region transition server { host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 } transition { transition_code: CLOSED region_info { region_id: 1496284931226 table_name { namespace: "default" qualifier: "IntegrationTestBigLinkedList" } start_key: "\337\377\377\377\377\377\377\362" end_key: "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 } }; retry (#3) after 30588ms delay (Master is coming online...).
> 2017-05-31 20:51:33,570 INFO [RS_CLOSE_REGION-ve0542:16020-2] regionserver.HRegionServer: Failed report of region transition server { host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 } transition { transition_code: CLOSED region_info { region_id: 1496284931226 table_name { namespace: "default" qualifier: "IntegrationTestBigLinkedList" } start_key: "\337\377\377\377\377\377\377\362" end_key: "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 } }; retry (#4) after 308422ms delay (Master is coming online...).
> 2017-05-31 20:56:41,997 INFO [RS_CLOSE_REGION-ve0542:16020-2] regionserver.HRegionServer: Failed report of region transition server { host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 } transition { transition_code: CLOSED region_info { region_id: 1496284931226 table_name { namespace: "default" qualifier: "IntegrationTestBigLinkedList" } start_key: "\337\377\377\377\377\377\377\362" end_key: "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 } }; retry (#5) after 6171203ms delay (Master is coming online...).
> {code}
> See how by the time we get to retry #5, we are waiting ~100 minutes (6171203ms) before we retry. That is too long. Make retry happen more frequently. Data is offline until the close is successfully reported.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
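The delays in the quoted log grow by factors of roughly 2, 3, 5, 10 and 20 from one retry to the next (1008, 2026, 6084, 30588, 308422, 6171203 ms). That pattern is consistent with each retry multiplying the previous delay by the next entry of a backoff multiplier table, rather than multiplying a fixed base pause. In that case the multipliers compound: 1 x 2 x 3 x 5 x 10 x 20 = 6000, and 6000 times a base of about 1 s is about 100 minutes. The Java sketch below reproduces that arithmetic and contrasts it with a non-compounding, capped schedule. It is an illustration under stated assumptions, not the HBase code or the committed fix. The multiplier table, the 1000 ms base pause, the 60 s cap, and the names BackoffSketch, compoundingDelayMs and cappedDelayMs are all hypothetical. The sketch also leaves out the small random jitter visible in the logged values.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: compounding vs. capped retry backoff (not the HBase implementation). */
final class BackoffSketch {
  // Assumed multiplier table, inferred from the ratios between the logged delays.
  static final int[] MULTIPLIERS = {1, 2, 3, 5, 10, 20, 40, 100};

  // Assumed base pause: roughly the first logged delay (1008 ms), treating the extra 8 ms as jitter.
  static final long BASE_PAUSE_MS = 1000L;

  // Hypothetical upper bound for the capped schedule.
  static final long MAX_PAUSE_MS = 60_000L;

  private BackoffSketch() {}

  // Multiplier for a 0-based retry number; the last entry repeats past the end of the table.
  static int multiplier(int retry) {
    return MULTIPLIERS[Math.min(retry, MULTIPLIERS.length - 1)];
  }

  // Compounding schedule: each retry multiplies the previous delay, so the multipliers stack.
  // Only meant for small retry counts; the product overflows a long by retry #12.
  static long compoundingDelayMs(int retry) {
    long delay = BASE_PAUSE_MS;
    for (int i = 0; i <= retry; i++) {
      delay *= multiplier(i);
    }
    return delay;
  }

  // Capped schedule: each retry scales the fixed base pause and never exceeds MAX_PAUSE_MS.
  static long cappedDelayMs(int retry) {
    return Math.min(BASE_PAUSE_MS * multiplier(retry), MAX_PAUSE_MS);
  }

  public static void main(String[] args) {
    for (int retry = 0; retry <= 5; retry++) {
      long compounding = compoundingDelayMs(retry);
      System.out.printf("retry #%d: compounding=%dms (%d min), capped=%dms%n",
          retry, compounding, TimeUnit.MILLISECONDS.toMinutes(compounding), cappedDelayMs(retry));
    }
  }
}
```

Running main prints one line per retry. The last line is "retry #5: compounding=6000000ms (100 min), capped=20000ms". The compounding schedule reproduces the roughly 100 minute wait seen in the log. Every capped delay stays at or below the assumed 60 s ceiling. Deriving each delay from the fixed base pause bounds growth by the largest multiplier, and the cap bounds it absolutely. Where to set the cap is a tuning choice.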