Date: Mon, 15 May 2017 19:22:04 +0000 (UTC)
From: "huaxiang sun (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down

[ https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huaxiang sun updated HBASE-18005:
---------------------------------
    Attachment: HBASE-18005-master-002.patch

The v2 patch fixes an error in the unit test case. The new unit test case tried to use a meta replica and ran into HBASE-18035, so meta replica usage is temporarily disabled in the test.
> read replica: handle the case that region server hosting both primary replica and meta region is down
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18005
>                 URL: https://issues.apache.org/jira/browse/HBASE-18005
>             Project: HBase
>          Issue Type: Bug
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>         Attachments: HBASE-18005-master-001.patch, HBASE-18005-master-002.patch
>
>
> Testing identified a corner case: when the region server hosting both the primary replica and the meta region goes down, the client tries to reload the primary replica's location from the meta table. It is supposed to clear only the cached location for that specific replicaId, but instead it clears the cached locations for all replicas. Please see
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L813
> Since it takes some time for regions (including the meta region) to be reassigned, the following call may throw an exception:
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L173
> This exception needs to be caught, and the client should then fall back to the cached locations (in this case, the primary replica's location is not available). If locations for other replicas are still cached, the client can go ahead and read stale values from the secondary replicas.
> With meta replicas enabled, it still helps not to clear the cached locations for all replicas, since the info from the primary meta replica is up-to-date.
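The two behaviors the issue asks for (invalidate only one replica's cached location, and survive a failed meta lookup by falling back to cached secondaries) can be sketched as follows. This is a minimal, hypothetical illustration, not the code in ConnectionImplementation or RpcRetryingCallerWithReadReplicas: the class ReplicaLocationCache, the MetaLookup interface, and server names such as "rs1" are made up for this example. It assumes replicaId 0 denotes the primary replica, following HBase's default-replica convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-region cache of replica locations keyed by replicaId (not an HBase API). */
class ReplicaLocationCache {

    /** Hypothetical stand-in for a meta lookup; may throw while meta is being reassigned. */
    interface MetaLookup {
        String locate(int replicaId) throws Exception;
    }

    // replicaId -> server hosting that replica, for a single region
    private final Map<Integer, String> locations = new ConcurrentHashMap<>();

    void put(int replicaId, String server) {
        locations.put(replicaId, server);
    }

    /** Removes only the entry for the given replica; other replicas' entries are kept. */
    void clear(int replicaId) {
        locations.remove(replicaId);
    }

    /**
     * Reloads the primary's location from meta. If the lookup fails, the primary
     * stays unknown and the cached secondary locations are returned instead.
     */
    List<String> locateForRead(MetaLookup meta, int numReplicas) {
        clear(0); // invalidate only the primary's stale entry, not every replica
        try {
            put(0, meta.locate(0));
        } catch (Exception e) {
            // Meta is unavailable: do not fail the read, fall back to cached replicas below.
        }
        List<String> result = new ArrayList<>();
        for (int id = 0; id < numReplicas; id++) {
            String server = locations.get(id);
            if (server != null) {
                result.add(server);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ReplicaLocationCache cache = new ReplicaLocationCache();
        cache.put(0, "rs1"); // primary, on the crashed server that also hosted meta
        cache.put(1, "rs2"); // secondary replicas on other servers
        cache.put(2, "rs3");
        MetaLookup metaDown = id -> { throw new Exception("meta region not yet reassigned"); };
        System.out.println(cache.locateForRead(metaDown, 3)); // prints [rs2, rs3]
    }
}
```

Running main prints `[rs2, rs3]`: the primary's location is unknown while meta is down, but both secondaries remain available for stale (timeline-consistent) reads. If `clear` removed every replica's entry, as described in the bug, the list would be empty and the read could not proceed.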