From: Wellington Chevreuil
Date: Thu, 6 Jun 2019 15:14:56 +0100
Subject: Re: How does HBase deal with master switch?
To: user@hbase.apache.org

Hey Zili,

Besides what Duo explained previously, here are a few clarifications on the concepts in your description:

> 1) RegionServer started full gc and timeout on ZooKeeper. Thus ZooKeeper
> regarded it as failed.

ZK only knows about sessions and clients, not the type of client connecting to it. A client opens a session in ZK, then pings ZK periodically to keep that session alive. During a long full GC pause, the client (the RS, in this case) fails to ping within the required period. At that point, ZK *expires* the session.

> 2) ZooKeeper launched a new RegionServer, and the new one started to serve.

ZK doesn't launch a new RS. It doesn't know about RSes, only about client sessions. When the session expires, the Master is notified that an RS is potentially gone and starts the process explained by Duo.

> 3) The old RegionServer finished gc and thought itself was still active and
> serving.

What really happens here is that once the RS is back from GC, it tries to ping ZK again for that session. ZK rejects the ping because the session has already expired, and the RS then kills itself.

On Thu, 6 Jun 2019 at 14:58, 张铎(Duo Zhang) wrote:

> Once a RS is started, it will create its wal directory and start to write
> wal into it. And if master thinks a RS is dead, it will rename the wal
> directory of the RS and call recover lease on all the wal files under the
> directory to make sure that they are all closed. So even after the RS is
> back after a long GC, before it kills itself because of the
> SessionExpiredException, it can not accept any write requests any more
> since its old wal file is closed and the wal directory is also gone so it
> can not create new wal files either.
>
> Of course, you may still read from the dead RS at this moment, so
> theoretically you could read stale data, which means HBase can not
> guarantee ‘external consistency’.
>
> Hope this solves your problem.
>
> Thanks.
>
> On Thu, 6 Jun 2019 at 9:38 PM, Zili Chen wrote:
>
> > Hi,
> >
> > Recently from the book, ZooKeeper: Distributed Process Coordination, I
> > find a paragraph mentions that, HBase once suffered by
> >
> > 1) RegionServer started full gc and timeout on ZooKeeper. Thus ZooKeeper
> > regarded it as failed.
> > 2) ZooKeeper launched a new RegionServer, and the new one started to
> > serve.
> > 3) The old RegionServer finished gc and thought itself was still active
> > and serving.
> >
> > in Chapter 5 section 5.3.
> >
> > I'm interested in it and would like to know how the HBase community
> > overcame this issue.
> >
> > Best,
> > tison.
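
To make the sequence in this thread concrete, here is a toy simulation of it: session expiry after missed pings, the master closing off the WAL directory, and the RS aborting once it wakes up. This is only a sketch. It uses neither the ZooKeeper nor the HBase API, and every name in it (ToyCoordinator, ToyWalStore, the "-fenced" suffix, the timeout value) is made up for illustration.

```python
class SessionExpired(Exception):
    """Raised when a client pings with a session that has already expired."""


class ToyCoordinator:
    """Stand-in for ZK session tracking (hypothetical, not the ZooKeeper API)."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_ping = {}   # session id -> time of the last ping
        self.expired = set()

    def open_session(self, sid, now):
        self.last_ping[sid] = now

    def ping(self, sid, now):
        # A ping only succeeds while the session is still alive.
        if sid not in self.last_ping:
            raise SessionExpired(sid)
        self.last_ping[sid] = now

    def tick(self, now):
        """Expire sessions that missed the timeout; return the newly expired ids."""
        newly = [s for s, t in self.last_ping.items()
                 if now - t > self.session_timeout]
        for s in newly:
            del self.last_ping[s]
            self.expired.add(s)
        return newly


class ToyWalStore:
    """Stand-in for WAL directories on the shared file system (hypothetical)."""

    def __init__(self):
        self.open_dirs = set()

    def create(self, wal_dir):
        self.open_dirs.add(wal_dir)

    def fence(self, wal_dir):
        # Models "rename the wal directory and recover the lease on its files".
        self.open_dirs.discard(wal_dir)
        return wal_dir + "-fenced"  # illustrative suffix only

    def append(self, wal_dir, edit):
        if wal_dir not in self.open_dirs:
            raise OSError("WAL closed: " + wal_dir)


def simulate():
    events = []
    zk = ToyCoordinator(session_timeout=3)
    wals = ToyWalStore()

    # The RS starts, opens a session and creates its WAL directory.
    zk.open_session("rs1", now=0)
    wals.create("wals/rs1")
    zk.ping("rs1", now=1)

    # Full GC pause: no pings after t=1. The master reacts to the expiry.
    for sid in zk.tick(now=5):
        events.append("expired " + sid)
        wals.fence("wals/" + sid)
        events.append("fenced wals/" + sid)

    # The RS wakes up at t=10: writes fail first, then the ping fails.
    try:
        wals.append("wals/rs1", "put row1")
    except OSError:
        events.append("write rejected")
    try:
        zk.ping("rs1", now=10)
    except SessionExpired:
        events.append("rs1 aborts")
    return events


if __name__ == "__main__":
    for event in simulate():
        print(event)
```

The toy model leaves reads out entirely, so it does not capture the stale-read window Duo mentions between the RS waking up and aborting.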