From: Wellington Chevreuil
Date: Tue, 12 Mar 2019 13:04:00 +0000
Subject: Re:
To: Asim Zafir
Cc: dev@hbase.apache.org

Slow replication can lead to too many znodes in ZK. These would not be
direct children of the "/hbase/replication/rs" znode, but of each region
server's replication queue underneath it. The "108" shown in your "stat"
output is only the number of region servers and would not vary; it is the
number of znodes under each of those that grows. To get a better picture of
how large your znode tree is, use ZooKeeper's
org.apache.zookeeper.server.SnapshotFormatter tool, which prints the whole
structure from a given snapshot file (a sample invocation is at the end of
this message). There may also be further hints about which znode is
exceeding the buffer limit in the full stack trace of the jute buffer
error. Would you be able to share that?

On Tue, 12 Mar 2019 at 09:52, Asim Zafir wrote:
>
> Hi Wellington,
>
> Thanks for the response, I greatly appreciate it. Yes, we have
> replication enabled, and if I stat /hbase/replication/rs I get the
> following:
>
> cZxid = 0x1000001c8
> ctime = Tue Jun 19 22:23:17 UTC 2018
> mZxid = 0x1000001c8
> mtime = Tue Jun 19 22:23:17 UTC 2018
> pZxid = 0x10000cd555
> cversion = 8050
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 108
>
> How should I analyze the znode utilization in this case, and more
> specifically, how is it impacting the jute buffer size? I can see
> numChildren under /hbase/replication is 108, but how does that correspond
> to the ZK jute buffer reaching its maximum value?
>
> Also, it is not clear why the timestamp on these znodes isn't
> incrementing. I see the timestamp is still showing a 2018 date.
>
> Thanks,
> Asim
>
>
> -------------->>>>>
>
>
> This jute buffer length error generally means a given znode being
> watched/read had grown too large to fit into the buffer. It is not
> specific to the number of watches attached, but to the amount of
> information stored in it, for example too many child znodes under a given
> znode. To understand what is behind the error, you should analyze your
> ZooKeeper znode tree; you may get a hint by looking at ZooKeeper snapshot
> files. Would you have replication enabled on this cluster? A common cause
> for such errors in HBase is when replication is slow/stuck and the source
> cluster is under heavy write load, causing the replication queue to grow
> much faster than its ability to drain, which results in many znodes being
> created under the "replication" znode.
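
As an illustration, a SnapshotFormatter run looks roughly like the below.
The install directory, classpath and snapshot path are only examples and
depend on your packaging; point it at the most recent snapshot file under
the ZooKeeper dataDir's version-2 directory:

  cd /usr/lib/zookeeper   # example install dir, adjust for your packaging
  java -cp '*:lib/*' org.apache.zookeeper.server.SnapshotFormatter \
      /var/lib/zookeeper/version-2/snapshot.<latest-zxid> > /tmp/znode-tree.txt

  # the output should contain one line per znode path (followed by its
  # metadata), so this counts the znodes under the replication queues
  grep -c '^/hbase/replication/rs/' /tmp/znode-tree.txt

Since every znode path is listed with its metadata, a quick grep/sort over
that file usually makes it obvious which subtree is holding most of the
tree.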
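
You can also walk the queues on the live ensemble with zkCli. The region
server name and peer id below are made up; substitute real ones from each
"ls" output:

  zkCli.sh -server <zk-quorum-host>:2181
  ls /hbase/replication/rs
  ls /hbase/replication/rs/regionserver-1.example.com,16020,1552300000000
  ls /hbase/replication/rs/regionserver-1.example.com,16020,1552300000000/1

Each peer id znode under a region server holds one child per WAL still
waiting to be shipped, so a stuck peer shows up as a queue with thousands
of children. A single getChildren response has to carry all of those child
names, which is how one deep znode can push a reply past the jute buffer
limit (about 1 MB by default) even though "/hbase/replication/rs" itself
only reports numChildren = 108. If one of those "ls" calls fails with the
same packet length error, you have found the offending znode.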