From: Wellington Chevreuil
Date: Tue, 12 Mar 2019 13:04:00 +0000
Subject: Re:
To: Asim Zafir
Cc: dev@hbase.apache.org

Slow replication can lead to too many znodes in ZK. These would not be
direct children of the "/hbase/replication/rs" znode, but of each region
server's replication queue underneath it. The "108" shown in your "stat"
output is only the number of region servers and would not vary; it is the
number of znodes under each of those that grows. To get a better picture of
how large your znode tree is, use ZooKeeper's
org.apache.zookeeper.server.SnapshotFormatter tool, which prints the whole
structure from a given snapshot file (a sample invocation is at the end of
this message). There may also be further hints about which znode is
exceeding the buffer limit in the full stack trace of the jute buffer
error. Would you be able to share that?

On Tue, 12 Mar 2019 at 09:52, Asim Zafir wrote:
>
> Hi Wellington,
>
> Thanks for the response, I greatly appreciate it. Yes, we have
> replication enabled, and if I stat /hbase/replication/rs I get the
> following:
>
> cZxid = 0x1000001c8
> ctime = Tue Jun 19 22:23:17 UTC 2018
> mZxid = 0x1000001c8
> mtime = Tue Jun 19 22:23:17 UTC 2018
> pZxid = 0x10000cd555
> cversion = 8050
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 108
>
> How should I analyze the znode utilization in this case, and more
> specifically, how is it impacting the jute buffer size? I can see
> numChildren under /hbase/replication is 108, but how does that correspond
> to the ZK jute buffer reaching its maximum value?
>
> Also, it is not clear why the timestamp on these znodes isn't
> incrementing. I see the timestamp is still showing a 2018 date.
>
> Thanks,
> Asim
>
>
> -------------->>>>>
>
>
> This jute buffer length error generally means a given znode being
> watched/read had grown too large to fit into the buffer. It is not
> specific to the number of watches attached, but to the amount of
> information stored in it, for example too many child znodes under a given
> znode. To understand what is behind the error, you should analyze your
> ZooKeeper znode tree; you may get a hint by looking at ZooKeeper snapshot
> files. Would you have replication enabled on this cluster? A common cause
> for such errors in HBase is when replication is slow/stuck and the source
> cluster is under heavy write load, causing the replication queue to grow
> much faster than its ability to drain, which results in many znodes being
> created under the "replication" znode.
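
As an illustration, a SnapshotFormatter run looks roughly like the below.
The install directory, classpath and snapshot path are only examples and
depend on your packaging; point it at the most recent snapshot file under
the ZooKeeper dataDir's version-2 directory:

  cd /usr/lib/zookeeper   # example install dir, adjust for your packaging
  java -cp '*:lib/*' org.apache.zookeeper.server.SnapshotFormatter \
      /var/lib/zookeeper/version-2/snapshot.<latest-zxid> > /tmp/znode-tree.txt

  # the output should contain one line per znode path (followed by its
  # metadata), so this counts the znodes under the replication queues
  grep -c '^/hbase/replication/rs/' /tmp/znode-tree.txt

Since every znode path is listed with its metadata, a quick grep/sort over
that file usually makes it obvious which subtree is holding most of the
tree.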
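
You can also walk the queues on the live ensemble with zkCli. The region
server name and peer id below are made up; substitute real ones from each
"ls" output:

  zkCli.sh -server <zk-quorum-host>:2181
  ls /hbase/replication/rs
  ls /hbase/replication/rs/regionserver-1.example.com,16020,1552300000000
  ls /hbase/replication/rs/regionserver-1.example.com,16020,1552300000000/1

Each peer id znode under a region server holds one child per WAL still
waiting to be shipped, so a stuck peer shows up as a queue with thousands
of children. A single getChildren response has to carry all of those child
names, which is how one deep znode can push a reply past the jute buffer
limit (about 1 MB by default) even though "/hbase/replication/rs" itself
only reports numChildren = 108. If one of those "ls" calls fails with the
same packet length error, you have found the offending znode.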