From: Noe Detore
To: user@accumulo.apache.org
Date: Thu, 13 Oct 2016 10:49:30 -0400
Subject: Re: Lost tablet server lock..SESSION_EXPIRED

Yes, I am seeing a lot of DEBUG: UpSess messages. I am also seeing:

[server.GarbageCollectionLogger] DEBUG: gc ParNew=64.69(+1.24) secs ConcurrentMarkSweep=102.51(+0.06) secs freemem=4,844,821,808(-20,292,780,896) totalmem=25,525,551,104
2016-10-13 11:22:17,963 [zookeeper.ZooLock] DEBUG: event null None Disconnected
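The GarbageCollectionLogger line above reports, per collector, a cumulative GC time followed by a parenthesized value that appears to be the increase since the previous report, plus free memory and its change. A minimal Java sketch for pulling those numbers out of such a line follows; the class GcLogLine and its methods are hypothetical, and the field layout is inferred only from this one sample line, not from Accumulo's source.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses the fields of a GarbageCollectionLogger DEBUG line. */
final class GcLogLine {
  // Matches "Name=cumulative(+increase) secs", e.g. "ParNew=64.69(+1.24) secs"
  private static final Pattern COLLECTOR =
      Pattern.compile("(\\w+)=([\\d.]+)\\(\\+([\\d.]+)\\) secs");
  // Matches "freemem=4,844,821,808(-20,292,780,896)"; the change may be + or -
  private static final Pattern FREEMEM =
      Pattern.compile("freemem=([\\d,]+)\\(([+-][\\d,]+)\\)");

  /** Returns collector name -> parenthesized increase in seconds, in log order. */
  static Map<String, Double> gcDeltaSeconds(String line) {
    Map<String, Double> deltas = new LinkedHashMap<>();
    Matcher m = COLLECTOR.matcher(line);
    while (m.find()) {
      deltas.put(m.group(1), Double.parseDouble(m.group(3)));
    }
    return deltas;
  }

  /** Sum of the per-collector increases, i.e. GC time added since the previous report. */
  static double totalDeltaSeconds(String line) {
    double total = 0;
    for (double d : gcDeltaSeconds(line).values()) {
      total += d;
    }
    return total;
  }

  /** Change in free memory in bytes, or 0 if the freemem field is absent. */
  static long freeMemChangeBytes(String line) {
    Matcher m = FREEMEM.matcher(line);
    if (!m.find()) {
      return 0L;
    }
    return Long.parseLong(m.group(2).replace(",", ""));
  }

  public static void main(String[] args) {
    String line = "[server.GarbageCollectionLogger] DEBUG: gc ParNew=64.69(+1.24) secs"
        + " ConcurrentMarkSweep=102.51(+0.06) secs"
        + " freemem=4,844,821,808(-20,292,780,896) totalmem=25,525,551,104";
    System.out.println(gcDeltaSeconds(line));
    System.out.printf("GC time since last report: %.2f s%n", totalDeltaSeconds(line));
    System.out.println("free memory change: " + freeMemChangeBytes(line) + " bytes");
  }
}
```

For the sample line, the increases add up to roughly 1.3 s of GC time; a single line cannot tell whether that was one pause or many short ones.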

During a hotspot, it looks like a Java GC pause causes the ZooKeeper heartbeat to be missed, and the session then expires. Are there recommended Java GC configurations? We are using native memory. Would trying G1 GC be advisable?

Thank you
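Whether a pause costs the tablet server its lock depends on the negotiated ZooKeeper session timeout as well as on the collector. The tserver requests a timeout (taken from Accumulo's instance.zookeeper.timeout property, 30s by default), and the ZooKeeper server clamps it to [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times tickTime. A minimal Java sketch of that arithmetic follows; the class and method names are hypothetical, tickTime = 2000 ms is an assumed value, and treating any pause at least as long as the timeout as fatal is a simplification of ZooKeeper's actual expiry bookkeeping.

```java
/** Hypothetical sketch of ZooKeeper session-timeout clamping and a simplified pause check. */
final class ZkSessionTimeout {
  /**
   * The server clamps the requested timeout to [minSessionTimeout, maxSessionTimeout].
   * This sketch assumes those are unset in zoo.cfg, so they default to 2 and 20 times tickTime.
   */
  static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs) {
    int min = 2 * tickTimeMs;
    int max = 20 * tickTimeMs;
    return Math.max(min, Math.min(max, requestedMs));
  }

  /** Simplification: a JVM pause at least as long as the timeout sends no heartbeats in time. */
  static boolean pauseRisksExpiry(long pauseMs, int requestedMs, int tickTimeMs) {
    return pauseMs >= negotiatedTimeoutMs(requestedMs, tickTimeMs);
  }

  public static void main(String[] args) {
    int tickTime = 2000; // assumed value, as in ZooKeeper's sample zoo.cfg
    System.out.println(negotiatedTimeoutMs(30_000, tickTime)); // 30000
    System.out.println(negotiatedTimeoutMs(60_000, tickTime)); // 40000, capped at 20 * tickTime
    System.out.println(pauseRisksExpiry(35_000, 30_000, tickTime)); // true
  }
}
```

One consequence: raising instance.zookeeper.timeout above 20 * tickTime has no effect unless maxSessionTimeout is also raised on the ZooKeeper servers. Switching the tserver JVM to G1 (the HotSpot flag -XX:+UseG1GC) aims at shorter pauses but does not change this arithmetic.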

On Fri, Oct 7, 2016 at 8:23 PM, Jeff Kubina <jeff.kubina@gmail.com> wrote:
> Noe,
>
> Do you have a lot (1000s) of "[tserver.TabletServer] DEBUG: UpSess ..." messages in your tserver logs prior to the FATAL or "ERROR: Lost tablet server lock" error message?
>
> Jeff


> On Fri, Oct 7, 2016 at 10:34 AM, Noe Detore <ndetore@minerkasch.com> wrote:
>> Are there any updates on this issue: https://issues.apache.org/jira/browse/ACCUMULO-3336 ? I am seeing this behavior with 1.7.2 on one of our clusters, but not on our other clusters. What could some of the causes be? Swap on the server looks good, as there is none. Are there particular configurations to adjust?
>>
>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired ...
>> 2016-10-06 23:22:30,633 [zookeeper.DistributedWorkQueue] INFO : Got unexpected zookeeper event: None for ...
>> 2016-10-06 23:22:30,679 [tserver.TabletServer] ERROR: Lost tablet server lock (reason = SESSION_EXPIRED), exiting
>>
>> Thanks
>> Noe
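Jeff's question can be checked mechanically against a tserver log. A minimal Java sketch that counts lines containing "UpSess" before the first "Lost tablet server lock" message; the class name UpSessCounter is hypothetical, and it assumes one log event per line and matches plain substrings rather than Accumulo's exact log format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: counts "UpSess" DEBUG lines logged before the tserver lost its lock. */
final class UpSessCounter {
  /**
   * Counts lines containing "UpSess" up to the first "Lost tablet server lock" line.
   * If no such line exists, every UpSess line in the input is counted.
   */
  static int countBeforeLockLoss(BufferedReader reader) throws IOException {
    int count = 0;
    String line;
    while ((line = reader.readLine()) != null) {
      if (line.contains("Lost tablet server lock")) {
        break; // stop at the fatal message; later lines are not "prior to" it
      }
      if (line.contains("UpSess")) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: java UpSessCounter <tserver log file>");
      System.exit(1);
    }
    try (BufferedReader reader =
        Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
      System.out.println("UpSess lines before lock loss: " + countBeforeLockLoss(reader));
    }
  }
}
```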

