From: "Slater, David M."
To: "user@accumulo.apache.org", "vines@apache.org"
Date: Thu, 9 May 2013 15:41:48 -0400
Subject: RE: warning: there are no loggers registered in zookeeper

From the out log file, it looks like the logger processes are killing themselves:

# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 20759"...

I'm not doing any other concurrent Accumulo operations, though, so I can't figure out why I would be running out of heap space in the JVM.

 

Thoughts?
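
The three log lines above show the JVM reporting a heap-space OutOfMemoryError and then running the command configured through -XX:OnOutOfMemoryError, with %p replaced by the process ID. A minimal sketch of scanning a logger .out file for that pattern follows. The patterns are taken only from the lines quoted in this message, so the layout of a real .out file is an assumption, and `find_oom_kills` is a hypothetical helper name.

```python
import re

# Patterns come from the three lines quoted in the message; the exact layout
# of a real .out file is an assumption.
OOM_MARKER = "java.lang.OutOfMemoryError"
KILL_RE = re.compile(r'Executing /bin/sh -c "kill -9 (\d+)"')


def find_oom_kills(out_text):
    """Return the PIDs killed after an OutOfMemoryError report, in order."""
    pids = []
    saw_oom = False
    for line in out_text.splitlines():
        if OOM_MARKER in line:
            saw_oom = True
        match = KILL_RE.search(line)
        if match and saw_oom:
            pids.append(int(match.group(1)))
            saw_oom = False  # wait for the next OutOfMemoryError report
    return pids


sample = '''# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 20759"...
'''
print(find_oom_kills(sample))
```

Running this over each logger's .out file would show which processes were killed by the hook rather than dying for some other reason.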

 

From: John Vines [mailto:vines@apache.org]
Sent: Thursday, May 09, 2013 3:08 PM
To: user@accumulo.apache.org
Subject: Re: warning: there are no loggers registered in zookeeper

 

Are your logger processes dying? Check their logs, including the out and err files if they are.

On Thu, May 9, 2013 at 2:58 PM, Slater, David M. <David.Slater@jhuapl.edu> wrote:

The recoveries were kicked off, but their copy/sort never got beyond 0%. I get the same string of warnings I got before. I then get "WARN: Recovery of *.*.*.*:11224:6087aec0-c6e7-4473-84f3-8e78fb1eca5d failed" for the data nodes.

Thoughts?
David


-----Original Message-----
From: Josh Elser [mailto:josh.elser@gmail.com]
Sent: Thursday, May 09, 2013 2:54 PM
To: user@accumulo.apache.org
Subject: Re: warning: there are no loggers registered in zookeeper

Do a stop-all.sh. Make sure everything is actually down (pssh/pdsh and use ps/jps to determine that all Accumulo processes are stopped). Run start-all.sh again and see if you still have issues (likely you'll have some WAL recoveries kick off).

If you do, check the logger and tserver log files to get the actual problem.
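
The "make sure everything is actually down" step can be sketched as a filter over `jps -l` output from one node. This is a rough illustration, not a supported tool: matching on the substring "accumulo" is an assumption about how the processes appear in jps, the sample PIDs and class names are illustrative only, and fanning the check out across nodes with pssh/pdsh is left out.

```python
import subprocess


def local_jps_output():
    """Run `jps -l` on this host and return its text (needs a JDK on PATH)."""
    return subprocess.run(
        ["jps", "-l"], capture_output=True, text=True, check=True
    ).stdout


def lingering_accumulo_processes(jps_output):
    """Return (pid, name) pairs for jps lines that mention Accumulo."""
    found = []
    for line in jps_output.splitlines():
        parts = line.split(None, 1)
        # Skip lines without both a numeric PID and a name.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, name = parts
        if "accumulo" in name.lower():
            found.append((int(pid), name.strip()))
    return found


# Hypothetical jps output; the PIDs and class names are illustrative only.
sample = """4021 org.apache.accumulo.start.Main
4188 sun.tools.jps.Jps
3950 org.apache.hadoop.hdfs.server.datanode.DataNode
"""
print(lingering_accumulo_processes(sample))
```

An empty list on every node after stop-all.sh would suggest it is safe to run start-all.sh; on a live node, `lingering_accumulo_processes(local_jps_output())` performs the same check against real output.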

On 05/09/2013 02:48 PM, Slater, David M. wrote:
>
> Hey everyone,
>
> After a bad shutdown, I'm getting the warning "there are no loggers
> registered in zookeeper", followed by "warning: unable to connect to
> ***: org.apache.thrift.transport.TTransportException:
> java.net.ConnectException: Connection refused". This then leads to a
> number of timeout errors, "unable to get tablet server status". During
> the entire time I have no access to the tablet servers. And then it
> crashes.
>
> Is there a way to get the loggers back into a good state without
> destroying all of the tables?
>
> Thanks,
> David
>
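
A "Connection refused" error generally means nothing is listening on the target port. A minimal sketch of checking that directly with a plain TCP connect follows. `port_accepts_connections` is a hypothetical helper, and the host and port to probe are left to the reader, since the thread masks the addresses.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_accepts_connections("some-node", 11224)` would probe the port that appears in the recovery warning quoted in this thread; whether that is the logger port in a given installation is an assumption, and "some-node" is a placeholder host name.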

 
