Subject: Re: Thread "shell" Stuck on IO
From: Eric Newton
To: user@accumulo.apache.org
Date: Thu, 18 Oct 2012 11:04:27 -0400

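$ pkill -f accumulo.start      # stop every running Accumulo server process
$ hadoop fs -rmr /accumulo     # delete Accumulo's HDFS directory (all table data is lost)
$ ./bin/accumulo init          # create a fresh instance; prompts for an instance name and root password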

-Eric

On Thu, Oct 18, 2012 at 10:46 AM, Ott, Charles H. <CHARLES.H.OTT@saic.com> wrote:

I apologize for not giving more information from the start.

I am running a single instance on a single virtual server. ZooKeeper shows a single server, ssdev:2181, in ‘standalone’ mode.

This is a development system and there are no tables at this time. The IP conflict issue was noticed the first time I tried to create a table, when the shell started to hang.

I have tried restarting the system, but I keep seeing the message “Recovery of 192.168.0.130:11224:[some UUID] failed.” The shell also still hangs when performing a scan or createtable.

I will look into “re-initializing” the server.

From: user-return-1496-CHARLES.H.OTT=saic.com@accumulo.apache.org [mailto:user-return-1496-CHARLES.H.OTT=saic.com@accumulo.apache.org] On Behalf Of Eric Newton
Sent: Thursday, October 18, 2012 7:41 AM


To: user@accumulo.apache.org
Subject: Re: Thread "shell" Stuck on IO

The reference to 192.168.0.130 is in ZooKeeper or the metadata table.

Unfortunately, this is a known problem with 1.3 and 1.4: you can't change your IP addresses. You can incrementally shut down servers and change their IP addresses one at a time, but not all at once.

If this is a dev system and you don't need the data, the fastest thing to do is to reset the system and reload your test data.

If you can't reload your data, you will have to move your data aside in HDFS, re-initialize, and bulk-import the existing tables.

-Eric

On Wed, Oct 17, 2012 at 5:40 PM, Ott, Charles H. <CHARLES.H.OTT@saic.com> wrote:

I believe you have already helped me get on the right track...

First, 192.168.0.130 is the IP address the VM came preconfigured with.
I changed the IP for this new environment in RHEL5, and "most" everything
seems to be running... however, the fact that it is reporting
192.168.0.130 tells me that somewhere in the logger configuration it's
still using the old IP?

All of the properties files I have looked at specify the hostname, not the
IP... I checked the hosts file and the hostname resolves to the proper
IP, so that shouldn't be an issue.
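One way to double-check that last claim is to ask the local resolver directly what the hostname maps to and confirm the stale address is gone. The following is a minimal sketch, not part of the original thread: the function names are hypothetical, and the hostname ssdev and old address 192.168.0.130 are simply the values mentioned in this thread. Note that a clean result here does not rule out the old address still being recorded in ZooKeeper or the metadata table.

```python
import socket


def resolved_ipv4_addresses(hostname):
    """Return the IPv4 addresses the local resolver reports for hostname.

    The resolver consults /etc/hosts and/or DNS according to the
    system's own configuration (e.g. nsswitch.conf on RHEL).
    """
    # gethostbyname_ex returns (canonical_name, alias_list, ipv4_address_list)
    _canonical, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return set(addresses)


def still_points_at_old_ip(hostname, old_ip):
    """True if the stale address is still among the addresses hostname resolves to."""
    return old_ip in resolved_ipv4_addresses(hostname)
```

On the VM, still_points_at_old_ip("ssdev", "192.168.0.130") would be expected to return False if the hosts file was updated correctly; a socket.gaierror means the name does not resolve at all.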

When I try to start the logger with:

# ./cloudbase.sh logger

I see:
Failed to initialize log service args=[]
        java.io.IOException: Failed to acquire lock file
                at cloudbase.server.logger.LogService.<init>(LogService.java:122)
                at cloudbase.server.logger.LogService.main(LogService.java:83)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at cloudbase.start.Main$1.run(Main.java:73)
                at java.lang.Thread.run(Thread.java:662)


-----Original Message-----
From: user-return-1492-CHARLES.H.OTT=saic.com@accumulo.apache.org
[mailto:user-return-1492-CHARLES.H.OTT=saic.com@accumulo.apache.org] On
Behalf Of Keith Turner
Sent: Wednesday, October 17, 2012 5:09 PM
To: user@accumulo.apache.org
Subject: Re: Thread "shell" Stuck on IO
Is the logger at 192.168.0.130 running? The stack trace indicates
that the master was attempting to contact the logger at 192.168.0.130 to
initiate log recovery.
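Keith's question can be answered from the master's host with a plain TCP connect to the logger's address. The sketch below is a hypothetical helper, not a Cloudbase or Accumulo tool; the 5-second default timeout is an assumption, and 11224 is the logger port that appears in the recovery error. A timeout, as in the stack trace, usually means nothing at that address answers at all (for example, an IP that no longer exists on the network, or a firewall dropping packets), while a refusal means the host is reachable but nothing is listening on that port.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Attempt a TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves host and connects, giving up after `timeout` seconds
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: it is up, but no process listens on this port
        return "refused"
    except socket.timeout:
        # No answer at all: host down, address gone, or packets silently dropped
        return "timeout"
    except OSError as err:
        # Anything else, e.g. "No route to host" or a name-resolution failure
        return "error: %s" % err
```

Run from the master's host, probe_tcp("192.168.0.130", 11224) returning "timeout" would match the ConnectException in the trace and point at the stale address rather than at a stopped logger process.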

On Wed, Oct 17, 2012 at 4:58 PM, Ott, Charles H.
<CHARLES.H.OTT@saic.com> wrote:
> I am using a VMware ESXi 4.1 server with Cloudbase (Accumulo) on
> RHEL5.
>
> I cannot start with a fresh install because I am somewhat required to
> use the preconfigured image on the VM (business rules out of my
> hands).
>
> Unfortunately, support for this preconfigured instance is not
> available, and I am tasked with getting it working anyway...

>
>
>
> I am able to log into the shell and view the tables; however, if I
> attempt to create a table or perform a scan, a line return is shown
> and then it just hangs there until it finally throws the following
> error:
>
> WARN thread "shell" stuck on IO to ssdev:9999:9999 (0) for at least
> 120044 ms.
>
>
>
> I also discovered that 9999 is the value of the property master.port.client in
> my conf/accumulo-site.xml.
>
>
>
> There is also an event log, added to the VM with a web-based UI,
> reporting:
>
> Unable to recover
> 192.168.0.130:11224/b4da830b-8ecb-4868-a480-35a39f4af17a(java.io.IOException:
> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
> Connection timed out)
>
>         java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
>                 at cloudbase.server.tabletserver.log.RemoteLogger.<init>(RemoteLogger.java:75)
>                 at cloudbase.server.master.CoordinateRecoveryTask$RecoveryJob.startCopy(CoordinateRecoveryTask.java:109)
>                 at cloudbase.server.master.CoordinateRecoveryTask$RecoveryJob.access$400(CoordinateRecoveryTask.java:93)
>                 at cloudbase.server.master.CoordinateRecoveryTask.recover(CoordinateRecoveryTask.java:279)
>                 at cloudbase.server.master.Master$TabletGroupWatcher.run(Master.java:1155)
>         Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
>                 at cloudbase.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:428)
>                 at cloudbase.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:415)
>                 at cloudbase.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:392)
>                 at cloudbase.core.util.ThriftUtil.getClient(ThriftUtil.java:58)
>                 at cloudbase.server.tabletserver.log.RemoteLogger.<init>(RemoteLogger.java:73)
>                 ... 4 more
>         Caused by: java.net.ConnectException: Connection timed out
>                 at sun.nio.ch.Net.connect(Native Method)
>                 at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:500)
>                 at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:81)
>                 at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:65)
>                 at cloudbase.core.util.TTimeoutTransport.create(TTimeoutTransport.java:23)
>                 at cloudbase.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:426)
>                 ... 8 more
>
>
>
>
>
> I have seen posts relating this to the walogs folder not being
> available, but I have checked that, and the .lock file is being created
> automatically.
>
> Running "netstat | grep 9999" shows no processes using port 9999 before logging
> into the shell... so I'm not sure there is a port conflict either.

>
>
>
> Any thoughts on the matter would be greatly appreciated.
