accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: EXTERNAL: Re: There are no tablet servers
Date Wed, 18 Jul 2012 20:16:56 GMT
You can add

set -x

in start-server.sh, and it will show you what the script is trying to do.
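
As a rough illustration of what that trace looks like, here is a minimal sketch. It uses a made-up stand-in function, not the real start-server.sh (whose contents are not shown in this thread); the host and service names are placeholders. With tracing on, the shell echoes each command to stderr, after variable expansion, before running it.

```shell
#!/bin/bash
# Hypothetical stand-in for a start script; NOT the real start-server.sh.
start_service() {
  local host="$1" service="$2"
  echo "Starting $service on $host"
}

# Turn tracing on around the call. The trace goes to stderr, so it is
# redirected to stdout here in order to capture and inspect it.
trace_output=$( { set -x; start_service "example-host" "tserver"; set +x; } 2>&1 )
echo "$trace_output"
```

In the real script the trace should show the exact ssh command line being built for each host, which is usually enough to see where it goes wrong.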

-Eric

On Wed, Jul 18, 2012 at 4:12 PM, Cardon, Tejay E
<tejay.e.cardon@lmco.com> wrote:
> Eric,
> Good to know about the tracers.  I set up 4 tracers for an 8-node setup, but I'll back
> that down to just the 1.  As for the .out or .err files on tservers, I've got nothing.  There
> is no evidence that those servers were ever touched.  I'm thinking the next step would be
> to execute the start-here.sh script on each tserver and look for errors.  Is that the best
> approach, and if so, what arguments should I pass?  I'm digging into the start-all.sh script
> for answers, but if someone already knows what the arguments should be, all the better.
>
> Thanks,
> Tejay
>
> -----Original Message-----
> From: Eric Newton [mailto:eric.newton@gmail.com]
> Sent: Wednesday, July 18, 2012 2:00 PM
> To: user@accumulo.apache.org
> Subject: EXTERNAL: Re: There are no tablet servers
>
> Don't start a tracer on every server.  Just start one on a master server.  You won't
> need more than 1 until you get several hundred servers.
>
> Do you have anything in the .out or .err files on the tserver hosts?
> If the files don't exist, something is failing in the ssh to those hosts.
>
> -Eric
>
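
One way to test the ssh theory directly is to run a trivial command on every tserver host from the master. The following is a sketch under assumptions: the hosts file path and the remote command are supplied by the caller, and `check_hosts` and `PROBE` are names invented for this example. The `BatchMode=yes` option makes ssh fail instead of prompting for a password, which is what a non-interactive start script would experience.

```shell
#!/bin/bash
# Sketch only: check_hosts and PROBE are illustrative names, not part of Accumulo.
# PROBE can be overridden so the loop can be exercised without real ssh.
PROBE=${PROBE:-"ssh -o BatchMode=yes -o ConnectTimeout=5"}

check_hosts() {
  local hosts_file="$1" remote_cmd="$2" host
  while read -r host; do
    # Skip blank lines and comment lines in the hosts file.
    case "$host" in ''|'#'*) continue ;; esac
    # Redirect stdin so the probe cannot consume the rest of the hosts file.
    if $PROBE "$host" "$remote_cmd" </dev/null >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
    fi
  done < "$hosts_file"
}
```

For example, `check_hosts "$ACCUMULO_HOME/conf/slaves" "hostname"` would list which hosts accept a passwordless login, assuming the tserver hosts are listed in that file. Any host reported as FAIL is a likely reason for missing .out and .err files.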
> On Wed, Jul 18, 2012 at 2:15 PM, Cardon, Tejay E <tejay.e.cardon@lmco.com> wrote:
>> All,
>>
>> I'm running into a strange challenge in my latest Accumulo installation.
>> I've developed some chef recipes for deploying Accumulo, and have
>> tested them on three clusters now with no problems.  Using the same
>> scripts, I recently did another deployment, but I'm having trouble on this one.
>>
>>
>>
>> After installing Accumulo, updating the config files, and setting up
>> passwordless ssh, I ran:
>>
>> ./accumulo init
>>
>>
>>
>> Everything went normally with me setting the instanceID and password
>>
>>
>>
>> Then I ran
>>
>> ./start-all.sh
>>
>>
>> Again, everything went smoothly with the following output:
>>
>> bash-3.2$ ./start-all.sh
>>
>> Starting tablet servers and loggers ....... done
>>
>> Starting tablet server on de8-9a-8f-83-be-52
>>
>> Starting logger on de8-9a-8f-83-be-52
>>
>> Starting tablet server on d04-7d-7b-06-5e-48
>>
>> Starting logger on de8-9a-8f-d3-3e-f8
>>
>> Starting tablet server on d04-7d-7b-06-5d-f4
>>
>> Starting logger on d04-7d-7b-06-5e-48
>>
>> Starting logger on d04-7d-7b-06-5d-f4
>>
>> Starting tablet server on de8-9a-8f-d3-3e-f8
>>
>> 18 12:48:50,970 [server.Accumulo] INFO : Attempting to talk to
>> zookeeper
>>
>> 18 12:48:51,182 [server.Accumulo] INFO : Zookeeper connected and
>> initialized, attemping to talk to HDFS
>>
>> 18 12:48:51,568 [server.Accumulo] INFO : Connected to HDFS
>>
>> Starting master on d04-7d-7b-06-5d-80
>>
>> Starting garbage collector on d04-7d-7b-06-5e-ba
>>
>> Starting monitor on d04-7d-7b-06-5e-ba
>>
>> Starting tracer on d04-7d-7b-06-5d-80
>>
>> Starting tracer on de8-9a-8f-d3-3e-f8
>>
>> Starting tracer on d04-7d-7b-06-5e-48
>>
>>
>>
>> I can also run a stop-all.sh with no complaints from the script.
>>
>>
>>
>> However, if I try to start the Accumulo shell, I get
>>
>>
>>
>> bash-3.2$ ./accumulo shell
>>
>> Enter current password for 'hdfs'@'test4': ******
>>
>> 18 13:00:17,906 [impl.ServerClient] WARN : There are no tablet servers:
>> check that zookeeper and accumulo are running.
>>
>>
>>
>> If I check the tablet server machines I find that they do not have any
>> Accumulo processes running, and the master does not have any tablet
>> server logs.  (it does have the tracer logs, however).
>>
>>
>>
>> I've attached the log files here (without the empty ones).  There is
>> an error trying to "clean up old log sort" and a thrift error.
>>
>> I'm at a loss for where to begin on the debugging for this.  Any
>> thoughts would be greatly appreciated.
>>
>>
>>
>>
>>
>> 18 12:48:54,100 [master.CoordinateRecoveryTask] ERROR: Error cleaning up old Log Sort jobs
>> java.io.IOException: Call to /10.1.24.65:50030 failed on local exception: java.io.EOFException
>>
>>
>>
>> 18 12:48:57,016 [impl.ServerClient] DEBUG: ClientService request failed null, retrying ...
>> org.apache.thrift.transport.TTransportException: Failed to connect to a server
>>         at org.apache.accumulo.core.client.impl.ThriftTransportPool.getAnyTransport(ThriftTransportPool.java:437)
>>         at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:145)
>>         at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:123)
>>         at org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:105)
>>         at org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:71)
>>         at org.apache.accumulo.core.client.impl.ConnectorImpl.<init>(ConnectorImpl.java:75)
>>         at org.apache.accumulo.server.client.HdfsZooInstance.getConnector(HdfsZooInstance.java:145)
>>         at org.apache.accumulo.server.trace.TraceServer.<init>(TraceServer.java:152)
>>         at org.apache.accumulo.server.trace.TraceServer.main(TraceServer.java:222)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.accumulo.start.Main$1.run(Main.java:89)
>>         at java.lang.Thread.run(Thread.java:662)
