accumulo-user mailing list archives

From Sean Busbey <busbey+li...@cloudera.com>
Subject Re: Tablet server stuck waiting for lock
Date Wed, 05 Mar 2014 17:27:40 GMT
Hi Alex!

Can you post your logs somewhere?

How many zookeeper servers are you running?

Is iptables enabled?

Was your netcat test run local to the zookeeper server or on a remote
server?
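
If it helps to repeat that check from each of the other nodes, here is a minimal Python sketch of the same "echo stat | nc" test. The function name zk_four_letter and the default port 2181 are assumptions for illustration; substitute your own ZooKeeper host and client port.

```python
import socket


def zk_four_letter(host, port=2181, cmd="stat", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw reply.

    This mirrors `echo stat | nc <zk-server> <zk port>`. The host, port,
    and timeout values are placeholders, not settings from this cluster.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # The server closes the connection once it has replied, so read
        # until recv() returns an empty byte string.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Run from a tablet server node, zk_four_letter("<zk-server>", <zk port>, "ruok") should return "imok" when the ZooKeeper server is reachable and running. A timeout or a refused connection from a remote node, when the local test succeeds, would point at a firewall or network issue rather than at ZooKeeper itself.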

What virtualization platform is this running on top of?

-Sean


On Wed, Mar 5, 2014 at 11:17 AM, Alex Lee <alee@orbistechnologies.com> wrote:

>  Hello,
>
>
>
> I’m trying to create a virtualized Accumulo 1.4.4 cluster with 4 tablet
> servers using Hadoop 0.20.2 and ZooKeeper 3.3.5. It didn’t seem to be
> working correctly with 4 tablet servers, so I first tried just running with
> one tablet server, which seemed to work fine. When I tried to run it with
> just 2 tablet servers, I ran into some issues.
>
>
>
> Just to preface, I double checked configs within zookeeper and accumulo,
> and everything matches. All hostnames are resolving correctly, and
> passwordless SSH for the accumulo user is also functional between all
> nodes. Running “echo stat | nc <zk-server> <zk port>” responds
> appropriately.
>
>
>
> Here’s the first error log for the Tablet Master:
>
>
>
> 2014-03-05 11:18:16,626 [master.Master] ERROR: Error processing table state for store Root Tablet
> org.apache.thrift.transport.TTransportException: java.io.IOException: Connection reset by peer
>         at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
>         at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:158)
>         at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.flush(ThriftTransportPool.java:299)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.send_loadTablet(TabletClientService.java:653)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.loadTablet(TabletClientService.java:640)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>         at java.lang.reflect.Method.invoke(Unknown Source)
>         at org.apache.accumulo.cloudtrace.instrument.thrift.TraceWrap$2.invoke(TraceWrap.java:84)
>         at com.sun.proxy.$Proxy4.loadTablet(Unknown Source)
>         at org.apache.accumulo.server.master.LiveTServerSet$TServerConnection.assignTablet(LiveTServerSet.java:86)
>         at org.apache.accumulo.server.master.Master$TabletGroupWatcher.flushChanges(Master.java:1818)
>         at org.apache.accumulo.server.master.Master$TabletGroupWatcher.run(Master.java:1426)
> Caused by: java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.SocketDispatcher.write(Unknown Source)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>         at sun.nio.ch.IOUtil.write(Unknown Source)
>         at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>         at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
>         at java.io.BufferedOutputStream.flush(Unknown Source)
>         at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159)
>         ... 13 more
>
>
>
> Here are the error logs for Tablet Server #1:
>
>
>
> 2014-03-05 11:17:15,152 [tabletserver.TabletServer] INFO : Tablet server starting on 172.16.111.3
> 2014-03-05 11:17:15,187 [util.FileSystemMonitor] INFO : Filesystem monitor started
> 2014-03-05 11:17:15,194 [tabletserver.NativeMap] INFO : Loaded native map shared library /opt/accumulo/accumulo/lib/native/map/libNativeMap-Linux-amd64-64.so
> 2014-03-05 11:17:15,499 [tabletserver.TabletServer] INFO : port = 9997
> 2014-03-05 11:17:15,540 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
> 2014-03-05 11:17:16,633 [tabletserver.TabletServer] WARN : Got loadTablet message from master before lock acquired, ignoring...
> 2014-03-05 11:17:16,634 [server.TNonblockingServer] ERROR: Unexpected exception while invoking!
> java.lang.RuntimeException: Lock not acquired
>         at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler.checkPermission(TabletServer.java:1782)
>         at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler.loadTablet(TabletServer.java:1814)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>         at java.lang.reflect.Method.invoke(Unknown Source)
>         at org.apache.accumulo.cloudtrace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:59)
>         at com.sun.proxy.$Proxy1.loadTablet(Unknown Source)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$loadTablet.process(TabletClientService.java:2510)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor.process(TabletClientService.java:2053)
>         at org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:154)
>         at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:631)
>         at org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run(TServerUtils.java:202)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Unknown Source)
> 2014-03-05 11:17:20,564 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
> 2014-03-05 11:17:25,589 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
>
>
>
> (continues until too many retries, then exits)
>
>
>
> Tablet Server #2’s logs get as far as this (below), and then just stop.
>
>
>
> 2014-03-05 11:17:14,112 [tabletserver.TabletServer] INFO : Tablet server starting on 172.16.111.3
> 2014-03-05 11:17:14,149 [util.FileSystemMonitor] INFO : Filesystem monitor started
> 2014-03-05 11:17:14,157 [tabletserver.NativeMap] INFO : Loaded native map shared library /opt/accumulo/accumulo/lib/native/map/libNativeMap-Linux-amd64-64.so
> 2014-03-05 11:17:14,481 [tabletserver.TabletServer] INFO : port = 9997
>
>
>
> Also, the master logs interestingly never make any calls to Tablet #2’s IP
> address.
>
>
>
> Any thoughts? We have another cluster that is set up identically in just
> about every way (besides hostnames), but it has never experienced any of
> these issues. My research shows that these issues can exist within 1.4.3,
> which we were using at first, but we switched to 1.4.4 because these types
> of issues were supposed to be resolved. Any help would be greatly
> appreciated.
>
>
>
> Thanks,
>
>
>
> Alex Lee
>
