accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: IOException in internalRead! & transient exception communicating with ZooKeeper
Date Wed, 24 Feb 2016 16:59:40 GMT
ZooKeeper is a funny system. This kind of ConnectionLossException is a 
normal "state" that a ZooKeeper client can enter. We handle this 
condition in Accumulo, retrying the operation (in this case, a 
`create()`), after the client can reconnect to the ZooKeeper servers in 
the background.
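
As a rough illustration of that retry behaviour (a minimal sketch, not Accumulo's actual code): `ConnectionLoss` below is a hypothetical stand-in for ZooKeeper's ConnectionLossException, and `retry_on_connection_loss`, its parameters, and the delay values are made up for the example.

```python
import time


class ConnectionLoss(Exception):
    """Hypothetical stand-in for ZooKeeper's ConnectionLossException."""


def retry_on_connection_loss(operation, max_attempts=5, initial_delay=0.25,
                             sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Only ConnectionLoss is retried; any other exception propagates at once.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionLoss:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            sleep(delay)  # give the client time to reconnect in the background
            delay *= 2    # back off so a struggling server is not hammered
```

Note that a retried `create()` may find the node already there, because the first attempt can succeed on the server even though the client lost the connection before seeing the reply, so a real caller has to tolerate that.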

ConnectionLossExceptions can be indicative of over-saturation of your 
nodes. A ZooKeeper client might lose its connection because it is 
starved for CPU time. It can also indicate that the ZooKeeper servers 
might be starved for resources.

* Check the ZooKeeper server logs for any errors about dropped 
connections (maxClientCnxns)
* Make sure your servers running Accumulo are not running at 100% total 
CPU usage and that there is free memory (no swapping).

ACCUMULO-3336 is about a different ZooKeeper error condition called a 
"session loss". This is when the entire ZooKeeper session needs to be 
torn down and recreated. This only happens after prolonged pauses in the 
client JVM or when the ZooKeeper servers actively drop your connections due 
to their internal configuration (maxClientCnxns). The stack trace you 
copied is not a session loss error.

Are you saying that when a ZooKeeper server dies, you cannot use 
Accumulo? How many are you running?
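
The reason the count matters: ZooKeeper only serves requests while a majority of the ensemble is up. A quick sketch of that arithmetic (the function name is just for illustration):

```python
def tolerated_failures(ensemble_size):
    """Return how many ZooKeeper servers can be down while a majority remains."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    quorum = ensemble_size // 2 + 1  # smallest strict majority
    return ensemble_size - quorum


for n in (1, 2, 3, 5):
    print(n, "servers tolerate", tolerated_failures(n), "failure(s)")
```

So with one or two servers, losing a single server is enough to make ZooKeeper unavailable, while three servers can ride out one failure.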

mohit.kaushik wrote:
> Sent too early...
>
> Another exception I am getting frequently, with ZooKeeper, is a
> bigger problem.
> ACCUMULO-3336 <https://issues.apache.org/jira/browse/ACCUMULO-3336> says
> it is still unresolved.
>
> Saw (possibly) transient exception communicating with ZooKeeper
> 	org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/f8708e0d-9238-41f5-b948-8f435fd01207/gc/lock
> 		at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> 		at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 		at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
> 		at org.apache.accumulo.fate.zookeeper.ZooReader.getStatus(ZooReader.java:132)
> 		at org.apache.accumulo.fate.zookeeper.ZooLock.process(ZooLock.java:383)
> 		at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> 		at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>
> And the worst case is that whenever a ZooKeeper server goes down, the
> cluster becomes unreachable for the time being; until it restarts, the
> ingest process halts.
>
> What do you suggest? I need to resolve these problems. I do not want
> the ingest process to stop, ever.
>
> Thanks
> Mohit kaushik
>
>
> On 02/22/2016 12:06 PM, mohit.kaushik wrote:
>> I am continuously getting the exception given below. The count keeps
>> increasing every second (the current value is around 3000 on one server),
>> and I can see the exception on all 3 tablet servers.
>>
>> ACCUMULO-2420 <https://issues.apache.org/jira/browse/ACCUMULO-2420> says
>> that this exception occurs when a client closes a connection before a scan
>> completes. But the connection is not closed; every thread uses a common
>> connection object to ingest and query. What could cause this exception, then?
>>
>> 	java.io.IOException: Connection reset by peer
>> 		at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> 		at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> 		at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> 		at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> 		at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> 		at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
>> 		at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:537)
>> 		at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:338)
>> 		at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
>> 		at org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.select(CustomNonBlockingServer.java:228)
>> 		at org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.run(CustomNonBlockingServer.java:184)
>>
>> Regards
>> Mohit kaushik
>>
