accumulo-user mailing list archives

From "mohit.kaushik" <mohit.kaus...@orkash.com>
Subject Re: IOException in internalRead! & transient exception communicating with ZooKeeper
Date Fri, 26 Feb 2016 12:29:36 GMT
Thanks for the reply, Josh. I am running 3 ZooKeeper servers.
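For reference, a ZooKeeper ensemble keeps serving requests only while a majority of its servers are up, so 3 servers should tolerate the loss of one. If a single server failure stalls Accumulo, one thing worth checking is that the client connect string (Accumulo's `instance.zookeepers` setting) lists all three servers. The sketch below is a minimal illustration of the majority arithmetic only; the class name, method names and host names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: majority-quorum arithmetic for a ZooKeeper ensemble.
final class QuorumMath {
    private QuorumMath() {}

    // Servers that must be up for the ensemble to keep serving requests.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Servers that can fail while a majority still remains.
    static int toleratedFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2;
    }

    // Counts the host:port entries in a connect string such as
    // "zk1:2181,zk2:2181,zk3:2181" (host names here are hypothetical).
    static int serversInConnectString(String connectString) {
        return (int) Arrays.stream(connectString.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .count();
    }
}
```

For example, `toleratedFailures(3)` and `toleratedFailures(4)` are both 1, which is why odd ensemble sizes are the usual choice.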

On 02/24/2016 10:29 PM, Josh Elser wrote:
> ZooKeeper is a funny system. This kind of ConnectionLossException is a 
> normal "state" that a ZooKeeper client can enter. We handle this 
> condition in Accumulo, retrying the operation (in this case, a 
> `create()`), after the client can reconnect to the ZooKeeper servers 
> in the background.
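The retry behaviour described above can be sketched as a retry loop with exponential backoff. This is a minimal, self-contained illustration, not Accumulo's actual implementation: `RetryOnConnectionLoss`, `ZkOperation` and `TransientConnectionLoss` are hypothetical names, and `TransientConnectionLoss` stands in for ZooKeeper's `KeeperException.ConnectionLossException`.

```java
// Hypothetical helper: retries an operation that failed with a transient
// connection-loss error, backing off while the client reconnects.
final class RetryOnConnectionLoss {
    private RetryOnConnectionLoss() {}

    // Stand-in for org.apache.zookeeper.KeeperException.ConnectionLossException
    // (assumption: used so the sketch compiles without the ZooKeeper jar).
    static final class TransientConnectionLoss extends Exception {
        private static final long serialVersionUID = 1L;

        TransientConnectionLoss(String message) {
            super(message);
        }
    }

    // An operation against ZooKeeper, e.g. a create() call (hypothetical interface).
    interface ZkOperation<T> {
        T run() throws TransientConnectionLoss;
    }

    static <T> T withRetry(ZkOperation<T> op, int maxAttempts, long initialBackoffMillis)
            throws TransientConnectionLoss, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.run();
            } catch (TransientConnectionLoss e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last error
                }
                // The real ZooKeeper client reconnects in the background;
                // here we simply wait, doubling the delay up to 30 seconds.
                Thread.sleep(backoff);
                backoff = Math.min(backoff * 2, 30_000L);
            }
        }
    }
}
```

Because a connection loss leaves the outcome of the interrupted call unknown, a retried `create()` may find that the node already exists if the first attempt did reach the server, so real code has to handle that case as well.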
>
> ConnectionLossExceptions can be indicative of over-saturation of your 
> nodes. A ZooKeeper client might lose its connection because it is 
> starved for CPU time. It can also indicate that the ZooKeeper servers 
> might be starved for resources.
>
> * Check the ZooKeeper server logs for any errors about dropped 
> connections (maxClientCnxns)
> * Make sure your servers running Accumulo are not running at 100% 
> total CPU usage and that there is free memory (no swapping).
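Both checks can be partly scripted. The sketch below is a hypothetical helper, not part of Accumulo or ZooKeeper. `ipsAtLimit` assumes you already have the client IP of every open connection to one ZooKeeper server (for example, parsed from that server's `cons` four-letter-word output where it is enabled; the parsing is not shown), and it relies on ZooKeeper applying `maxClientCnxns` per client IP address and per server. `loadPerCore` uses the JDK's `OperatingSystemMXBean` to report the 1-minute load average per core as a rough CPU-saturation signal.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostics for the two checks listed above.
final class ZkHealthChecks {
    private ZkHealthChecks() {}

    // Given the client IP of each open connection to ONE ZooKeeper server,
    // returns the IPs (with their counts) that have reached maxClientCnxns.
    static Map<String, Integer> ipsAtLimit(List<String> clientIps, int maxClientCnxns) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String ip : clientIps) {
            counts.merge(ip, 1, Integer::sum);
        }
        // Keep only the IPs at or above the limit.
        counts.values().removeIf(c -> c < maxClientCnxns);
        return counts;
    }

    // Rough CPU-saturation check: 1-minute load average divided by the
    // number of available cores; NaN if the platform does not report it.
    static double loadPerCore() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative when unavailable
        return load < 0 ? Double.NaN : load / os.getAvailableProcessors();
    }
}
```

A `loadPerCore()` value that stays near or above 1.0 suggests the host's CPUs are saturated; free memory and swapping still need to be checked with the operating system's own tools.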
>
> ACCUMULO-3336 is about a different ZooKeeper error condition called a 
> "session loss". This is when the entire ZooKeeper session needs to be 
> torn down and recreated. This only happens after prolonged pauses in 
> the client JVM, or when the ZooKeeper servers actively drop your 
> connections due to their internal configuration (maxClientCnxns). The 
> stacktrace you copied is not a session loss error.
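The distinction between the two conditions can be summarised as a small decision table. In the real client they correspond to the `Watcher.Event.KeeperState` values `SyncConnected`, `Disconnected` and `Expired`; the enums and class below are hypothetical stand-ins so that the sketch compiles without the ZooKeeper jar.

```java
// Hypothetical stand-ins for the ZooKeeper client states
// (real names: Watcher.Event.KeeperState.SyncConnected, Disconnected, Expired).
enum ZkClientState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

// What a client should do in each state (hypothetical).
enum ZkAction { PROCEED, RETRY_AFTER_RECONNECT, RECREATE_SESSION }

final class ZkStateHandling {
    private ZkStateHandling() {}

    static ZkAction actionFor(ZkClientState state) {
        switch (state) {
            case SYNC_CONNECTED:
                return ZkAction.PROCEED;
            case DISCONNECTED:
                // Connection loss: the session may still be alive, and the client
                // library reconnects on its own, so pending work can be retried.
                return ZkAction.RETRY_AFTER_RECONNECT;
            case EXPIRED:
                // Session loss: the handle is unusable; a new ZooKeeper handle is
                // needed and ephemeral nodes (such as locks) must be re-created.
                return ZkAction.RECREATE_SESSION;
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
    }

    // Ephemeral nodes survive a disconnect unless the session times out
    // before the client reconnects, in which case the state becomes EXPIRED.
    static boolean ephemeralNodesSurvive(ZkClientState state) {
        return state != ZkClientState.EXPIRED;
    }
}
```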
>
> Are you saying that when a ZooKeeper server dies, you cannot use 
> Accumulo? How many are you running?
>
> mohit.kaushik wrote:
>> Sent too early...
>>
>> Another exception I am frequently getting from ZooKeeper is a bigger
>> problem. ACCUMULO-3336 <https://issues.apache.org/jira/browse/ACCUMULO-3336>
>> says it is still unresolved.
>>
>> Saw (possibly) transient exception communicating with ZooKeeper
>>     org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/f8708e0d-9238-41f5-b948-8f435fd01207/gc/lock
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
>>         at org.apache.accumulo.fate.zookeeper.ZooReader.getStatus(ZooReader.java:132)
>>         at org.apache.accumulo.fate.zookeeper.ZooLock.process(ZooLock.java:383)
>>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
>>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>>
>> And the worst case is that whenever a ZooKeeper server goes down, the
>> cluster becomes unreachable for the time being; until that server
>> restarts, the ingest process halts.
>>
>> What do you suggest? I need to resolve these problems; I never want
>> the ingest process to stop.
>>
>> Thanks
>> Mohit Kaushik
>>
>>
>> On 02/22/2016 12:06 PM, mohit.kaushik wrote:
>>> I am continuously getting the exception given below; the count keeps
>>> increasing every second (currently around 3000 on one server), and I
>>> see the exception on all 3 tablet servers.
>>>
>>> ACCUMULO-2420 <https://issues.apache.org/jira/browse/ACCUMULO-2420>
>>> says that this exception occurs when a client closes a connection
>>> before a scan completes. But the connection is not closed: every
>>> thread uses a common connection object to ingest and query. What
>>> else could cause this exception?
>>>
>>>     java.io.IOException: Connection reset by peer
>>>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>>         at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
>>>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:537)
>>>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:338)
>>>         at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
>>>         at org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.select(CustomNonBlockingServer.java:228)
>>>         at org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.run(CustomNonBlockingServer.java:184)
>>>
>>> Regards
>>> Mohit Kaushik
>>>
>


-- 

*Mohit Kaushik*
Software Engineer
A Square, Plot No. 278, Udyog Vihar, Phase 2, Gurgaon 122016, India
*Tel:*+91 (124) 4969352 | *Fax:*+91 (124) 4033553


