accumulo-user mailing list archives

From "mohit.kaushik" <mohit.kaus...@orkash.com>
Subject Re: Mutation Rejected exception with server Error 1
Date Wed, 23 Dec 2015 04:59:51 GMT
And why are there 5000 spans queued for delivery?

*Tracing spans are being dropped because there are already 5000 spans queued for delivery.
This does not affect performance, security or data integrity, but distributed tracing information
is being lost.*
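
The warning describes a bounded delivery queue: once 5000 spans are waiting, new spans are discarded instead of being buffered. A minimal Python sketch of that behavior follows. It is an illustration only, not Accumulo's tracing code, and `BoundedSpanQueue`, `offer`, and `drain` are hypothetical names. A queue that stays full would suggest spans are produced faster than they are delivered, which is an assumption about the cause, not a diagnosis.

```python
from collections import deque


class BoundedSpanQueue:
    """Hypothetical bounded buffer that rejects spans once `capacity` are waiting."""

    def __init__(self, capacity=5000):
        self.capacity = capacity
        self.pending = deque()
        self.dropped = 0  # spans discarded because the queue was full

    def offer(self, span):
        """Queue a span; return False (and count a drop) if the queue is full."""
        if len(self.pending) >= self.capacity:
            self.dropped += 1
            return False
        self.pending.append(span)
        return True

    def drain(self, max_items):
        """Remove and return up to `max_items` spans, oldest first."""
        count = min(max_items, len(self.pending))
        return [self.pending.popleft() for _ in range(count)]


# Produce 6000 spans with nothing draining the queue in between.
queue = BoundedSpanQueue(capacity=5000)
accepted = sum(queue.offer(f"span-{i}") for i in range(6000))
print(f"accepted={accepted} dropped={queue.dropped}")
```

Draining the queue (here, delivering spans to the trace collector) frees room, after which `offer` accepts spans again.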


On 12/23/2015 10:01 AM, mohit.kaushik wrote:
>
> I have 3 tablet servers hosting around 1.4K tablets. If a tablet server
> loses its session with zookeeper and kills itself, the system takes
> some time to move all hosted tablets to other servers.
>
> In this case, if an ingest is in progress, what should happen to the
> mutations going to tablets hosted by that tablet server?
> Is this the reason for the first exception? Should they not be
> redirected to other servers?
> Also, I had set the system swappiness to 1. Should I set it to 0 in
> this case? I will check further.
>
> Thanks for the reply
>
> -Mohit Kaushik
>
> On 12/22/2015 08:17 PM, Eric Newton wrote:
>> A tablet server is given the rights to manage a tablet.
>>
>> It is critical that no other server uses the tablet to maintain 
>> consistency.
>>
>> To maintain the right to access a tablet, it must maintain a 
>> zookeeper session. The zookeeper session periodically exchanges 
>> keep-alive messages. If either party fails to get a keep-alive, 
>> zookeeper will close the connection. The client can attempt to 
>> reconnect, but if it fails to do so, the session will timeout.
>>
>> If the tablet server loses its session with zookeeper, the rest of 
>> the system can take over its tablets.
>>
>> When a tablet server detects that it has lost its zookeeper session,
>> it kills itself to avoid doing anything with the tablets it no longer
>> has the right to host.
>>
>> What you are seeing here is the first step in that process, and it is 
>> probably due to the tablet server not sending a keep-alive message to 
>> zookeeper in time.
>>
>> There are many reasons for a tablet server to be delayed in sending a 
>> keep-alive message. By far the most common is that your system is 
>> over-subscribed for memory, and part of the tablet server's memory 
>> swapped out. Once the java garbage collection cycle swapped it back 
>> in, there was a considerable delay.
>>
>> However, there can be other things going on.  This is just a best 
>> guess.  Monitor swap usage, as a first diagnostic step.
>>
>> -Eric
>>
>>
>>
>> On Tue, Dec 22, 2015 at 8:30 AM, mohit.kaushik
>> <mohit.kaushik@orkash.com> wrote:
>>
>>     Dear All,
>>
>>     The mutations rejected exception can be seen at client side with
>>     server error 1.
>>     org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0  security codes: {}  # server errors 1 # exceptions 1
>>         at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.checkForFailures(TabletServerBatchWriter.java:537)
>>         at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.addMutation(TabletServerBatchWriter.java:249)
>>         at org.apache.accumulo.core.client.impl.MultiTableBatchWriterImpl$TableBatchWriter.addMutation(MultiTableBatchWriterImpl.java:64)
>>         at com.orkash.accumulo.IngestionWithoutServiceOnCondition.main(IngestionWithoutServiceOnCondition.java:235)
>>         at com.orkash.db.DBQuery.insertLookUpDB(DBQuery.java:570)
>>         at com.orkash.Crawling.CrawlerThread.run(CrawlerThread.java:145)
>>         at java.lang.Thread.run(Thread.java:745)
>>     Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server orkash1:9997
>>         at
>>
>>     I also found exceptions in Monitor related to Tracing.
>>
>>     Tracing spans are being dropped because there are already 5000 spans queued for delivery.
>>     This does not affect performance, security or data integrity, but distributed tracing information is being lost.
>>
>>     and 6458 times
>>
>>     Got an IOException in internalRead!
>>     java.io.IOException: Connection reset by peer
>>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>         at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
>>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:537)
>>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:338)
>>         at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
>>         at org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.select(CustomNonBlockingServer.java:228)
>>         at org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.run
>>
>>
>>
>>     I am facing the following exceptions in tserver logs and one
>>     tserver goes dead.
>>
>>     2015-12-22 09:37:27,173 [zookeeper.ZooCache] WARN : Saw (possibly) transient exception communicating with ZooKeeper, will retry
>>     org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/f8708e0d-9238-41f5-b948-8f435fd01207/tables/16/conf/table.split.threshold
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
>>         at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:264)
>>         at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:162)
>>         at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:289)
>>         at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:238)
>>         at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:117)
>>         at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:103)
>>         at org.apache.accumulo.server.conf.TableConfiguration.get(TableConfiguration.java:99)
>>         at org.apache.accumulo.core.conf.AccumuloConfiguration.getMemoryInBytes(AccumuloConfiguration.java:197)
>>         at org.apache.accumulo.tserver.tablet.Tablet.findSplitRow(Tablet.java:1604)
>>         at org.apache.accumulo.tserver.tablet.Tablet.needsSplit(Tablet.java:1772)
>>         at org.apache.accumulo.tserver.TabletServer$MajorCompactor.run(TabletServer.java:1853)
>>         at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>>     These are causing problems with continuous data ingest, and I
>>     have also seen delays in queries and table create commands.
>>     What could be the cause of these exceptions?
>>
>>     Thanks
>>     Mohit Kaushik
>>
>>
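
The advice quoted above, to monitor swap usage as a first diagnostic step, can be tried on a Linux host by reading the `SwapTotal` and `SwapFree` fields of `/proc/meminfo`. The following is a minimal sketch, assuming the tablet server runs on Linux. The function names are hypothetical, and the sample text is made up for illustration rather than taken from the cluster in this thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kib(meminfo):
    """Swap in use, in the units /proc/meminfo reports (kB)."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def read_swap_used_kib(path="/proc/meminfo"):
    """Read the live value on a Linux host."""
    with open(path) as handle:
        return swap_used_kib(parse_meminfo(handle.read()))


# Made-up sample, used so the sketch also runs on hosts without /proc/meminfo.
SAMPLE = """MemTotal:       16314520 kB
SwapTotal:       2097148 kB
SwapFree:        2000000 kB
"""

print("swap used (kB):", swap_used_kib(parse_meminfo(SAMPLE)))
```

Calling `read_swap_used_kib()` periodically on each tablet server host and comparing the values around the time of a ZooKeeper session loss would show whether swapping coincides with the failures.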

