accumulo-user mailing list archives

From Denis <de...@camfex.cz>
Subject Re: Tserver's strange state.
Date Thu, 22 Oct 2015 23:28:43 GMT
The server 10.2.130.1 has been rebooted.
Yes, it is a production system with a lot of reads and writes.

On 10/22/15, dlmarion <dlmarion@comcast.net> wrote:
>
>
> Are you trying to shut the whole system down, or just a couple of tablet
> servers? Is your application reading from and writing to Accumulo during
> this time?
>
>
>
>
> -------- Original message --------
> From: Denis <denis@camfex.cz>
> Date: 10/22/2015  6:03 PM  (GMT-05:00)
> To: user@accumulo.apache.org
> Subject: Re: Tserver's strange state.
>
> Both servers have errors like these in their logs:
>
> ========
> 2015-10-22 03:28:00,599 ERROR
> org.apache.accumulo.core.client.impl.Writer: error sending update to
> 10.2.130.1:9997: org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: 120000 millis timeout while waiting
> for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected
> local=/10.2.142.1:36148 remote=/10.2.130.1:9997]
> 2015-10-22 03:28:04,283 ERROR
> org.apache.accumulo.core.client.impl.Writer: error sending update to
> 10.2.130.1:9997: org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: 120000 millis timeout while waiting
> for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected
> local=/10.2.142.1:37047 remote=/10.2.130.1:9997]
> 2015-10-22 03:28:06,116 ERROR
> org.apache.accumulo.core.client.impl.Writer: error sending update to
> 10.2.130.1:9997: org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: 120000 millis timeout while waiting
> for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected
> local=/10.2.142.1:37167 remote=/10.2.130.1:9997]
> ========
>
> On 10/22/15, Denis <denis@camfex.cz> wrote:
>> Hi
>>
>> Sometimes my tablet servers go into a strange state: they have some
>> very old scans (see picture: http://i.imgur.com/2sOUM99.png), and while
>> in this state they cannot be decommissioned gracefully using "accumulo
>> stop". The number of their tablets decreases to some fixed number
>> (say, from 6K tablets to 2K) but never reaches zero.
>> It is difficult to reproduce.
>> I now have a live system with 2 tablet servers in this state.
>> Any suggestions on how to catch the bug?
>>
>
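
The timeout entries quoted above all name the same remote tablet server. As a rough sketch that is not part of the original thread, a short script could tally such entries per remote server to show whether the timeouts cluster on one host. The function name `count_update_timeouts` and the regular expression are assumptions derived only from the excerpt quoted here; other Accumulo versions may word the message differently.

```python
import re
from collections import Counter

# Pattern assumed from the log excerpt quoted in this thread only.
UPDATE_TIMEOUT = re.compile(
    r"error sending update to\s+(?P<server>\S+?):\s+"
    r"org\.apache\.thrift\.transport\.TTransportException:\s+"
    r"java\.net\.SocketTimeoutException:\s+(?P<millis>\d+)\s+millis"
)


def count_update_timeouts(log_text):
    """Return a Counter mapping remote 'host:port' to its timeout count."""
    # Collapse all whitespace so an entry wrapped across lines still matches.
    flat = " ".join(log_text.split())
    return Counter(m.group("server") for m in UPDATE_TIMEOUT.finditer(flat))


# Hypothetical input: one entry copied from the excerpt, wrapped as in the mail.
sample = """\
2015-10-22 03:28:00,599 ERROR
org.apache.accumulo.core.client.impl.Writer: error sending update to
10.2.130.1:9997: org.apache.thrift.transport.TTransportException:
java.net.SocketTimeoutException: 120000 millis timeout while waiting
for channel to be ready for read.
"""

for server, count in count_update_timeouts(sample).most_common():
    print(server, count)
```

Run against a full log file (for example, by passing the result of `open(path).read()`), the most frequent server appears first, which may help decide which tablet server to inspect.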
