accumulo-user mailing list archives

From Denis <de...@camfex.cz>
Subject Re: Tserver's strange state.
Date Sat, 07 Nov 2015 12:18:26 GMT
I haven't managed to investigate and fix it yet.
Today the same problem hit again: one machine running an HDFS datanode was
rebooted, and on 3 Accumulo TServers the oldest scan dated back to the time
of the datanode restart.
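A minimal sketch for checking that correlation on the next occurrence: given the age of the oldest scan on each tablet server at a known snapshot time (for example, as displayed in the monitor), compute when each scan started and flag those that began within a window around the datanode restart. The input format, the `scans_started_near` name and the 5-minute window are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def scans_started_near(event_time, snapshot_time, scan_ages_ms,
                       window=timedelta(minutes=5)):
    """Return (tserver, start_time) for scans that started near event_time.

    Hypothetical helper. scan_ages_ms is an iterable of (tserver, age_ms)
    pairs observed at snapshot_time; the input format is an assumption.
    """
    hits = []
    for tserver, age_ms in scan_ages_ms:
        # A scan that is age_ms old at snapshot_time started this long ago.
        start = snapshot_time - timedelta(milliseconds=age_ms)
        if abs(start - event_time) <= window:
            hits.append((tserver, start))
    return hits


# Hypothetical data: datanode restarted at 10:00, ages read off at 12:00.
restart = datetime(2015, 11, 7, 10, 0)
snapshot = datetime(2015, 11, 7, 12, 0)
ages = [("tserver-a", 2 * 3600 * 1000), ("tserver-b", 30 * 60 * 1000)]
print(scans_started_near(restart, snapshot, ages))
```

Only `tserver-a` is flagged here, because its oldest scan started exactly at the assumed restart time, while the scan on `tserver-b` started 90 minutes later.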


On 10/23/15, dlmarion@comcast.net <dlmarion@comcast.net> wrote:
> Regarding the errors in the logs about 10.2.130.1: did they occur while it
> was being rebooted, or before? Was the tserver on 10.2.130.1 hung? I think we
> would likely need more information from the logs to determine what is
> occurring during this time. It might be best to open an issue in JIRA for this.
>
>> -----Original Message-----
>> From: webmaster@webmaster.ms [mailto:webmaster@webmaster.ms] On
>> Behalf Of Denis
>> Sent: Thursday, October 22, 2015 7:29 PM
>> To: user@accumulo.apache.org
>> Subject: Re: Tserver's strange state.
>>
>> The server 10.2.130.1 has been rebooted.
>> Yes, it is a production system with a lot of reads and writes.
>>
>> On 10/22/15, dlmarion <dlmarion@comcast.net> wrote:
>> >
>> >
>> > Are you trying to shut the whole system down, or just a couple of
>> > tablet servers? Is your application reading and writing from/to
>> > Accumulo during this time?
>> >
>> >
>> >
>> >
>> > -------- Original message --------
>> > From: Denis <denis@camfex.cz>
>> > Date: 10/22/2015  6:03 PM  (GMT-05:00)
>> > To: user@accumulo.apache.org
>> > Subject: Re: Tserver's strange state.
>> >
>> > Both servers have errors like these in their logs:
>> >
>> > ========
>> > 2015-10-22 03:28:00,599 ERROR
>> > org.apache.accumulo.core.client.impl.Writer: error sending update to
>> > 10.2.130.1:9997: org.apache.thrift.transport.TTransportException:
>> > java.net.SocketTimeoutException: 120000 millis timeout while waiting
>> > for channel to be ready for read. ch :
>> > java.nio.channels.SocketChannel[connected
>> > local=/10.2.142.1:36148 remote=/10.2.130.1:9997]
>> > 2015-10-22 03:28:04,283 ERROR
>> > org.apache.accumulo.core.client.impl.Writer: error sending update to
>> > 10.2.130.1:9997: org.apache.thrift.transport.TTransportException:
>> > java.net.SocketTimeoutException: 120000 millis timeout while waiting
>> > for channel to be ready for read. ch :
>> > java.nio.channels.SocketChannel[connected
>> > local=/10.2.142.1:37047 remote=/10.2.130.1:9997]
>> > 2015-10-22 03:28:06,116 ERROR
>> > org.apache.accumulo.core.client.impl.Writer: error sending update to
>> > 10.2.130.1:9997: org.apache.thrift.transport.TTransportException:
>> > java.net.SocketTimeoutException: 120000 millis timeout while waiting
>> > for channel to be ready for read. ch :
>> > java.nio.channels.SocketChannel[connected
>> > local=/10.2.142.1:37167 remote=/10.2.130.1:9997]
>> > ========
>> >
>> > On 10/22/15, Denis <denis@camfex.cz> wrote:
>> >> Hi
>> >>
>> >> Sometimes my Tablet Servers go into a strange state: they have some
>> >> very old scans (see picture: http://i.imgur.com/2sOUM99.png), and
>> >> while in this state they cannot be decommissioned gracefully using
>> >> "accumulo stop": the number of their tablets decreases to some
>> >> fixed number (say, from 6K tablets to 2K) rather than to zero.
>> >> It is difficult to reproduce.
>> >> Right now I have a live system with 2 tablet servers in this state.
>> >> Any suggestions on how to catch the bug?
>> >>
>> >
>
>
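To help answer the question above of whether the `Writer` timeouts happened during the reboot of 10.2.130.1 or before it, one option is to pull the timestamps of those errors out of the client-side logs and summarize them per remote tablet server, then compare the window with the reboot time. This is a minimal sketch: it assumes each log entry sits on a single line, as in a raw log file (the archive above wraps them), and `timeout_windows` is a hypothetical helper, not part of Accumulo.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the start of a client Writer error like the ones quoted above,
# assuming the whole entry is on one line of the log file.
ERROR_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR "
    r"org\.apache\.accumulo\.core\.client\.impl\.Writer: "
    r"error sending update to (?P<server>[\w.\-]+:\d+):"
)


def timeout_windows(lines):
    """Map each remote tablet server to (first_error, last_error, count)."""
    seen = defaultdict(list)
    for line in lines:
        m = ERROR_RE.match(line)
        if m:
            # %f accepts the 3-digit millisecond field and pads it.
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            seen[m.group("server")].append(ts)
    return {srv: (min(ts), max(ts), len(ts)) for srv, ts in seen.items()}


# Usage, e.g.: with open("client.log") as f: print(timeout_windows(f))
```

If the last error for 10.2.130.1 falls before the time it went down, the tserver was likely unresponsive (or hung) before the reboot rather than merely unreachable because of it.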
