accumulo-user mailing list archives

From <dlmar...@comcast.net>
Subject RE: Tserver's strange state.
Date Fri, 23 Oct 2015 00:24:09 GMT
Did the errors in the logs regarding 10.2.130.1 occur while it was being rebooted, or before?
Was the tserver on 10.2.130.1 hung? I think we would likely need more information from the
logs to determine what was occurring during this time. It might be best to open an issue in
JIRA for this.

> -----Original Message-----
> From: webmaster@webmaster.ms [mailto:webmaster@webmaster.ms] On
> Behalf Of Denis
> Sent: Thursday, October 22, 2015 7:29 PM
> To: user@accumulo.apache.org
> Subject: Re: Tserver's strange state.
> 
> The server 10.2.130.1 has been rebooted.
> Yes, it is a production system with a lot of reads and writes.
> 
> On 10/22/15, dlmarion <dlmarion@comcast.net> wrote:
> >
> >
> > Are you trying to shut the whole system down, or just a couple of
> > tablet servers? Is your application reading and writing from/to
> > Accumulo during this time?
> >
> >
> >
> >
> > -------- Original message --------
> > From: Denis <denis@camfex.cz>
> > Date: 10/22/2015  6:03 PM  (GMT-05:00)
> > To: user@accumulo.apache.org
> > Subject: Re: Tserver's strange state.
> >
> > Both servers have errors in the logs like these:
> >
> > ========
> > 2015-10-22 03:28:00,599 ERROR
> > org.apache.accumulo.core.client.impl.Writer: error sending update to
> > 10.2.130.1:9997: org.apache.thrift.transport.TTransportException:
> > java.net.SocketTimeoutException: 120000 millis timeout while waiting
> > for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected
> > local=/10.2.142.1:36148 remote=/10.2.130.1:9997]
> > 2015-10-22 03:28:04,283 ERROR
> > org.apache.accumulo.core.client.impl.Writer: error sending update to
> > 10.2.130.1:9997: org.apache.thrift.transport.TTransportException:
> > java.net.SocketTimeoutException: 120000 millis timeout while waiting
> > for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected
> > local=/10.2.142.1:37047 remote=/10.2.130.1:9997]
> > 2015-10-22 03:28:06,116 ERROR
> > org.apache.accumulo.core.client.impl.Writer: error sending update to
> > 10.2.130.1:9997: org.apache.thrift.transport.TTransportException:
> > java.net.SocketTimeoutException: 120000 millis timeout while waiting
> > for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected
> > local=/10.2.142.1:37167 remote=/10.2.130.1:9997]
> > ========
> >
> > On 10/22/15, Denis <denis@camfex.cz> wrote:
> >> Hi
> >>
> >> Sometimes my Tablet Servers go into a strange state: they have some
> >> very old scans (see picture: http://i.imgur.com/2sOUM99.png), and
> >> while in this state they cannot be decommissioned gracefully using
> >> "accumulo stop" - the number of their tablets decreases to some
> >> fixed number (say, from 6K tablets to 2K), not to zero.
> >> It is difficult to reproduce.
> >> I now have a live system with 2 tablet servers in this state.
> >> Any suggestions on how to catch the bug?
> >>
> >

