hbase-user mailing list archives

From Johannes Schaback <johannes.schab...@visual-meta.com>
Subject Re: single RegionServer stuck, causing cluster to hang
Date Sun, 24 Aug 2014 04:25:50 GMT
We use only plain gets and puts (sometimes batched).

We have hbase.client.keyvalue.maxsize increased to 536870912 bytes (512 MB)
on the client. That is the only unusual thing I can see.

I am about to send a zip file with the relevant classes directly to your
email address. I would rather not post the code publicly.

We will also try setting hbase.ipc.server.callqueue.handler.factor to 0
now. I will keep you posted.
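
For anyone following along, here is a rough sketch of what that factor controls. The HBase documentation describes a value of 0 as a single queue shared by all handlers and a value of 1 as one queue per handler. The rounding formula below is an assumption chosen to match those two endpoints, the class and method names are hypothetical (not HBase API), and the handler count of 30 is only an illustrative value. The thread names in the quoted trace (queue=0 to queue=2) suggest three queues were in use.

```java
public class CallQueueSketch {
    // Hypothetical helper: number of call queues for a given handler count
    // and hbase.ipc.server.callqueue.handler.factor value (assumed formula).
    static int numCallQueues(int handlerCount, float factor) {
        return Math.max(1, Math.round(handlerCount * factor));
    }

    public static void main(String[] args) {
        int handlers = 30; // illustrative value only
        for (float factor : new float[] {0f, 0.1f, 1f}) {
            System.out.println("factor=" + factor
                + " -> queues=" + numCallQueues(handlers, factor));
        }
    }
}
```

With the factor at 0 every handler pulls from the same queue, which takes the multi-queue code path out of the picture for testing.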

Johannes




On Sun, Aug 24, 2014 at 1:06 AM, Stack <stack@duboce.net> wrote:

> I am having trouble reproducing the stack overflow. Some particular
> response is triggering it (the code here has been around a while).  Any
> particulars on how your client is accessing hbase? Anything unusual?
>
> If you were looking for something to try, set
> hbase.ipc.server.callqueue.handler.factor
> to 0.  Multiple queues is what is new here. It should not make a difference
> but...
>
> St.Ack
>
>
>
>
>
> On Sat, Aug 23, 2014 at 1:23 PM, Johannes Schaback <
> johannes.schaback@visual-meta.com> wrote:
>
> > Thank you.
> >
> > From the proposed resolution I imagine that the RS would then die in case
> > of a handler error. So the question remains what error originally occurred
> > in the handler. The log of the entire lifecycle of the RS
> > (http://schabby.de/wp-content/uploads/2014/08/filtered.txt) does not
> > reveal much to me, unfortunately. Do you find anything in there that hints
> > at something that may cause the handler to end up in the soon-to-be-fixed
> > recursion?
> >
> > @Ted, the line
> > "at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)"
> > is all I can see, unfortunately :(
> >
> >
> >
> > On Sat, Aug 23, 2014 at 9:43 PM, Andrew Purtell <apurtell@apache.org>
> > wrote:
> >
> > > On Sat, Aug 23, 2014 at 12:11 PM, Johannes Schaback <
> > > johannes.schaback@visual-meta.com> wrote:
> > >
> > > > Exception in thread "defaultRpcServer.handler=5,queue=2,port=60020"
> > > > java.lang.StackOverflowError
> > > >         at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
> > > >         at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
> > > >         at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
> > > >         at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
> > > >         (and so on...)
> > > >
> > >
> > > That is the anonymous CellScanner instance we create from
> > > CellUtil#createCellScanner. See
> > > https://issues.apache.org/jira/browse/HBASE-11813
> > >
> > > > Filtering the .out file for "Exception" shows that several handlers
> > > > crashed like that:
> > > >
> > > > Exception in thread "defaultRpcServer.handler=5,queue=2,port=60020"
> > > > java.lang.StackOverflowError
> > > > Exception in thread "defaultRpcServer.handler=18,queue=0,port=60020"
> > > > java.lang.StackOverflowError
> > > > Exception in thread "defaultRpcServer.handler=23,queue=2,port=60020"
> > > > java.lang.StackOverflowError
> > > > Exception in thread "defaultRpcServer.handler=24,queue=0,port=60020"
> > > > java.lang.StackOverflowError
> > > > Exception in thread "defaultRpcServer.handler=2,queue=2,port=60020"
> > > > java.lang.StackOverflowError
> > > > Exception in thread "defaultRpcServer.handler=11,queue=2,port=60020"
> > > > java.lang.StackOverflowError
> > > > Exception in thread "defaultRpcServer.handler=25,queue=1,port=60020"
> > > > java.lang.StackOverflowError
> > > > Exception in thread "defaultRpcServer.handler=20,queue=2,port=60020"
> > > > java.lang.StackOverflowError
> > > > Exception in thread "defaultRpcServer.handler=19,queue=1,port=60020"
> > > > java.lang.StackOverflowError
> > > > Exception in thread "defaultRpcServer.handler=15,queue=0,port=60020"
> > > > java.lang.StackOverflowError
> > > > Exception in thread "defaultRpcServer.handler=1,queue=1,port=60020"
> > > > java.lang.StackOverflowError
> > > > Exception in thread "defaultRpcServer.handler=7,queue=1,port=60020"
> > > > java.lang.StackOverflowError
> > > > Exception in thread "defaultRpcServer.handler=4,queue=1,port=60020"
> > > > java.lang.StackOverflowError
> > > >
> > >
> > > We should fix this so the RegionServer aborts if it loses a handler to
> > > an Error.
> > >
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > (via Tom White)
> > >
> >
> >
> >
> > --
> > LadenZeile.de <http://www.ladenzeile.de/>
> > powered by Visual Meta GmbH - www.visual-meta.com
> >
> > Tel.: +49 30 / 609 84 88 20
> > Fax: +49 30 / 609 84 88 21
> > E-Mail: johannes.schaback@visual-meta.com
> >
> > Visual Meta GmbH, Schützenstraße 25, 10117 Berlin
> > Geschäftsführer: Robert M. Maier, Johannes Schaback
> > Handelsregister HRB 115795 B, Amtsgericht Charlottenburg
> > USt-IdNr.: DE263760203
> >
>
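
One pattern that produces a trace like the one quoted above (the same advance frame repeated until the stack runs out) is a chained scanner that recurses once for every exhausted sub-scanner. The sketch below illustrates that pattern next to an iterative equivalent. It is a simplified stand-in and not the HBase source: the Scanner interface and all helper names are hypothetical, and whether this matches the actual code path is an assumption (see HBASE-11813 for the real issue).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class AdvanceSketch {
    // Minimal stand-in for a scanner; not the HBase CellScanner interface.
    interface Scanner {
        boolean advance();
    }

    // A scanner holding a fixed number of elements (0 means empty).
    static Scanner ofSize(int n) {
        return new Scanner() {
            private int remaining = n;
            public boolean advance() { return remaining-- > 0; }
        };
    }

    // Recursive chaining: one extra stack frame per exhausted sub-scanner.
    static Scanner chainRecursive(List<Scanner> parts) {
        final Iterator<Scanner> it = parts.iterator();
        return new Scanner() {
            private Scanner current;
            public boolean advance() {
                if (current == null) {
                    if (!it.hasNext()) return false;
                    current = it.next();
                }
                if (current.advance()) return true;
                current = null;
                return advance(); // depth grows with each consecutive empty part
            }
        };
    }

    // Iterative chaining: same result, constant stack depth.
    static Scanner chainIterative(List<Scanner> parts) {
        final Iterator<Scanner> it = parts.iterator();
        return new Scanner() {
            private Scanner current;
            public boolean advance() {
                while (true) {
                    if (current == null) {
                        if (!it.hasNext()) return false;
                        current = it.next();
                    }
                    if (current.advance()) return true;
                    current = null;
                }
            }
        };
    }

    // Builds a list of empty scanners followed by one single-element scanner.
    static List<Scanner> manyEmptyThenOne(int empties) {
        List<Scanner> parts = new ArrayList<>();
        for (int i = 0; i < empties; i++) parts.add(ofSize(0));
        parts.add(ofSize(1));
        return parts;
    }

    public static void main(String[] args) {
        int empties = 1_000_000; // illustrative; intended to exceed a typical default stack
        System.out.println("iterative: "
            + chainIterative(manyEmptyThenOne(empties)).advance());
        try {
            chainRecursive(manyEmptyThenOne(empties)).advance();
            System.out.println("recursive: completed");
        } catch (StackOverflowError e) {
            System.out.println("recursive: StackOverflowError");
        }
    }
}
```

If something like this is what happens, a response containing a long run of empty results could be the trigger, which might explain why only some requests hit it.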


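On the idea quoted above of aborting the RegionServer when a handler is lost to an Error: a minimal sketch of that behaviour using only the JDK's Thread.UncaughtExceptionHandler could look like the following. This is not how HBase implements (or will implement) the fix. The abortRequested flag is a hypothetical stand-in for a real server abort, and the thread name is made up.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class HandlerAbortSketch {
    // Hypothetical stand-in for "abort the server"; a real server would shut down.
    static final AtomicBoolean abortRequested = new AtomicBoolean(false);

    // Starts a handler thread that flags an abort if it dies with an Error.
    static Thread startHandler(String name, Runnable work) {
        Thread t = new Thread(work, name);
        t.setUncaughtExceptionHandler((thread, err) -> {
            if (err instanceof Error) {
                System.err.println("Handler " + thread.getName()
                    + " died with " + err + "; requesting abort");
                abortRequested.set(true);
            }
        });
        t.start();
        return t;
    }

    // Deliberately unbounded recursion to trigger a StackOverflowError.
    static int recurse(int n) {
        return recurse(n + 1) + 1;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = startHandler("handler-0", () -> recurse(0));
        t.join(); // the uncaught-exception handler runs before the thread terminates
        System.out.println("abortRequested=" + abortRequested.get());
    }
}
```

The point is only that a dead handler becomes visible immediately instead of the server silently running with fewer and fewer handlers until it hangs.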

