hbase-user mailing list archives

From "Geoff Hendrey" <ghend...@decarta.com>
Subject RE: scanner deadlock?
Date Wed, 14 Sep 2011 06:40:02 GMT
As expected, J-D's suggestion basically causes the system to collapse almost immediately. All
the region servers show stack traces such as these:

2011-09-13 23:34:43,151 WARN  [IPC Reader 0 on port 60020] ipc.HBaseServer$Listener(526):
IPC Server listener on 60020: readAndProcess threw exception java.io.IOException: Connection
reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(Unknown Source)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
        at sun.nio.ch.IOUtil.read(Unknown Source)
        at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1359)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:900)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExe


2011-09-13 23:34:43,293 WARN  [IPC Server handler 0 on 60020] ipc.HBaseServer$Handler(1100):
IPC Server handler 0 on 60020 caught: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
        at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1341)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)

-----Original Message-----
From: Geoff Hendrey [mailto:ghendrey@decarta.com] 
Sent: Tuesday, September 13, 2011 10:56 PM
To: user@hbase.apache.org
Cc: Andrew Purtell; Tony Wang; Rohit Nigam; Parmod Mehta; James Ladd
Subject: RE: scanner deadlock?

answers below

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Tuesday, September 13, 2011 10:41 PM
To: user@hbase.apache.org
Cc: Andrew Purtell; Tony Wang; Rohit Nigam; Parmod Mehta; James Ladd
Subject: Re: scanner deadlock?

On Tue, Sep 13, 2011 at 10:25 PM, Geoff Hendrey <ghendrey@decarta.com> wrote:
> I've upgraded to the HotSpot 64-bit Server VM, with HBase 0.90.4 and all recommended config
> changes (100 region server handlers, MSLAB enabled, etc.). No change; if anything, it dies faster.
> The count of sockets in CLOSE_WAIT on 50010 increases linearly. I logged netstat output from a random
> node in the cluster periodically, then dumped it into Excel and used a pivot table to look at
> the TCP behavior. The number of connections from the given node to other nodes on 50010 was
> relatively uniform (no hotspot). Connections on 50010 from the given node to *itself* were much
> higher than to other nodes, but that's probably a good thing. My guess is it's HBase leveraging
> locality of files for the region server. Just a guess.
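> (An equivalent per-peer tally without Excel is something like: netstat -an | grep ':50010' |
> awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn, i.e. a count of connections per
> remote address involving port 50010.)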
>
Yes.  You have good locality.  So maybe you are not bound up on a
single network resource.

So when you jstack and you see that the regionserver has all its threads
stuck in next -- are they? -- then we are likely going to the local
datanode.

ANSWER: Yes, I believe they are stuck in next. I can see from logs on the client that the
call to next periodically takes a very long time to return. You had earlier commented that my
5-second regionserver lease was low. I had set it low intentionally to get a fail-fast on the call
to next. If I don't set it low, it will just take whatever the timeout is to fail (60 seconds
by default).
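For concreteness, the client loop looks roughly like the following (the table name is made up,
and the 5-second lease is hbase.regionserver.lease.period on the regionservers). This is just a
sketch of where the stall shows up, not our exact code:

    // sketch only -- uses org.apache.hadoop.hbase.HBaseConfiguration and
    // org.apache.hadoop.hbase.client.{HTable, Scan, Result, ResultScanner}
    void scanAndTime(Configuration conf) throws IOException {
      HTable table = new HTable(conf, "some_table");   // hypothetical table name
      Scan scan = new Scan();
      scan.setCaching(100);                            // rows fetched per next() RPC
      ResultScanner scanner = table.getScanner(scan);
      try {
        while (true) {
          long start = System.currentTimeMillis();
          Result r = scanner.next();                   // the call that stalls
          long ms = System.currentTimeMillis() - start;
          if (ms > 5000) {
            System.err.println("slow next(): " + ms + "ms");  // what shows up in our client logs
          }
          if (r == null) break;
          // ... process r ...
        }
      } finally {
        scanner.close();
        table.close();
      }
    }

Note that caching times row size is what each next() hauls over the wire, which is presumably
what J-D's memory-pressure question is getting at.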


Anything in its logs when the regionserver slows down?

ANSWER: Yes. I see ScannerTimeoutException, then UnknownScannerException ("unknown scanner"),
and then ClosedChannelException. The stack trace shows the ClosedChannelException occurs when
the server tries to write the response to the scanner. That seems like a bug to me: once you
close the channel you cannot write to it, no? If you try to write to it after it's closed, you
get ClosedChannelException.
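For what it's worth, the standard client-side way to survive that sequence (not what we're doing
here, since I want the fail-fast) is to catch ScannerTimeoutException and reopen the scan just
past the last row seen. Roughly, reusing the table handle from the sketch above:

    // general recovery pattern, not our production code
    // (ScannerTimeoutException is org.apache.hadoop.hbase.client.ScannerTimeoutException,
    //  Bytes is org.apache.hadoop.hbase.util.Bytes)
    byte[] lastRow = null;
    ResultScanner scanner = table.getScanner(new Scan());
    try {
      while (true) {
        Result r;
        try {
          r = scanner.next();
        } catch (ScannerTimeoutException e) {
          // lease expired server-side; reopen just past the last row we got.
          // Appending 0x00 gives the smallest row key strictly greater than lastRow.
          scanner.close();
          Scan resume = new Scan();
          if (lastRow != null) {
            resume.setStartRow(Bytes.add(lastRow, new byte[] { 0 }));
          }
          scanner = table.getScanner(resume);
          continue;
        }
        if (r == null) break;
        lastRow = r.getRow();
        // ... process r ...
      }
    } finally {
      scanner.close();
    }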


> next step will be to test J-D Cryans' suggestion:
> "In order to completely rule out at least one thing, can you set ipc.server.max.queue.size
> to 1 and hbase.regionserver.handler.count to a low number (let's say 10)? If payload is putting
> too much memory pressure, we'll know."
>
> ...though I'm not sure what I'm supposed to observe with these settings...but I'll try
> it and report on the outcome.
>

Well, you have GC logging already.  If you look at the output, do you
see big pauses?

ANSWER: Nope, no long pauses at all. I've periodically run a few greps with a regex to try
to find pauses of one second or longer, and I haven't seen any of late. HOWEVER, one thing I
don't understand at all is why Ganglia reports HUGE GC pauses (like 1000 seconds!), while in
reality I can *never* find such a pause in the GC log. Is there any known issue with Ganglia
graphs being on the wrong vertical scale for GC pauses? I know that sounds odd, but I just
can't correlate the Ganglia graphs for GC to reality.
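(The grep is along the lines of grep -E 'real=[1-9]' against the GC log, which catches any pause
of a second or more, assuming -XX:+PrintGCDetails is on so each collection ends with a
[Times: user=... sys=... real=... secs] line.)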

 I think J-D was thinking that regionservers would be
using less memory if you make the queues smaller.  You could try that.
 Maybe when queues are big, it's taking a while to process them and the
client times out.  What size are these rows?

ANSWER: rows are about 100KB-750KB. Trying J-D's suggestion now.
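For the record, those two settings go in hbase-site.xml on each regionserver (restart required),
roughly:

    <property>
      <name>ipc.server.max.queue.size</name>
      <value>1</value>
    </property>
    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>10</value>
    </property>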

St.Ack


> -geoff
>
> -----Original Message-----
> From: Geoff Hendrey [mailto:ghendrey@decarta.com]
> Sent: Tuesday, September 13, 2011 4:50 PM
> To: user@hbase.apache.org; Andrew Purtell
> Cc: Tony Wang; Rohit Nigam; Parmod Mehta; James Ladd
> Subject: RE: scanner deadlock?
>
> 1019 sockets on 50010 in CLOSE_WAIT state.
>
> -geoff
>
> -----Original Message-----
> From: Andrew Purtell [mailto:apurtell@apache.org]
> Sent: Tuesday, September 13, 2011 4:00 PM
> To: user@hbase.apache.org
> Cc: Tony Wang; Rohit Nigam; Parmod Mehta; James Ladd
> Subject: Re: scanner deadlock?
>
>
>
>> My current working theory is that
>> too many sockets are in CLOSE_WAIT state (leading to
>> ClosedChannelException?). We're going to try to adjust some OS
>> parameters.
>
> How many sockets are in that state? netstat -an | grep CLOSE_WAIT | wc -l
>
> CDH3U1 contains HDFS-1836... https://issues.apache.org/jira/browse/HDFS-1836
>
> Best regards,
>
>        - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
>
>
>>________________________________
>>From: Geoff Hendrey <ghendrey@decarta.com>
>>To: user@hbase.apache.org
>>Cc: Tony Wang <twang@decarta.com>; Rohit Nigam <rnigam@decarta.com>; Parmod Mehta <pmehta@decarta.com>; James Ladd <jladd@decarta.com>
>>Sent: Tuesday, September 13, 2011 9:49 AM
>>Subject: RE: scanner deadlock?
>>
>>Thanks Stack -
>>
>>Answers to all your questions below. My current working theory is that
>>too many sockets are in CLOSE_WAIT state (leading to
>>ClosedChannelException?). We're going to try to adjust some OS
>>parameters.
>>
>>" I'm asking if regionservers are bottlenecking on a single network
>>resource; a particular datanode, dns?"
>>
>>Gotcha. I'm gathering some tools now to collect and analyze netstat
>>output.
>>
>>" the regionserver is going slow getting data out of
>>hdfs.  What's iowait like at the time of slowness?  Has it changed from
>>when all was running nicely?"
>>
>>iowait is high (20% above cpu), but not increasing. I'll try to quantify
>>that better.
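>>(Probably via something like iostat -x 5 during a slow stretch, which gives per-device await
>>and %util numbers.)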
>>
>>" You talk to hbase in the reducer?   Reducers don't start writing hbase
>>until the job is 66% complete IIRC.    Perhaps it's slowing as soon as it
>>starts writing hbase?  Is that so?"
>>
>>My statement about "running fine" applies to the period after the reducer has
>>completed its sort. We have metrics produced by the reducer that log the
>>results of scans and Puts, so we know that scans and Puts proceed
>>without issue for hours.
>>
>
