hbase-user mailing list archives

From Galed Friedmann <galed.friedm...@onavo.com>
Subject Re: Thrift "hang ups" with no apparent reason
Date Wed, 01 Feb 2012 09:00:12 GMT
Hi,
Thanks for replying!

Answers to your questions:
1. I took a dump from the HMaster while we were experiencing timeouts and
attached it. I hope that's what you're looking for.
2. The timeout occurs around 10-12 hours after ZK establishes the
connection with the Thrift server, so it's not immediate. The Thrift logs
show nothing happening; the timeouts appear only in the ZK logs.
We actually haven't had any errors or ZK timeouts for Thrift in the last
15 hours, but I'm sure it'll happen again.
3. The lease expirations happen all the time. We mostly use JRuby
scripts, and we close the scans when we're done.
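
To be concrete about what "closing the scans when we're done" means, here is
a minimal sketch of the pattern, written in Python for illustration rather
than taken from the actual JRuby scripts. Everything in it is hypothetical:
`FakeScannerClient` is an in-memory stand-in, and its `scanner_open` /
`scanner_get` / `scanner_close` methods are illustrative names, not the real
Thrift client API. The only point is that the close sits in a `finally`, so a
scanner is released even when a script fails mid-scan instead of being left
for the region server to expire.

```python
from contextlib import contextmanager


class FakeScannerClient:
    """Hypothetical in-memory stand-in for a Thrift client (not the real API)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._next_id = 0
        self.open_scanners = {}  # scanner id -> current read position

    def scanner_open(self):
        scanner_id = self._next_id
        self._next_id += 1
        self.open_scanners[scanner_id] = 0
        return scanner_id

    def scanner_get(self, scanner_id):
        pos = self.open_scanners[scanner_id]
        if pos >= len(self._rows):
            return None  # scan exhausted
        self.open_scanners[scanner_id] = pos + 1
        return self._rows[pos]

    def scanner_close(self, scanner_id):
        # Closing an already-closed scanner is treated as a no-op.
        self.open_scanners.pop(scanner_id, None)


@contextmanager
def open_scanner(client):
    """Open a scanner and guarantee it is closed, even if the caller raises."""
    scanner_id = client.scanner_open()
    try:
        yield scanner_id
    finally:
        client.scanner_close(scanner_id)


def read_all(client):
    """Drain one scan and return its rows; the scanner is closed on exit."""
    rows = []
    with open_scanner(client) as scanner_id:
        while True:
            row = client.scanner_get(scanner_id)
            if row is None:
                break
            rows.append(row)
    return rows
```

A script that aborts partway through a scan would still go through the
`finally` branch, so under this pattern the only scanners left to expire are
those whose client process died outright.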

Thanks again,
Galed.

On Tue, Jan 31, 2012 at 10:51 PM, Stack <stack@duboce.net> wrote:

> On Mon, Jan 30, 2012 at 6:39 AM, Galed Friedmann
> <galed.friedmann@onavo.com> wrote:
> > Lately we're having weird issues with Thrift, after several hours the
> > Thrift server "hangs" - the scripts that are using it to access HBase get
> > connection timeouts, we're also using Heroku and ruby on rails apps that
> > use Thrift and they simply get stuck. Only when restarting the Thrift
> > process everything goes back to normal.
> >
>
> Can you thread dump the thrift server when it's all hung up?
>
> Have you enabled
>
>
> > 2012-01-30 10:52:08,823 INFO org.apache.zookeeper.server.NIOServerCnxn:
> > Established session 0x1352a393d18051e with negotiated timeout 90000 for
> > client /10.217.55.193:35940
> > 2012-01-30 10:52:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer:
> > Expiring session 0x1352a393d18051b, timeout of 90000ms exceeded
> > 2012-01-30 10:52:28,001 INFO
> > org.apache.zookeeper.server.PrepRequestProcessor: Processed session
> > termination for sessionid: 0x1352a393d18051b
>
> ZK is establishing a session w/ a 90-second timeout and then timing out
> immediately?
>
> > 2012-01-30 10:51:36,382 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> > listener on 60020: readAndProcess threw exception java.io.IOException:
> > Connection reset by peer. Count of bytes read: 0
> > java.io.IOException: Connection reset by peer
> >        at sun.nio.ch.FileDispatcher.read0(Native Method)
> >        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> >        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
> >        at sun.nio.ch.IOUtil.read(IOUtil.java:210)
> >        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1359)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:900)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
> >        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >        at java.lang.Thread.run(Thread.java:619)
> > 2012-01-30 10:52:24,016 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > -4511393305838866925 lease expired
> > 2012-01-30 10:52:24,016 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > -5818959718437063034 lease expired
> > 2012-01-30 10:52:24,016 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > -1408921590864341720 lease expired
> >
>
> Client went away? Do the lease expirations happen all the time, or just
> around the time of the hangup? (You are closing scanners when done?)
>
> St.Ack
>
