hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: regionserver disconnection
Date Wed, 11 Nov 2009 19:30:38 GMT
The problem seems to be that the IO is saturated. When you see this:

#
2009-11-11 04:28:10,308 WARN
org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 9 on
60021 took 6819ms appending an edit to hlog; editcount=370
#
2009-11-11 04:28:10,308 WARN
org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 2 on
60021 took 4412ms appending an edit to hlog; editcount=371

It's a sign that HDFS is slow. I guess that the zookeeper quorum
members are on those same nodes using the same disks? If so, there's
the problem because Zookeeper needs to be able to commit transactions
very fast and if your disks are already hogged then they have to wait
and you can get disconnections. Do you still have 19 quorum members?
How many disks do you have per machine?

Thx

J-D

On Wed, Nov 11, 2009 at 11:22 AM, Zhenyu Zhong <zhongresearch@gmail.com> wrote:
> Please take a look at the following regionserver log.
> The error started to come out from 4:24am UTC to 4:29am UTC.
>
> http://pastebin.com/m3717d064
>
> Thanks in advance.
>
> Best,
> zhenyu
>
>
> On Wed, Nov 11, 2009 at 2:01 PM, Jean-Daniel Cryans <jdcryans@apache.org>wrote:
>
>> Could you paste a bigger log trace (interested to see at least all 300
>> lines before that) in a pastebin?
>>
>> Thx,
>>
>> J-D
>>
>> On Wed, Nov 11, 2009 at 10:56 AM, Zhenyu Zhong <zhongresearch@gmail.com>
>> wrote:
>> > J-D,
>> >
>> > Thanks for the reply.
>> >
>> > The problem I have is one or more of the regionservers disconnect from
>> the
>> > cluster.
>> >
>> > I checked the resource usage including memory Heap usage for
>> regionserver,
>> > and found nothing suspicious. The regionserver heap usage is way below
>> the
>> > maxHeap defined in jvm.
>> >
>> > So I checked zookeeper log, then I found the following
>> > BufferUnderflowException, eventually it leads to SocketException:
>> Transport
>> > endpoint is not connected.
>> >
>> >
>> > I also checked the regionserver log. It looks like the regionserver tried
>> to
>> > connect to other zookeeper quorum server. It connects successfully, but
>> then
>> > it complained about SocketException: transport endpoint is not connected.
>> >
>> > So both zookeeper log and regionserver log told me that the transport
>> > endpoint is not connected. I don't know why.
>> >
>> > I would be very appreciated if you could point me to the right direction.
>> >
>> > Best,
>> >
>> >
>> > Regionserver log:
>> > 2009-11-11 04:24:53,718 INFO org.apache.zookeeper.ClientCnxn: Attempting
>> > connection to server superpyxis0018/192.168.100.132:2181
>> > 2009-11-11 04:24:53,718 INFO org.apache.zookeeper.ClientCnxn: Priming
>> > connection to java.nio.channels.SocketChannel[connected local=/
>> > 192.168.100.131:43065 remote=superpyxis0018/192.168.100.132:2181]
>> > 2009-11-11 04:24:53,718 INFO org.apache.zookeeper.ClientCnxn: Server
>> > connection successful
>> > 2009-11-11 04:24:53,768 INFO
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
>> > state: Disconnected, type: None, path: null
>> > 2009-11-11 04:24:54,118 INFO org.apache.zookeeper.ClientCnxn: Attempting
>> > connection to server superpyxis0017/192.168.100.131:2181
>> > 2009-11-11 04:24:54,118 INFO org.apache.zookeeper.ClientCnxn: Priming
>> > connection to java.nio.channels.SocketChannel[connected local=/
>> > 192.168.100.131:46903 remote=superpyxis0017/192.168.100.131:2181]
>> > 2009-11-11 04:24:54,118 INFO org.apache.zookeeper.ClientCnxn: Server
>> > connection successful
>> > 2009-11-11 04:24:54,118 WARN org.apache.zookeeper.ClientCnxn: Exception
>> > closing session 0x24dfd90c810002 to sun.nio.ch.SelectionKeyImpl@2dd78ccf
>> > java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0
>> > lim=4 cap=4]
>> >        at
>> > org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
>> >        at
>> > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
>> > 2009-11-11 04:24:54,118 WARN org.apache.zookeeper.ClientCnxn: Ignoring
>> > exception during shutdown input
>> > java.net.SocketException: Transport endpoint is not connected
>> >        at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>> >        at
>> > sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
>> >        at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>> >        at
>> > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
>> >        at
>> > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Nov 11, 2009 at 1:40 PM, Jean-Daniel Cryans <jdcryans@apache.org
>> >wrote:
>> >
>> >> I'm really not sure exactly what the situation is with the
>> >> java.nio.BufferUnderflowException, but what do you mean by
>> >> "regionserver disconnection"? Could you give more details?
>> >>
>> >> Thx,
>> >>
>> >> J-D
>> >>
>> >> On Wed, Nov 11, 2009 at 10:34 AM, Zhenyu Zhong <zhongresearch@gmail.com
>> >
>> >> wrote:
>> >> > I searched around and found HBASE-1645 "HBase could be a little more
>> >> dainty
>> >> > when zk is pulled out from under it"
>> >> >
>> >> > I am wondering whether my problem is related to this HBase-1645. Or
>> >> probably
>> >> > has nothing to do with it?
>> >> >
>> >> > thanks
>> >> >
>> >> >
>> >> > On Wed, Nov 11, 2009 at 12:02 PM, Zhenyu Zhong <
>> zhongresearch@gmail.com
>> >> >wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> I have been monitoring the system resources such as memory Heap
usage
>> >> for
>> >> >> Regionserver, load average, network bandwidth, etc.
>> >> >>
>> >> >> I experienced a disconnection last night. When I checked the memory
>> Heap
>> >> >> for Regionserver, it was 2336 which is way below the max Heap 4096.
I
>> >> don't
>> >> >> think it is Regionserver problem. The requests that the regionserver
>> got
>> >> was
>> >> >> about 126 which doesn't seem to be high. The load average is kind
of
>> >> high
>> >> >> >100. (but I saw disconnections while the load average is around
30).
>> >> The
>> >> >> bandwidth is not a problem.
>> >> >>
>> >> >> However, I found error messages from the zookeeper. Please see
the
>> >> >> following error messages. There are some
>> >> >> BufferUnderflowException error, IOException error,etc.
>> >> >> May I ask what might be the causes that generate these error
>> messages?
>> >> >>
>> >> >> Many thanks
>> >> >> zhenyu
>> >> >>
>> >> >>
>> >> >> 2009-11-11 04:29:00,810 WARN org.apache.zookeeper.server.Request:
>> >> Ignoring
>> >> >> exception during toString
>> >> >> java.nio.BufferUnderflowException
>> >> >>         at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>> >> >>         at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>> >> >>         at
>> >> org.apache.zookeeper.server.Request.toString(Request.java:199)
>> >> >>         at java.lang.String.valueOf(String.java:2827)
>> >> >>         at java.lang.StringBuilder.append(StringBuilder.java:115)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.zookeeper.server.quorum.CommitProcessor.processRequest(CommitProcessor.java:167)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:68)
>> >> >> 2009-11-11 04:29:00,810 WARN
>> org.apache.zookeeper.server.NIOServerCnxn:
>> >> >> Exception causing close of session 0x424dfd908c50009
>> >> >> due to java.io.IOException: Read error
>> >> >> 2009-11-11 04:29:00,810 DEBUG
>> org.apache.zookeeper.server.NIOServerCnxn:
>> >> >> IOException stack trace
>> >> >> java.io.IOException: Read error
>> >> >>         at
>> >> >>
>> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:372)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:221)
>> >> >> 2009-11-11 04:29:00,810 DEBUG
>> >> >> org.apache.zookeeper.server.quorum.CommitProcessor: Processing
>> request::
>> >> >> sessionid:0x424dfd908c50009 type:setWatches cxid:0xfffffffffffffff8
>> >> >> zxid:0xfffffffffffffffe txntype:unknown n/a
>> >> >> 2009-11-11 04:29:00,810 WARN org.apache.zookeeper.server.Request:
>> >> Ignoring
>> >> >> exception during toString
>> >> >> java.nio.BufferUnderflowException
>> >> >>         at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>> >> >>         at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>> >> >>         at
>> >> org.apache.zookeeper.server.Request.toString(Request.java:199)
>> >> >>         at java.lang.String.valueOf(String.java:2827)
>> >> >>         at java.lang.StringBuilder.append(StringBuilder.java:115)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:74)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
>> >> >> 2009-11-11 04:29:00,810 DEBUG
>> >> >> org.apache.zookeeper.server.FinalRequestProcessor: Processing
>> request::
>> >> >> sessionid:0x424dfd908c50009 type:setWatches cxid:0xfffffffffffffff8
>> >> >> zxid:0xfffffffffffffffe txntype:unknown n/a
>> >> >> 2009-11-11 04:29:00,810 WARN org.apache.zookeeper.server.Request:
>> >> Ignoring
>> >> >> exception during toString
>> >> >> java.nio.BufferUnderflowException
>> >> >>         at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>> >> >>         at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>> >> >>         at
>> >> org.apache.zookeeper.server.Request.toString(Request.java:199)
>> >> >>         at
>> >> >> org.apache.log4j.or.DefaultRenderer.doRender(DefaultRenderer.java:36)
>> >> >>         at
>> >> >> org.apache.log4j.or.RendererMap.findAndRender(RendererMap.java:80)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.log4j.spi.LoggingEvent.getRenderedMessage(LoggingEvent.java:362)
>> >> >> at
>> >> >>
>> >>
>> org.apache.log4j.helpers.PatternParser$BasicPatternConverter.convert(PatternParser.java:403)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65)
>> >> >>         at
>> org.apache.log4j.PatternLayout.format(PatternLayout.java:502)
>> >> >>         at
>> >> >> org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:302)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:359)
>> >> >>         at
>> >> org.apache.log4j.WriterAppender.append(WriterAppender.java:160)
>> >> >>         at
>> >> >> org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
>> >> >>         at org.apache.log4j.Category.callAppenders(Category.java:206)
>> >> >>         at org.apache.log4j.Category.forcedLog(Category.java:391)
>> >> >>         at org.apache.log4j.Category.debug(Category.java:260)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:147)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
>> >> >> 2009-11-11 04:29:00,810 WARN org.apache.zookeeper.server.Request:
>> >> Ignoring
>> >> >> exception during toString
>> >> >> sessionid:0x424dfd908c50009 type:setWatches cxid:0xfffffffffffffff8
>> >> >> zxid:0xfffffffffffffffe txntype:unknown n/a
>> >> >> 2009-11-11 04:29:00,810 INFO
>> org.apache.zookeeper.server.NIOServerCnxn:
>> >> >> closing session:0x424dfd908c50009 NIOServerCnxn:
>> >> >> java.nio.channels.SocketChannel[connected local=/
>> 192.168.100.131:2181
>> >> remote=/
>> >> >> 192.168.100.131:35500]
>> >> >> 2009-11-11 04:29:00,810 DEBUG
>> org.apache.zookeeper.server.NIOServerCnxn:
>> >> >> ignoring exception during input shutdown
>> >> >> java.net.SocketException: Transport endpoint is not connected
>> >> >>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>> >> >>         at
>> >> >>
>> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
>> >> >>         at
>> >> sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>> >> >>         at
>> >> >>
>> org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:848)
>> >> >>         at
>> >> >>
>> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:498)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:221)
>> >> >>
>> >> >
>> >>
>> >
>>
>

Mime
View raw message