hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: IPC Server Responder out put error causing RegionServer down
Date Mon, 31 Jan 2011 18:22:54 GMT
(moving to the user mailing list, where it belongs)

My educated guess is that you had a GC pause that lasted for more than a
minute while a file was being written to. Even if the write wasn't
happening, your region server would have committed suicide anyways since it
was probably past it's lease timeout. In order to confirm that, we would
need to see the logs from a few minutes before the exception happened. Also,
please use pastebin.com (or similar) to store the logs.

Regarding the NPE, it's definitely not something that should happen but the
region server was already on it's way out.

J-D

2011/1/31 Zhou Shuaifeng <zhoushuaifeng@huawei.com>

> 2011-01-28 17:13:47,633
>
> java.net.*SocketTimeoutException*: Call to c3s4.site/167.6.5.13:60020failed on
> *socket timeout exception*: java.net.SocketTimeoutException: 60000 millis
> timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/167.6.5.14:*50735*remote=c3s4.site/
> 167.6.5.13:60020]
>
> 2011-01-28 17:13:47,634 INFO org.apache.hadoop.hbase.regionserver.*HRegionServer:
> STOPPED: Failed open of daughter*
>
>
>
> According this “60000 millis timeout”, I checked the log at 17:12:47,
> there is a null point exception and ClosedChannelException, is there any
> relation?:
>
>
>
> 2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> Responder, call getRegionInfo([B@3a3e8cfb) from 167.6.5.17:51035: output
> error
>
> 2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> Responder, call getRegionInfo([B@7a0a4ce5) from 167.6.5.14:50307: output
> error
>
> 2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: PRI IPC
> Server handler 2 on 60020 caught: java.nio.channels.ClosedChannelException
>
>                 at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
>
>                 at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1336)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:724)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:789)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1080)
>
>
>
> 2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: PRI IPC
> Server handler 4 on 60020 caught: java.nio.channels.ClosedChannelException
>
>                 at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
>
>                 at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1336)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:724)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:789)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1080)
>
>
>
> 2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: PRI IPC
> Server handler 5 on 60020 caught: java.nio.channels.ClosedChannelException
>
>                 at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
>
>                 at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1336)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:724)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:789)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1080)
>
>
>
> 2011-01-28 17:12:47,097 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> regionserver:60020-0x22dcbbf01b40006 Retrieved 78 byte(s) of data from znode
> /hbase/unassigned/1489005343;
> data=region=ufdr_c,201101258613813763326#174006,1295973611272,
> server=c3s1.site:60000, state=M_ZK_REGION_OFFLINE
>
> 2011-01-28 17:12:47,097 ERROR
> org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while
> processing event M_RS_OPEN_REGION
>
> java.lang.NullPointerException
>
>                 at
> org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
>
>                 at
> org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
>
>                 at
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672)
>
>                 at
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:552)
>
>                 at
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:545)
>
>                 at
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.transitionZookeeperOfflineToOpening(OpenRegionHandler.java:296)
>
>                 at
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:90)
>
>                 at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>
>                 at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>
>                 at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>
>                 at java.lang.Thread.run(Thread.java:662)
>
>
>
>
>
> Zhou Shuaifeng(Frank)
> HUAWEI TECHNOLOGIES CO.,LTD. [image: huawei_logo]
>
>
>
>
> -------------------------------------------------------------------------------------------------------------------------------------
> This e-mail and its attachments contain confidential information from
> HUAWEI, which
> is intended only for the person or entity whose address is listed above.
> Any use of the
> information contained herein in any way (including, but not limited to,
> total or partial
> disclosure, reproduction, or dissemination) by persons other than the
> intended
> recipient(s) is prohibited. If you receive this e-mail in error, please
> notify the sender by
> phone or email immediately and delete it!
>
> *发件人**:* Zhou Shuaifeng [mailto:zhoushuaifeng@huawei.com]
> *发送时间:* 2011年1月31日 17:44
> *收件人:* dev@hbase.apache.org
> *抄送:* yanlijun@huawei.com; wei.zhi@huawei.com
> *主题:* IPC Server Responder out put error causing RegionServer down
>
>
>
> Hi,
>
>
>
> when I update my hbase from 0.20.6 to 0.90, some error occur:
>
> Below is logs on 167.6.5.13, because Meta table is served by this server,
> so other regionservers want to connect this server to getRegionInfo, but got
> some error:
>
>
>
> 2011-01-28 17:14:18,291 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> Responder, call getRegionInfo([B@15e43aec) from 167.6.5.11:34887: output
> error
>
> 2011-01-28 17:14:18,291 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> Responder, call getRegionInfo([B@3ad8672e) from 167.6.5.14:50735: output
> error
>
>
>
> This caused the server on 11 and 14  socket timeout and shutdown:
>
>
>
> 2011-01-28 17:13:47,633 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
> serverName=c3s5.site,60020,1296203449919, load=(requests=0, regions=2710,
> usedHeap=4622, maxHeap=8165): *Failed open of daughter*ufdr_c,201101268613812388592#085317,1296205930314.cd36b19931bd2b2adcca176a88dbd6e5.
>
> java.net.*SocketTimeoutException*: Call to c3s4.site/167.6.5.13:60020failed on
> *socket timeout exception*: java.net.SocketTimeoutException: 60000 millis
> timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/167.6.5.14:*50735*remote=c3s4.site/
> 167.6.5.13:60020]
>
> 2011-01-28 17:13:47,634 INFO org.apache.hadoop.hbase.regionserver.*HRegionServer:
> STOPPED: Failed open of daughter*
>
> * *
>
> *Now, my question is what**’**s the cause of **“*output error” ?
>
>
>
> Before the “output error” warning, there is an EOFException, is there any
> relationship with this error?
>
>
>
> 2011-01-28 17:13:43,413 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> listener on 60020: readAndProcess threw exception java.io.EOFException.
> Count of bytes read: 0
>
> java.io.EOFException
>
>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>
>        at java.io.DataInputStream.readUTF(DataInputStream.java:592)
>
>        at java.io.DataInputStream.readUTF(DataInputStream.java:547)
>
>        at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:122)
>
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:975)
>
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:943)
>
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:519)
>
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:313)
>
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>
>        at java.lang.Thread.run(Thread.java:662)
>
> 2011-01-28 17:13:43,413 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open
> region: ufdr_c,201101258613814756901#232839,1295985296267.2025697489
>
> 2011-01-28 17:13:43,418 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> regionserver:60020-0x22dcbbf01b40006 Retrieved 92 byte(s) of data from znode
> /hbase/unassigned/1489005343;
> data=region=ufdr_c,201101258613813763326#174006,1295973611272,
> server=c3s4.site,60020,1296203450194, state=RS_ZK_REGION_OPENING
>
> 2011-01-28 17:13:43,420 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> regionserver:60020-0x22dcbbf01b40006 Retrieved 92 byte(s) of data from znode
> /hbase/unassigned/1497320702;
> data=region=ufdr_c,201101268613810041680#085457,1296052066413,
> server=c3s4.site,60020,1296203450194, state=RS_ZK_REGION_OPENING
>
> 2011-01-28 17:13:43,441 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
> regionserver:60020-0x22dcbbf01b40006 Received ZooKeeper Event,
> type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/1489005343
>
> 2011-01-28 17:13:43,441 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x22dcbbf01b40006 Successfully transitioned node
> 1489005343 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
>
>
>
>
>
>
>
> Zhou Shuaifeng(Frank)
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message