hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Zhou Shuaifeng <zhoushuaif...@huawei.com>
Subject Re: IPC Server Responder out put error causing RegionServer down
Date Mon, 31 Jan 2011 10:49:45 GMT
2011-01-28 17:13:47,633

java.net.SocketTimeoutException: Call to c3s4.site/167.6.5.13:60020 failed
on socket timeout exception: java.net.SocketTimeoutException: 60000 millis
timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/167.6.5.14:50735
remote=c3s4.site/167.6.5.13:60020]

2011-01-28 17:13:47,634 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Failed open of
daughter

 

According this “60000 millis timeout”, I checked the log at 17:12:47,
there is a null point exception and ClosedChannelException, is there any
relation?:

 

2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call getRegionInfo([B@3a3e8cfb) from 167.6.5.17:51035: output
error

2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call getRegionInfo([B@7a0a4ce5) from 167.6.5.14:50307: output
error

2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 2 on 60020 caught: java.nio.channels.ClosedChannelException

                at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)

                at
sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)

                at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1336)

                at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServe
r.java:724)

                at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java
:789)

                at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1080)

 

2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 4 on 60020 caught: java.nio.channels.ClosedChannelException

                at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)

                at
sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)

                at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1336)

                at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServe
r.java:724)

                at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java
:789)

                at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1080)

 

2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 5 on 60020 caught: java.nio.channels.ClosedChannelException

                at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)

                at
sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)

                at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1336)

                at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServe
r.java:724)

                at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java
:789)

                at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1080)

 

2011-01-28 17:12:47,097 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x22dcbbf01b40006 Retrieved 78 byte(s) of data from znode
/hbase/unassigned/1489005343;
data=region=ufdr_c,201101258613813763326#174006,1295973611272,
server=c3s1.site:60000, state=M_ZK_REGION_OFFLINE

2011-01-28 17:12:47,097 ERROR org.apache.hadoop.hbase.executor.EventHandler:
Caught throwable while processing event M_RS_OPEN_REGION

java.lang.NullPointerException

                at
org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)

                at
org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransi
tionData.java:198)

                at
org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672)

                at
org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.ja
va:552)

                at
org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.ja
va:545)

                at
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.transitionZoo
keeperOfflineToOpening(OpenRegionHandler.java:296)

                at
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenR
egionHandler.java:90)

                at
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)

                at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
va:886)

                at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
08)

                at java.lang.Thread.run(Thread.java:662)

 

 

Zhou Shuaifeng(Frank)
HUAWEI TECHNOLOGIES CO.,LTD. huawei_logo



----------------------------------------------------------------------------
---------------------------------------------------------
This e-mail and its attachments contain confidential information from
HUAWEI, which 
is intended only for the person or entity whose address is listed above. Any
use of the 
information contained herein in any way (including, but not limited to,
total or partial 
disclosure, reproduction, or dissemination) by persons other than the
intended 
recipient(s) is prohibited. If you receive this e-mail in error, please
notify the sender by 
phone or email immediately and delete it!

发件人: Zhou Shuaifeng [mailto:zhoushuaifeng@huawei.com] 
发送时间: 2011年1月31日 17:44
收件人: dev@hbase.apache.org
抄送: yanlijun@huawei.com; wei.zhi@huawei.com
主题: IPC Server Responder out put error causing RegionServer down

 

Hi, 

 

when I update my hbase from 0.20.6 to 0.90, some error occur:

Below is logs on 167.6.5.13, because Meta table is served by this server, so
other regionservers want to connect this server to getRegionInfo, but got
some error:

 

2011-01-28 17:14:18,291 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call getRegionInfo([B@15e43aec) from 167.6.5.11:34887: output
error

2011-01-28 17:14:18,291 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call getRegionInfo([B@3ad8672e) from 167.6.5.14:50735: output
error

 

This caused the server on 11 and 14  socket timeout and shutdown:

 

2011-01-28 17:13:47,633 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
serverName=c3s5.site,60020,1296203449919, load=(requests=0, regions=2710,
usedHeap=4622, maxHeap=8165): Failed open of daughter
ufdr_c,201101268613812388592#085317,1296205930314.cd36b19931bd2b2adcca176a88
dbd6e5.

java.net.SocketTimeoutException: Call to c3s4.site/167.6.5.13:60020 failed
on socket timeout exception: java.net.SocketTimeoutException: 60000 millis
timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/167.6.5.14:50735
remote=c3s4.site/167.6.5.13:60020]

2011-01-28 17:13:47,634 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Failed open of
daughter

 

Now, my question is what’s the cause of “output error” ?

 

Before the “output error” warning, there is an EOFException, is there any
relationship with this error?

 

2011-01-28 17:13:43,413 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
listener on 60020: readAndProcess threw exception java.io.EOFException.
Count of bytes read: 0

java.io.EOFException

       at java.io.DataInputStream.readFully(DataInputStream.java:180)

       at java.io.DataInputStream.readUTF(DataInputStream.java:592)

       at java.io.DataInputStream.readUTF(DataInputStream.java:547)

       at
org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:122
)

       at
org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.j
ava:975)

       at
org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServe
r.java:943)

       at
org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:519
)

       at
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java
:313)

       at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
va:886)

       at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
08)

       at java.lang.Thread.run(Thread.java:662)

2011-01-28 17:13:43,413 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open
region: ufdr_c,201101258613814756901#232839,1295985296267.2025697489

2011-01-28 17:13:43,418 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x22dcbbf01b40006 Retrieved 92 byte(s) of data from znode
/hbase/unassigned/1489005343;
data=region=ufdr_c,201101258613813763326#174006,1295973611272,
server=c3s4.site,60020,1296203450194, state=RS_ZK_REGION_OPENING

2011-01-28 17:13:43,420 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x22dcbbf01b40006 Retrieved 92 byte(s) of data from znode
/hbase/unassigned/1497320702;
data=region=ufdr_c,201101268613810041680#085457,1296052066413,
server=c3s4.site,60020,1296203450194, state=RS_ZK_REGION_OPENING

2011-01-28 17:13:43,441 DEBUG
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
regionserver:60020-0x22dcbbf01b40006 Received ZooKeeper Event,
type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/1489005343

2011-01-28 17:13:43,441 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x22dcbbf01b40006 Successfully transitioned node
1489005343 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING

 

 

 

Zhou Shuaifeng(Frank)




Mime
  • Unnamed multipart/related (inline, None, 0 bytes)
View raw message