hbase-user mailing list archives

From "Lu, Wei" <...@microstrategy.com>
Subject region server down when scanning using mapreduce
Date Tue, 12 Mar 2013 05:06:01 GMT
Hi, 

When we use MapReduce to dump data from a fairly large HBase table, one region server crashes
and then another. MapReduce is deployed together with HBase on the same nodes.

1) The region server log shows both "next" and "multi" operations in progress. Could a
read/write conflict be causing the scanner timeouts?
2) The region server has 24 cores, and the maximum number of map tasks is also 24; the table
has about 30 regions (each about 0.5 GB) on that region server. Could MapReduce be using all
the CPU, slowing the region server down until it times out?
3) hbase.regionserver.handler.count is currently 10, the default. Should it be increased?

Please give us some advice.

Thanks,
Wei


Log information: 


[Regionserver rs21:]

2013-03-11 18:36:28,148 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/adcbg21.machine.wisdom.com,60020,1363010589837/rs21%2C60020%2C1363010589837.1363025554488,
entries=22417, filesize=127539793.  for /hbase/.logs/rs21,60020,1363010589837/rs21%2C60020%2C1363010589837.1363026988052
2013-03-11 18:37:39,481 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 28183ms instead
of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":29830,"call":"next(1656517918313948447,
1000), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.127.21:56058","starttimems":1363027030280,"queuetimems":4602,"class":"HRegionServer","responsesize":2774484,"method":"next"}
2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":31195,"call":"next(-8353194140406556404,
1000), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.127.21:56529","starttimems":1363027028804,"queuetimems":3634,"class":"HRegionServer","responsesize":2270919,"method":"next"}
2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":30965,"call":"next(2623756537510669130,
1000), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.127.21:56146","starttimems":1363027028807,"queuetimems":3484,"class":"HRegionServer","responsesize":2753299,"method":"next"}
2013-03-11 18:37:40,236 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":31023,"call":"next(5293572780165196795,
1000), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.127.21:56069","starttimems":1363027029086,"queuetimems":3589,"class":"HRegionServer","responsesize":2722543,"method":"next"}
2013-03-11 18:37:40,368 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":31160,"call":"next(-4285417329791344278,
1000), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.127.21:56586","starttimems":1363027029204,"queuetimems":3707,"class":"HRegionServer","responsesize":2938870,"method":"next"}
2013-03-11 18:37:43,652 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":31249,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2d19985a),
rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.109.21:35342","starttimems":1363027031505,"queuetimems":5720,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:49,108 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":38813,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@19c59a2e),
rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.125.11:57078","starttimems":1363027030273,"queuetimems":4663,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:50,410 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":38893,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@40022ddb),
rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.109.20:51698","starttimems":1363027031505,"queuetimems":5720,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:50,642 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":40037,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6b8bc8cf),
rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.125.11:57078","starttimems":1363027030601,"queuetimems":4818,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:51,529 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":10880,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6928d7b),
rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.125.11:57076","starttimems":1363027060645,"queuetimems":34763,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:51,776 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":41327,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@354baf25),
rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.125.11:57076","starttimems":1363027030411,"queuetimems":4680,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:38:32,361 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":10204,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6d86b477),
rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.125.10:36950","starttimems":1363027102044,"queuetimems":11027,"class":"HRegionServer","responsesize":0,"method":"multi"}
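
The responseTooSlow entries above carry a JSON payload, so the processing and queue times for "next" and "multi" calls can be pulled out mechanically. The following is a minimal sketch, assuming the log is available as text in the wrapped form shown here; the function name `parse_slow_calls` is hypothetical and not part of HBase.

```python
import json
import re

# Matches the JSON payload that follows the "(responseTooSlow):" marker.
SLOW_RE = re.compile(r"\(responseTooSlow\):\s*(\{.*\})")
# Log entries start with a timestamp such as "2013-03-11 18:37:40,163".
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def parse_slow_calls(log_text):
    """Return (method, processing_ms, queue_ms) for each responseTooSlow entry."""
    # Entries in the pasted log are wrapped across lines; rejoin any line
    # that does not start with a timestamp onto the previous entry.
    entries = []
    for line in log_text.splitlines():
        if TIMESTAMP_RE.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line

    calls = []
    for entry in entries:
        match = SLOW_RE.search(entry)
        if match:
            payload = json.loads(match.group(1))
            calls.append(
                (payload["method"], payload["processingtimems"], payload["queuetimems"])
            )
    return calls
```

Grouping the returned tuples by method shows whether the scans ("next") or the writes ("multi") waited longest in the queue during the slow window.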

----------------------------------------------------------
[master:]
2013-03-11 18:35:39,386 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated.
Instead, use fs.defaultFS
2013-03-11 18:38:25,892 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing
because balanced cluster; servers=10 regions=477 average=47.7 mostloaded=52 leastloaded=45
2013-03-11 18:39:42,002 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer
ephemeral node deleted, processing expiration [rs21,60020,1363010589837]
2013-03-11 18:39:42,007 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
Splitting logs for rs21,60020,1363010589837
2013-03-11 18:39:42,024 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog
workers [rs21,60020,1363010589837]
2013-03-11 18:39:42,033 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting
logs in [hdfs://rs26/hbase/.logs/rs21,60020,1363010589837-splitting]
2013-03-11 18:39:42,179 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/hdfs%3A%2F%2Frs26%3A8020%2Fhbase%2F.logs%2Frs21%2C60020%2C1363010589837-splitting%2Frs21%252C60020%252C1363010589837.1363010594599
acquired by rs19,1363010590987


----------------------------------------------------------
[Regionserver rs21:]
2013-03-11 18:40:06,326 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call
next(8419833035992682478, 1000), rpc version=1, client version=29, methodsFingerPrint=54742778
from 10.20.127.21:33592: output error
2013-03-11 18:40:06,326 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -1069130278416755239
lease expired
2013-03-11 18:40:06,327 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -7554305902624086957
lease expired
2013-03-11 18:40:06,327 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -1817452922125791171
lease expired
2013-03-11 18:40:06,327 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -7601125239682768076
lease expired
2013-03-11 18:40:06,327 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 82186ms instead
of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-03-11 18:40:06,327 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 5506385887933665130
lease expired
2013-03-11 18:40:06,327 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 8405483593065293761
lease expired
2013-03-11 18:40:06,327 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 8270919548717867130
lease expired
2013-03-11 18:40:06,327 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5350053253744349360
lease expired
2013-03-11 18:40:06,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -2774223416111392810
lease expired
2013-03-11 18:40:06,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 5293572780165196795
lease expired
2013-03-11 18:40:06,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 2904518513855545553
lease expired
2013-03-11 18:40:06,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 121972859714825295
lease expired
2013-03-11 18:40:06,328 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call
next(1316499555392112856, 1000), rpc version=1, client version=29, methodsFingerPrint=54742778
from 10.20.127.21:33552: output error
2013-03-11 18:40:06,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 751003440851341708
lease expired
2013-03-11 18:40:06,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 3456313588148401866
lease expired
2013-03-11 18:40:06,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -1893617732870830965
lease expired
2013-03-11 18:40:06,329 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -677968643251998870
lease expired
2013-03-11 18:40:06,329 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 2623756537510669130
lease expired
2013-03-11 18:40:06,329 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -4453586756904422814
lease expired
2013-03-11 18:40:06,329 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 4513044921501336208
lease expired
2013-03-11 18:40:06,329 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 8419833035992682478
lease expired
2013-03-11 18:40:06,329 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 6790476379016482048
lease expired
2013-03-11 18:40:06,329 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 1316499555392112856
lease expired
2013-03-11 18:40:06,337 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call
next(6790476379016482048, 1000), rpc version=1, client version=29, methodsFingerPrint=54742778
from 10.20.127.21:33603: output error
2013-03-11 18:40:06,485 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have
not heard from server in 86375ms for sessionid 0x13c9789d2289cd1, closing socket connection
and attempting reconnect
2013-03-11 18:40:06,493 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020
caught: java.nio.channels.ClosedChannelException
	at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
	at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1732)
	at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1675)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:940)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1019)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:425)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1365)

2013-03-11 18:40:06,489 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server Responder: doAsyncWrite
threw exception java.io.IOException: Broken pipe
2013-03-11 18:40:06,517 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020
caught: java.io.IOException: Broken pipe
	at sun.nio.ch.FileDispatcher.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
	at sun.nio.ch.IOUtil.write(IOUtil.java:40)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
	at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1732)
	at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1675)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:940)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1019)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:425)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1365)
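
The Sleeper warnings in the log report how long the region server process was stalled, and the last one (82186ms) is followed by the ZooKeeper session timing out. A small sketch for listing pauses that exceed a given threshold follows; the helper name `long_pauses` is hypothetical, and the 60000 ms threshold in the usage lines is only a placeholder, since the cluster's configured session timeout is not stated in this message.

```python
import re

# "We slept 82186ms instead of 3000ms"; the pasted log wraps before "of",
# so whitespace (including a newline) is allowed between the two parts.
SLEEPER_RE = re.compile(r"We slept (\d+)ms instead\s+of (\d+)ms")


def long_pauses(log_text, threshold_ms):
    """Return the slept durations (in ms) that are longer than threshold_ms."""
    return [
        int(slept)
        for slept, _expected in SLEEPER_RE.findall(log_text)
        if int(slept) > threshold_ms
    ]


# Usage with the two Sleeper warnings from the log above.
# The threshold is an assumed placeholder, not the cluster's actual setting.
sample = (
    "2013-03-11 18:37:39,481 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 28183ms instead\n"
    "of 3000ms, this is likely due to a long garbage collecting pause\n"
    "2013-03-11 18:40:06,327 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 82186ms instead\n"
    "of 3000ms, this is likely due to a long garbage collecting pause\n"
)
pauses = long_pauses(sample, threshold_ms=60000)
```

Comparing these pauses against the session timeout actually configured on the cluster would indicate which stall caused the master to see the ephemeral node disappear.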



