hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From 吴国泉 <wgq...@qunar.com>
Subject Re: How to optimize the GC in HBase
Date Mon, 30 Nov 2015 10:42:53 GMT
hi:
     I also met this problem,    cong,pu is my partner,
     hbase regionserver is configured with 92G RAM,
     sometimes the regionserver would crash,i checked the log, but there 
was no GC happened ,that is very strange,
     here is part of the regionserver log:



2015-11-29 00:48:30,521 DEBUG 
[regionserver60020-smallCompactions-1447234955800] 
compactions.Compactor: Compacting 
hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/b993875f0f384597900d2a2d3e69fde8,

keycount=136626, bloomtype=ROW, size=49.6 M, encoding=NONE, seqNum=22891
2015-11-29 00:49:46,912 WARN  [regionserver60020.periodicFlusher] 
util.Sleeper: We slept 71814ms instead of 10000ms, this is likely due to 
a long garbage collecting pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-11-29 00:49:46,912 INFO 
[SplitLogWorker-l-hbase27.dba.cn2.qunar.com,60020,1447234715888-SendThread(l-zk2.dba.cn6.qunar.com:2181)]

zookeeper.ClientCnxn: Client session timed out, have not heard from 
server in 77297ms for sessionid 0x150ad3cf038d6e6, closing socket 
connection and attempting reconnect
2015-11-29 00:49:46,912 WARN  [regionserver60020.compactionChecker] 
util.Sleeper: We slept 74029ms instead of 10000ms, this is likely due to 
a long garbage collecting pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-11-29 00:49:46,912 INFO 
[regionserver60020-SendThread(l-zk1.dba.cn6.qunar.com:2181)] 
zookeeper.ClientCnxn: Client session timed out, have not heard from 
server in 74953ms for sessionid 0x34f16f641cc2e77, closing socket 
connection and attempting reconnect
2015-11-29 00:49:46,913 WARN  [JvmPauseMonitor] util.JvmPauseMonitor: 
Detected pause in JVM or host machine (eg GC): pause of approximately 
70587ms
No GCs detected
2015-11-29 00:49:46,913 WARN  [regionserver60020] util.Sleeper: We slept 
72658ms instead of 3000ms, this is likely due to a long garbage 
collecting pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-11-29 00:49:46,913 INFO 
[regionserver60020-SendThread(l-zk4.dba.cn8.qunar.com:2181)] 
zookeeper.ClientCnxn: Client session timed out, have not heard from 
server in 81433ms for sessionid 0x54f16f4aa3363fa, closing socket 
connection and attempting reconnect
2015-11-29 00:49:46,913 INFO 
[regionserver60020-SendThread(l-zk2.dba.cn6.qunar.com:2181)] 
zookeeper.ClientCnxn: Client session timed out, have not heard from 
server in 79798ms for sessionid 0x54f16f4aa3363fe, closing socket 
connection and attempting reconnect
2015-11-29 00:49:46,922 WARN  [ResponseProcessor for block 
BP-329468547-10.90.18.19-1444480193809:blk_1078012346_4271549] 
hdfs.DFSClient: Slow ReadProcessor read fields took 72827ms 
(threshold=30000ms); ack: seqno: -2 status: SUCCESS status: ERROR 
downstreamAckTimeNanos: 0, targets: [10.90.18.47:50010, 
10.90.18.60:50010, 10.90.18.63:50010]
2015-11-29 00:49:46,921 WARN  [ResponseProcessor for block 
BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823] 
hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block 
BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823
java.io.EOFException: Premature EOF: no length prefix available
         at 
org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2103)
         at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
         at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:810)


      as we can see , the log shows that :"We slept 74029ms instead of 
10000ms, this is likely due to a long garbage collecting pause ",but 
following :" it No GCs deteted ",
      this confused me. Can you give me some advice?
      I use   java7, G1 garbage collector,


On 2015年11月29日 08:44, 聪聪 wrote:
> hi,all:
> This morning regionserver shutdown.By analyzing the log, I guess it's about GC.Then,How
to optimize the GC in HBase.
> I use the HBase version is hbase-0.98.6-cdh5.2.0.I look for your help in this.
>
>
> JVM configuration is as follows:
> export HBASE_REGIONSERVER_OPTS="-Xms92g -Xmx92g -XX:PermSize=256m -XX:MaxPermSize=256m
-XX:+UseG1GC -server -XX:+DisableExplicitGC -XX:+UseFastAccessorMethods -XX:SoftRefLRUPolicyMSPerMB=0
-XX:G1ReservePercent=15 -XX:InitiatingHeapOccupancyPercent=40 -XX:ConcGCThreads=18 -XX:+ParallelRefProcEnabled
-XX:-ResizePLAB -XX:ParallelGCThreads=18 -XX:+PrintClassHistogram -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC -Xloggc:/home/q/hbase/hbase-0.98.6-cdh5.2.0/logs/gc-$(hostname)-hbase.log"
>
>
> Regionserver on the log is as follows:
> 2015-11-29 00:48:30,521 DEBUG [regionserver60020-smallCompactions-1447234955800] compactions.Compactor:
Compacting hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/2becc4de83bc4102802a4f69adfba433,
keycount=1244806, bloomtype=ROW, size=462.2 M, encoding=NONE, seqNum=22878
> 2015-11-29 00:48:30,521 DEBUG [regionserver60020-smallCompactions-1447234955800] compactions.Compactor:
Compacting hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/52669d6cf3a34dc89af6faf23ea03fa0,
keycount=613700, bloomtype=ROW, size=236.4 M, encoding=NONE, seqNum=22885
> 2015-11-29 00:48:30,521 DEBUG [regionserver60020-smallCompactions-1447234955800] compactions.Compactor:
Compacting hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/1ff32b168134412ea3f6942126623ef6,
keycount=373741, bloomtype=ROW, size=143.9 M, encoding=NONE, seqNum=22889
> 2015-11-29 00:48:30,521 DEBUG [regionserver60020-smallCompactions-1447234955800] compactions.Compactor:
Compacting hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/b993875f0f384597900d2a2d3e69fde8,
keycount=136626, bloomtype=ROW, size=49.6 M, encoding=NONE, seqNum=22891
> 2015-11-29 00:49:46,912 WARN  [regionserver60020.periodicFlusher] util.Sleeper: We slept
71814ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's
usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2015-11-29 00:49:46,912 INFO  [SplitLogWorker-l-hbase27.dba.cn2.qunar.com,60020,1447234715888-SendThread(l-zk2.dba.cn6.qunar.com:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from server in 77297ms for
sessionid 0x150ad3cf038d6e6, closing socket connection and attempting reconnect
> 2015-11-29 00:49:46,912 WARN  [regionserver60020.compactionChecker] util.Sleeper: We
slept 74029ms instead of 10000ms, this is likely due to a long garbage collecting pause and
it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2015-11-29 00:49:46,912 INFO  [regionserver60020-SendThread(l-zk1.dba.cn6.qunar.com:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from server in 74953ms for
sessionid 0x34f16f641cc2e77, closing socket connection and attempting reconnect
> 2015-11-29 00:49:46,913 WARN  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause
in JVM or host machine (eg GC): pause of approximately 70587ms
> No GCs detected
> 2015-11-29 00:49:46,913 WARN  [regionserver60020] util.Sleeper: We slept 72658ms instead
of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2015-11-29 00:49:46,913 INFO  [regionserver60020-SendThread(l-zk4.dba.cn8.qunar.com:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from server in 81433ms for
sessionid 0x54f16f4aa3363fa, closing socket connection and attempting reconnect
> 2015-11-29 00:49:46,913 INFO  [regionserver60020-SendThread(l-zk2.dba.cn6.qunar.com:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from server in 79798ms for
sessionid 0x54f16f4aa3363fe, closing socket connection and attempting reconnect
> 2015-11-29 00:49:46,922 WARN  [ResponseProcessor for block BP-329468547-10.90.18.19-1444480193809:blk_1078012346_4271549]
hdfs.DFSClient: Slow ReadProcessor read fields took 72827ms (threshold=30000ms); ack: seqno:
-2 status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: [10.90.18.47:50010, 10.90.18.60:50010,
10.90.18.63:50010]
> 2015-11-29 00:49:46,921 WARN  [ResponseProcessor for block BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823]
hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823
> java.io.EOFException: Premature EOF: no length prefix available
>          at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2103)
>          at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
>          at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:810)
> 2015-11-29 00:49:46,914 DEBUG [LruStats #0] hfile.LruBlockCache: Total=20.98 MB, free=18.38
GB, max=18.40 GB, blocks=19756849152, accesses=564477317, hits=166122280, hitRatio=29.43%,
, cachingAccesses=166131865, cachingHits=166015128, cachingHitsRatio=99.93%, evictions=149453,
evicted=116562, evictedPerRun=0.7799240946769714
> 2015-11-29 00:49:46,929 WARN  [DataStreamer for file /hbase/tc_qtrace/WALs/l-hbase27.dba.cn2.qunar.com,60020,1447234715888/l-hbase27.dba.cn2.qunar.com%2C60020%2C1447234715888.1448726130837.meta
block BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823] hdfs.DFSClient: Error
Recovery for block BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823 in pipeline
10.90.18.47:50010, 10.90.18.61:50010, 10.90.18.53:50010: bad datanode 10.90.18.47:50010
>
>
> 安全提示:本邮件非QUNAR内部邮件,请注意保护个人及公司信息安全,如有索取帐号密码等可疑情况请向
secteam发送邮件

-- 
*吴国泉 wgq.wu*
----------------------------------------------------
*Position*:技术部DBA-HBase
*Email*:wgq.wu@qunar.com
*Mob*:13036134151
*Location*:北京市海淀区丹棱街3号中国电子大厦 1708
*Post Code*:100080

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message