hbase-user mailing list archives

From 伍照坤 <tonywu...@gmail.com>
Subject Re: How to optimize the GC in HBase
Date Mon, 30 Nov 2015 17:27:20 GMT
I ran into a similar problem before. I suggest you start from the GC log and
the CPU usage history.

My experience:
#1 Saw the GC pause run too long, which caused a ZooKeeper session timeout;
the region server then determined it was a dead node and shut itself down.
#2 Tuned the GC and kept monitoring the GC log. It still happened again.
#3 Finally found the real cause: the region server was too busy. CPU usage
was so high that every process was slowed down or blocked.

Hope this is useful.
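To find the long pauses described in step #1, one approach is to scan the GC log for stop-the-world times above some threshold. A minimal sketch, assuming the log contains `-XX:+PrintGCApplicationStoppedTime` output (that flag is not in the configuration quoted below, so it may need to be added); the 5-second threshold and the default log path are illustrative:

```shell
#!/bin/sh
# Sketch: list stop-the-world pauses longer than 5 seconds in a GC log.
# Assumes -XX:+PrintGCApplicationStoppedTime lines of the form:
#   Total time for which application threads were stopped: 70.5870000 seconds
GC_LOG="${1:-gc-$(hostname)-hbase.log}"

long_pauses() {
  awk '/Total time for which application threads were stopped/ {
         # find the numeric field that precedes the word "seconds"
         for (i = 1; i < NF; i++)
           if ($(i + 1) == "seconds" && $i + 0 > 5) { print; break }
       }' "$1"
}

# Only run when the log is actually present.
if [ -f "$GC_LOG" ]; then
  long_pauses "$GC_LOG"
fi
```

Cross-checking the timestamps of the matched lines against CPU history (e.g. `sar -u` output for the same window) helps distinguish a genuine GC pause from the CPU-saturation case in step #3.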

On Monday, November 30, 2015, Stack <stack@duboce.net> wrote:

> On Sat, Nov 28, 2015 at 4:44 PM, 聪聪 <175998806@qq.com> wrote:
>
> > hi, all:
> > This morning a region server shut down. From the log, I guess it is
> > GC-related. How can I optimize the GC in HBase?
> > The HBase version I use is hbase-0.98.6-cdh5.2.0. I am looking for your
> > help with this.
> >
> >
> > JVM configuration is as follows:
> > export HBASE_REGIONSERVER_OPTS="-Xms92g -Xmx92g -XX:PermSize=256m
> > -XX:MaxPermSize=256m -XX:+UseG1GC -server -XX:+DisableExplicitGC
> > -XX:+UseFastAccessorMethods -XX:SoftRefLRUPolicyMSPerMB=0
> > -XX:G1ReservePercent=15 -XX:InitiatingHeapOccupancyPercent=40
> > -XX:ConcGCThreads=18 -XX:+ParallelRefProcEnabled -XX:-ResizePLAB
> > -XX:ParallelGCThreads=18 -XX:+PrintClassHistogram -XX:+PrintGCDetails
> > -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> >
> > -Xloggc:/home/q/hbase/hbase-0.98.6-cdh5.2.0/logs/gc-$(hostname)-hbase.log"
> >
> >
> What the others have said.
>
> How did you arrive at the above configurations?
>
> Looks like you are keeping the gc log in the logs dir at
> gc-HOSTNAME-hbase.log.  Looks like your GC output should be rich with lots
> of helpful detail.
>
> What is your loading like?  All reads? All writes?  Big objects? A mix?
>
> St.Ack
>
>
> >
> > The region server log is as follows:
> > 2015-11-29 00:48:30,521 DEBUG [regionserver60020-smallCompactions-1447234955800] compactions.Compactor: Compacting hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/2becc4de83bc4102802a4f69adfba433, keycount=1244806, bloomtype=ROW, size=462.2 M, encoding=NONE, seqNum=22878
> > 2015-11-29 00:48:30,521 DEBUG [regionserver60020-smallCompactions-1447234955800] compactions.Compactor: Compacting hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/52669d6cf3a34dc89af6faf23ea03fa0, keycount=613700, bloomtype=ROW, size=236.4 M, encoding=NONE, seqNum=22885
> > 2015-11-29 00:48:30,521 DEBUG [regionserver60020-smallCompactions-1447234955800] compactions.Compactor: Compacting hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/1ff32b168134412ea3f6942126623ef6, keycount=373741, bloomtype=ROW, size=143.9 M, encoding=NONE, seqNum=22889
> > 2015-11-29 00:48:30,521 DEBUG [regionserver60020-smallCompactions-1447234955800] compactions.Compactor: Compacting hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/b993875f0f384597900d2a2d3e69fde8, keycount=136626, bloomtype=ROW, size=49.6 M, encoding=NONE, seqNum=22891
> > 2015-11-29 00:49:46,912 WARN  [regionserver60020.periodicFlusher] util.Sleeper: We slept 71814ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > 2015-11-29 00:49:46,912 INFO  [SplitLogWorker-l-hbase27.dba.cn2.qunar.com,60020,1447234715888-SendThread(l-zk2.dba.cn6.qunar.com:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 77297ms for sessionid 0x150ad3cf038d6e6, closing socket connection and attempting reconnect
> > 2015-11-29 00:49:46,912 WARN  [regionserver60020.compactionChecker] util.Sleeper: We slept 74029ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > 2015-11-29 00:49:46,912 INFO  [regionserver60020-SendThread(l-zk1.dba.cn6.qunar.com:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 74953ms for sessionid 0x34f16f641cc2e77, closing socket connection and attempting reconnect
> > 2015-11-29 00:49:46,913 WARN  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 70587ms
> > No GCs detected
> > 2015-11-29 00:49:46,913 WARN  [regionserver60020] util.Sleeper: We slept 72658ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > 2015-11-29 00:49:46,913 INFO  [regionserver60020-SendThread(l-zk4.dba.cn8.qunar.com:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 81433ms for sessionid 0x54f16f4aa3363fa, closing socket connection and attempting reconnect
> > 2015-11-29 00:49:46,913 INFO  [regionserver60020-SendThread(l-zk2.dba.cn6.qunar.com:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 79798ms for sessionid 0x54f16f4aa3363fe, closing socket connection and attempting reconnect
> > 2015-11-29 00:49:46,922 WARN  [ResponseProcessor for block BP-329468547-10.90.18.19-1444480193809:blk_1078012346_4271549] hdfs.DFSClient: Slow ReadProcessor read fields took 72827ms (threshold=30000ms); ack: seqno: -2 status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: [10.90.18.47:50010, 10.90.18.60:50010, 10.90.18.63:50010]
> > 2015-11-29 00:49:46,921 WARN  [ResponseProcessor for block BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823
> > java.io.EOFException: Premature EOF: no length prefix available
> >         at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2103)
> >         at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
> >         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:810)
> > 2015-11-29 00:49:46,914 DEBUG [LruStats #0] hfile.LruBlockCache: Total=20.98 MB, free=18.38 GB, max=18.40 GB, blocks=19756849152, accesses=564477317, hits=166122280, hitRatio=29.43%, cachingAccesses=166131865, cachingHits=166015128, cachingHitsRatio=99.93%, evictions=149453, evicted=116562, evictedPerRun=0.7799240946769714
> > 2015-11-29 00:49:46,929 WARN  [DataStreamer for file /hbase/tc_qtrace/WALs/l-hbase27.dba.cn2.qunar.com,60020,1447234715888/l-hbase27.dba.cn2.qunar.com%2C60020%2C1447234715888.1448726130837.meta block BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823] hdfs.DFSClient: Error Recovery for block BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823 in pipeline 10.90.18.47:50010, 10.90.18.61:50010, 10.90.18.53:50010: bad datanode 10.90.18.47:50010
>

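An aside for readers who land here with the same symptoms: the quoted warnings point at http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired, and while the GC is being tuned, the ZooKeeper session timeout can be raised so that a shorter pause no longer expires the region server's session. A sketch only; the 120-second value is an illustrative assumption, and the ZooKeeper server's own maxSessionTimeout must permit the value, otherwise it is silently capped:

```xml
<!-- hbase-site.xml (sketch): raise the region server's ZK session timeout -->
<property>
  <name>zookeeper.session.timeout</name>
  <!-- milliseconds; illustrative value, not a recommendation -->
  <value>120000</value>
</property>
```

This only buys headroom: a ~70s pause like the one logged above would still exceed it, so it complements rather than replaces the GC and CPU investigation.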

-- 
Sincerely.
伍 涛 | Tony Wu
