hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: gc pause killing regionserver
Date Thu, 08 Mar 2012 22:29:09 GMT
When real (wall-clock) time is much bigger than user CPU time, it very
often points to swapping. Even if you think you turned that off or that
there's no possible way you could be swapping, check it again.
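
As a rough illustration of that check, here is a minimal Python sketch
that scans GC log text for entries whose real time dwarfs user + sys
time. The function name and the thresholds (min_real, ratio) are
assumptions picked for illustration, not anything HBase or the JVM
defines, and it assumes each log entry sits on a single unwrapped line.

```python
import re

# Matches the "[Times: user=... sys=..., real=... secs]" part of a HotSpot GC log entry.
TIMES_RE = re.compile(
    r"user=(?P<user>[\d.]+)\s+sys=(?P<sys>[\d.]+),\s+real=(?P<real>[\d.]+)\s+secs"
)

def suspicious_pauses(log_text, min_real=1.0, ratio=2.0):
    """Return (user, sys, real) tuples where wall-clock time dwarfs CPU time.

    min_real and ratio are illustrative thresholds, not recommended values.
    """
    hits = []
    for m in TIMES_RE.finditer(log_text):
        user, sys_, real = (float(m.group(k)) for k in ("user", "sys", "real"))
        if real >= min_real and real > ratio * (user + sys_):
            hits.append((user, sys_, real))
    return hits

# Two entries taken from the quoted log: a normal ParNew and the 51 second pause.
sample = (
    "[Times: user=0.14 sys=0.01, real=0.05 secs]\n"
    "[Times: user=1.82 sys=0.31, real=51.38 secs]\n"
)
print(suspicious_pauses(sample))  # [(1.82, 0.31, 51.38)]
```

Only the 51 second collection is flagged: it used about 2 seconds of CPU,
so the JVM spent nearly all of that pause waiting on something else.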

It could also be that your CPUs were busy doing something else. I've
seen CPUs doing crazy context switching freeze up my nodes, but in my
experience it's not very likely.

Setting swappiness to 0 just means the kernel is not going to page
anything out until it really needs to, meaning it's still possible to
swap. The only way to guarantee no swapping whatsoever is to give your
system 0 swap space.
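
A quick way to double-check both settings on a Linux node is to read
/proc/meminfo and /proc/sys/vm/swappiness. The sketch below is only an
illustration; the helper names (parse_meminfo, swap_used_kb, report)
are made up for this example.

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info

def swap_used_kb(info):
    """Swap currently in use, in kB; 0 if no swap fields are present."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)

def report(meminfo_path="/proc/meminfo",
           swappiness_path="/proc/sys/vm/swappiness"):
    """Print swap totals and the swappiness value if the files exist."""
    if not (os.path.exists(meminfo_path) and os.path.exists(swappiness_path)):
        print("not a Linux /proc layout, nothing to report")
        return
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    with open(swappiness_path) as f:
        swappiness = int(f.read().strip())
    print("SwapTotal kB:", info.get("SwapTotal", 0))
    print("Swap used kB:", swap_used_kb(info))
    print("vm.swappiness:", swappiness)

if __name__ == "__main__":
    report()
```

If SwapTotal is non-zero, swapping is still possible whatever the
swappiness value says, and a non-zero "Swap used" means pages have
already been swapped out at some point.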

Regarding that promotion failure, you could try reducing the new
generation size (eden plus the survivor spaces). Try -Xmn128m
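
To get a feel for how much each young collection pushes into the old
generation, you can subtract the heap-wide drop from the young-gen drop
in every ParNew entry. This is a rough sketch in Python, assuming each
log entry is on one unwrapped line; the function name is hypothetical.

```python
import re

# One ParNew entry on a single line: young-gen occupancy, then whole-heap occupancy.
PARNEW_RE = re.compile(
    r"\[ParNew: (\d+)K->(\d+)K\(\d+K\), [\d.]+ secs\] "
    r"(\d+)K->(\d+)K\(\d+K\)"
)

def promoted_kb(log_text):
    """Estimate the KB promoted to the old generation by each young collection."""
    result = []
    for m in PARNEW_RE.finditer(log_text):
        young_before, young_after, heap_before, heap_after = map(int, m.groups())
        freed_young = young_before - young_after  # what left the young generation
        freed_heap = heap_before - heap_after     # what actually left the heap
        # Whatever left the young generation without leaving the heap was promoted.
        result.append(freed_young - freed_heap)
    return result

# First entry of the quoted log, joined back onto one line.
sample = (
    "211663.516: [GC 211663.516: [ParNew: 118715K->13184K(118912K), 0.0445390 secs] "
    "1373940K->1289814K(2233472K), 0.0446420 secs]"
)
print(promoted_kb(sample))  # [21405]
```

For the first entry in the log that is about 21 MB promoted by a single
young collection, which is the pressure the old generation has to
absorb between CMS cycles.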

J-D

On Sat, Mar 3, 2012 at 5:05 AM, Ferdy Galema <ferdy.galema@kalooga.com> wrote:
> Hi,
>
> I'm running regionservers with 2GB heap and the following tuning options:
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=16
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:MaxGCPauseMillis=100
>
> A regionserver aborted (YouAreDeadException) and this was printed in the gc
> logs (all is shown up until the abort)
>
> 211663.516: [GC 211663.516: [ParNew: 118715K->13184K(118912K), 0.0445390
> secs] 1373940K->1289814K(2233472K), 0.0446420 secs] [Times: user=0.14
> sys=0.01, real=0.05 secs]
> 211663.686: [GC 211663.686: [ParNew: 118912K->13184K(118912K), 0.0594280
> secs] 1395542K->1310185K(2233472K), 0.0595420 secs] [Times: user=0.15
> sys=0.00, real=0.06 secs]
> 211663.869: [GC 211663.869: [ParNew: 118790K->13184K(118912K), 0.0434820
> secs] 1415792K->1331317K(2233472K), 0.0435930 secs] [Times: user=0.13
> sys=0.01, real=0.04 secs]
> 211667.598: [GC 211667.598: [ParNew (promotion failed):
> 118912K->118912K(118912K), 0.0225390 secs]211667.621: [CMS:
> 1330845K->1127914K(2114560K), 51.3610670 secs]
> 1437045K->1127914K(2233472K), [CMS Perm : 20680K->20622K(34504K)],
> 51.3838170 secs] [Times: user=1.82 sys=0.31, real=51.38 secs]
> 211719.713: [GC 211719.714: [ParNew: 105723K->13184K(118912K), 0.0176130
> secs] 1233638K->1149393K(2233472K), 0.0177230 secs] [Times: user=0.07
> sys=0.00, real=0.02 secs]
> 211719.851: [GC 211719.852: [ParNew: 118912K->13184K(118912K), 0.0281860
> secs] 1255121K->1170269K(2233472K), 0.0282970 secs] [Times: user=0.10
> sys=0.01, real=0.03 secs]
> 211719.993: [GC 211719.993: [ParNew: 118795K->13184K(118912K), 0.0276320
> secs] 1275880K->1191268K(2233472K), 0.0277350 secs] [Times: user=0.09
> sys=0.00, real=0.03 secs]
> 211720.490: [GC 211720.490: [ParNew: 118912K->13184K(118912K), 0.0624650
> secs] 1296996K->1210640K(2233472K), 0.0625560 secs] [Times: user=0.15
> sys=0.00, real=0.06 secs]
> 211720.687: [GC 211720.687: [ParNew: 118702K->13184K(118912K), 0.1651750
> secs] 1316159K->1231993K(2233472K), 0.1652660 secs] [Times: user=0.25
> sys=0.01, real=0.17 secs]
> 211721.038: [GC 211721.038: [ParNew: 118912K->13184K(118912K), 0.0952750
> secs] 1337721K->1252598K(2233472K), 0.0953660 secs] [Times: user=0.15
> sys=0.00, real=0.09 secs]
> Heap
>  par new generation  total 118912K, used 86199K [0x00002aaaae1f0000,
> 0x00002aaab62f0000, 0x00002aaab62f0000)
>  eden space 105728K,  69% used [0x00002aaaae1f0000, 0x00002aaab293dfa8,
> 0x00002aaab4930000)
>  from space 13184K, 100% used [0x00002aaab4930000, 0x00002aaab5610000,
> 0x00002aaab5610000)
>  to  space 13184K,  0% used [0x00002aaab5610000, 0x00002aaab5610000,
> 0x00002aaab62f0000)
>  concurrent mark-sweep generation total 2114560K, used 1239414K
> [0x00002aaab62f0000, 0x00002aab373f0000, 0x00002aab373f0000)
>  concurrent-mark-sweep perm gen total 34504K, used 20728K
> [0x00002aab373f0000, 0x00002aab395a2000, 0x00002aab3c7f0000)
>
>
> Why did a GC take 51 seconds? The machine still had enough memory available
> so it could not be swapping. (swappiness is set to 0). Of the 15
> regionservers in total, I often see this specific regionserver fail. What
> do you recommend in this situation?
>
> Ferdy.
