hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: mslab enabled jvm crash
Date Wed, 25 May 2011 19:23:57 GMT
On Wed, May 25, 2011 at 11:08 AM, Wayne <wav100@gmail.com> wrote:
> I tried to turn off all special JVM settings we have tried in the past.
> Below are link to the requested configs. I will try to find more logs for
> the full GC. We just made the switch and on this node it has
> only occurred once in the scope of the current log (it may have rolled?).
>
> Thanks.
>
> http://pastebin.com/ca13aMRu
>

You are running w/ the defaults.  I'd suggest you do as Todd says:
turn off incremental mode, and can you try giving HBase more RAM (or
capping your parnew)?  From the above, if representative, it would seem
that your young gen is riding at about 256M?  Is that so?  Try capping
it at something smaller, 192M or 128M?  Another thing to try is
starting the CMS earlier.  Usually this is recommended as a means of
putting off concurrent mode failures as opposed to promotion failures,
but it might be enough to ensure space to do the parnew promotion to
old gen (this is an oldie but I'd say the basic premise holds:
http://blogs.oracle.com/jonthecollector/entry/when_the_sum_of_the).
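[For illustration, the suggestions above might look roughly like the
following in hbase-env.sh. The specific values (192m young gen, 70%
occupancy) are hypothetical starting points to tune, not settings
confirmed anywhere in this thread:]

```sh
# Sketch of the GC settings discussed above, for hbase-env.sh.
# -Xmn caps the ParNew/young gen (try 192m or 128m, per Stack);
# -XX:-CMSIncrementalMode turns off incremental CMS, per Todd;
# the occupancy flags start CMS earlier than the JVM default.
# All values here are illustrative assumptions, not thread-confirmed.
export HBASE_OPTS="$HBASE_OPTS -Xmn192m -XX:+UseConcMarkSweepGC \
  -XX:-CMSIncrementalMode \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly"
```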

How long do you run before you hit this pause?  Or is this only the
first instance?

> http://pastebin.com/9KfRZFBW

It looks like you are keeping your zk logs in /tmp, is that so?  See
hbase.zookeeper.property.dataDir.
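[A sketch of moving the ZooKeeper data directory off /tmp in
hbase-site.xml; the path below is a made-up example, not one taken
from this thread:]

```xml
<!-- hbase-site.xml: keep ZK data out of /tmp, which is typically
     cleared on reboot. The path below is a hypothetical example;
     point it at a persistent disk on your nodes. -->
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/var/hbase/zookeeper</value>
</property>
```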

You are flushing at 4x the default.  You might try running at the
defaults to see if the lesser retention brings some relief.
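[For reference, reverting the flush size to its default might look like
this, assuming "4x the default" refers to hbase.hregion.memstore.flush.size,
whose default in HBase releases of this era was 64MB; verify against the
pasted configs before relying on this:]

```xml
<!-- hbase-site.xml: revert to the default memstore flush size.
     Assumes "4x the default" refers to this property; 67108864
     bytes (64MB) was the default in this era of HBase. -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>67108864</value>
</property>
```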

St.Ack
P.S. We do not do bulk loading.


>
>
> On Wed, May 25, 2011 at 1:42 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> Hi Wayne,
>>
>> Looks like your RAM might be oversubscribed. Could you paste your
>> hbase-site.xml and hbase-env.sh files? It also looks like you have some
>> strange GC settings on (e.g. perm gen collection, which we don't really
>> need).
>>
>> If you can paste a larger segment of GC logs (enough to include at
>> least two or three of the full gc pauses) that would be helpful.
>>
>> -Todd
>>
>> On Wed, May 25, 2011 at 10:32 AM, Wayne <wav100@gmail.com> wrote:
>> > We switched to u25 and reverted the JVM settings to those recommended. Now
>> > we have concurrent mode failures lasting more than 60 seconds
>> > while under hardly any load....
>> >
>> > Below are the entries from the JVM log. Of course we can up the zookeeper
>> > timeout to 2 min, or 10 min for that matter, but it does not address the
>> > underlying issue. Sorry, but I cannot confirm that the changes for the new
>> > GC settings have any effect. It appears no better, or even worse, as this
>> > problem below occurred while the cluster was almost idle.
>> >
>> >
>> > 2011-05-25T14:15:45.518+0000: 150358.023: [GC 150358.023: [ParNew:
>> > 230155K->27648K(249216K), 0.0653880 secs] 7754007K->7586719K(8360960K)
>> > icms_dc=100 , 0.0654900 secs] [Times: user=0.78 sys=0.00, real=0.06 secs]
>> > 2011-05-25T14:15:45.906+0000: 150358.410: [GC 150358.410: [ParNew (promotion
>> > failed): 249216K->249216K(249216K), 0.5768350 secs]150358.987:
>> > [CMS2011-05-25T14:16:44.404+0000: 150416.909: [CMS-concurrent-sweep:
>> > 87.667/92.820 secs] [Times: user=182.64 sys=1.37, real=92.80 secs]
>> >  (concurrent mode failure)[Unloading class
>> > sun.reflect.GeneratedMethodAccessor20]
>> > [Unloading class sun.reflect.GeneratedMethodAccessor29]
>> > [Unloading class sun.reflect.GeneratedMethodAccessor31]
>> > [Unloading class sun.reflect.GeneratedMethodAccessor30]
>> > [Unloading class sun.reflect.GeneratedMethodAccessor32]
>> > [Unloading class sun.reflect.GeneratedMethodAccessor1]
>> > [Unloading class sun.reflect.GeneratedMethodAccessor17]
>> > [Unloading class sun.reflect.GeneratedMethodAccessor28]
>> > : 7621159K->2503625K(8111744K), 63.3195660 secs]
>> > 7798327K->2503625K(8360960K), [CMS Perm : 20128K->20106K(33580K)]
>> > icms_dc=100 , 63.8965450 secs] [Times: user=69.50 sys=0.01, real=63.89 secs]
>> >
>> >
>> >
>> > On Mon, May 23, 2011 at 12:04 PM, Stack <stack@duboce.net> wrote:
>> >
>> >> On Mon, May 23, 2011 at 8:42 AM, Wayne <wav100@gmail.com> wrote:
>> >> > Our experience with any newer JVM was that fragmentation was much much worse
>> >> > and Concurrent Mode Failures were rampant. We kept moving back in releases
>> >> > to get to what we use now. We are on CentOS 5.5. We will try to use u24.
>> >> >
>> >>
>> >> CMS's you should be able to configure around.  u21 was supposed to
>> >> make improvements to put off fragmentation but apparently made it
>> >> worse.  Try u25, the latest.  Also google for others' experience with
>> >> JVMs on CentOS 5.5.
>> >>
>> >> St.Ack
>> >>
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
