hadoop-common-user mailing list archives

From "Gokulakannan M (Engineering - Data Platform)" <gokulakanna...@flipkart.com>
Subject Re: Namenode shutdown due to long GC Pauses
Date Thu, 25 Feb 2016 10:51:14 GMT
Hi Jitendra,

I'm still trying to find the pattern, but one thing I observed is that the metric
RpcDetailedActivity.GetServerDefaultsNumOps
was pretty high (around 14 million) when the long pause happened.

We are already using the G1 garbage collector. These are the main JVM parameters:

-XX:+UseG1GC
-XX:ParallelGCThreads=8 -XX:ConcGCThreads=8 -XX:+UseNUMA
-XX:MaxGCPauseMillis=500 -XX:GCPauseIntervalMillis=1000
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xss256k
-XX:StringTableSize=1000003 -XX:+UseTLAB -XX:+UseCondCardMark
-XX:+UseFastAccessorMethods -XX:+AggressiveOpts -XX:+UseCompressedOops
-server -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-Xms75776m -Xmx75776m
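To check whether long pauses line up with the journal write failures, a minimal sketch of a standalone pause detector may help. It uses the same sleep-and-measure idea as the JvmPauseMonitor that Hadoop daemons run: sleep for a fixed interval, treat any extra wall-clock time as a stall of the JVM or host, and report how much GC time accumulated in that interval (via the standard GarbageCollectorMXBean API). The class name PauseDetector, the 500 ms interval, and the 1000 ms warning threshold are hypothetical choices for illustration, not Hadoop settings. A stall longer than the quorum journal write timeout (I believe this is `dfs.qjournal.write-txns.timeout.ms`; check hdfs-default.xml for your version) would be consistent with the NN giving up on the journal nodes.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Hypothetical standalone pause detector (illustration only, not Hadoop code).
 * It sleeps for a fixed interval and treats any extra elapsed time as a pause
 * of the JVM or host, then reports GC time accumulated during that interval.
 */
public class PauseDetector {
    static final long SLEEP_MILLIS = 500;            // assumed sampling interval
    static final long WARN_THRESHOLD_MILLIS = 1000;  // assumed warning threshold

    /** Time observed beyond the requested sleep; never negative. */
    static long extraPauseMillis(long requestedMillis, long observedMillis) {
        return Math.max(0, observedMillis - requestedMillis);
    }

    /** True when the extra time reaches the warning threshold. */
    static boolean isLongPause(long extraMillis, long thresholdMillis) {
        return extraMillis >= thresholdMillis;
    }

    /** Accumulated collection time of all collectors in ms; undefined (-1) values are skipped. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // Bounded run so the sketch terminates; pass the iteration count as the first argument.
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 20;
        long gcBefore = totalGcTimeMillis();
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime(); // monotonic clock, unaffected by wall-clock changes
            Thread.sleep(SLEEP_MILLIS);
            long observed = (System.nanoTime() - start) / 1_000_000;
            long extra = extraPauseMillis(SLEEP_MILLIS, observed);
            long gcNow = totalGcTimeMillis();
            if (isLongPause(extra, WARN_THRESHOLD_MILLIS)) {
                System.out.printf("Pause of ~%d ms detected; GC time in interval: %d ms%n",
                        extra, gcNow - gcBefore);
            }
            gcBefore = gcNow;
        }
    }
}
```

Run as its own process (for example `java PauseDetector 120`), it only sees its own GC plus host-level stalls such as a busy hypervisor, which can help separate VM effects from NameNode GC. To attribute pauses to the NameNode heap itself, the same loop would have to run inside the NameNode JVM, which is what the built-in monitor's log lines already cover.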

On Thu, Feb 25, 2016 at 3:46 PM, bappa kon <oraclehad@gmail.com> wrote:

> Which garbage collector are you currently using in your environment? Can you
> share the JVM parameters? If you are using CMS and have already tuned its
> parameters, then you could probably look at the G1 garbage collector.
>
> First, you should look at the GC stats and patterns to find the cause of the
> long GC pauses.
>
> Regards
> Jitendra
>
>
>
> On Thu, Feb 25, 2016 at 3:24 PM, Sandeep Nemuri <nhsandeep6@gmail.com>
> wrote:
>
>> You may need to tune your GC settings.
>>
>>
>>
>> On Thu, Feb 25, 2016 at 3:04 PM, Namikaze Minato <lloydsensei@gmail.com>
>> wrote:
>>
>>> This happened to us. Our namenodes are on virtual machines, and
>>> reducing the number of replication locations of the journal node to
>>> 1 (it's backed by a safe RAID array anyway) solved the problem.
>>>
>>> Regards,
>>> LLoyd
>>>
>>> On 25 February 2016 at 06:39, Gokulakannan M (Engineering - Data
>>> Platform) <gokulakannan.m@flipkart.com> wrote:
>>> > Hi,
>>> >
>>> > It is known that the namenode shuts down when a long GC pause happens
>>> > while the NN writes edits to the journal nodes: the namenode thinks that
>>> > the journal nodes didn't respond, but actually it was due to the long GC
>>> > pause. Any pointers on solving this issue?
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
>>> For additional commands, e-mail: user-help@hadoop.apache.org
>>>
>>>
>>
>>
>> --
>>   Regards
>>   Sandeep Nemuri
>>
>
>

