hadoop-hdfs-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13393) Improve OOM logging
Date Fri, 26 Oct 2018 22:19:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665716#comment-16665716 ]

Wei-Chiu Chuang commented on HDFS-13393:
----------------------------------------

Excerpt from the book ??Java Performance: The Definitive Guide??, Chapter 7/8:

{quote}
Out of native memory
The first case in this list – no native memory available for the JVM – occurs for reasons
unrelated to the heap at all. In a 32-bit JVM, the maximum size of a process is 4GB (3GB on
some versions of Windows, and about 3.5 GB on some older versions of Linux). Specifying a
very large heap – say, 3.8GB – brings the application size dangerously close to that limit.
Even in a 64-bit JVM, the operating system may not have sufficient virtual memory for whatever
the JVM requests.

This topic is addressed more fully in Chapter 8. Be aware that if the message for the out
of memory error discusses allocation of native memory, then heap tuning isn't the answer:
you need to look into whatever native memory issue is mentioned in the error. For example,
the following message tells you that the native memory for that thread stacks is exhausted:

Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread.
{quote}

To help with diagnosis, Java 8 has Native Memory Tracking (NMT).

You need to start the JVM with the option -XX:NativeMemoryTracking=summary (it is off by default).

Then use the jcmd command to get native memory information:
$ jcmd process_id VM.native_memory summary

Another possibility is ulimit.
If the user has set a low max user processes limit (ulimit -u), she could hit the exact same error,
totally unrelated to memory, simply because she is not allowed to create any new threads
(on Linux, threads count against this limit).
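
To compare the two numbers involved, here is a minimal sketch in Java that prints the live thread count of the current JVM next to the soft "Max processes" limit. It assumes a Linux host where /proc/self/limits exists; the class name ThreadLimitCheck and the helper softMaxProcesses are hypothetical names used only for illustration, not part of Hadoop.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Hadoop.
class ThreadLimitCheck {
    private static final String KEY = "Max processes";

    /**
     * Extracts the soft "Max processes" value from the lines of a
     * /proc/<pid>/limits file. The value may be a number or "unlimited".
     */
    static Optional<String> softMaxProcesses(List<String> limitsLines) {
        for (String line : limitsLines) {
            if (line.startsWith(KEY)) {
                // Remaining columns: soft limit, hard limit, units.
                String[] cols = line.substring(KEY.length()).trim().split("\\s+");
                if (cols.length > 0 && !cols[0].isEmpty()) {
                    return Optional.of(cols[0]);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Live threads in this JVM only.
        int liveThreads = ManagementFactory.getThreadMXBean().getThreadCount();
        System.out.println("Live JVM threads: " + liveThreads);

        // Assumption: Linux procfs layout.
        Path limits = Paths.get("/proc/self/limits");
        if (Files.isReadable(limits)) {
            String soft = softMaxProcesses(Files.readAllLines(limits)).orElse("unknown");
            System.out.println("Soft max processes (ulimit -u): " + soft);
        } else {
            System.out.println("/proc/self/limits is not available on this platform");
        }
    }
}
```

Note that the limit applies to the user as a whole, so the thread count of a single JVM is only a lower bound on what counts against it.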

> Improve OOM logging
> -------------------
>
>                 Key: HDFS-13393
>                 URL: https://issues.apache.org/jira/browse/HDFS-13393
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover, datanode
>            Reporter: Wei-Chiu Chuang
>            Assignee: Gabor Bota
>            Priority: Major
>
> It is not uncommon to find "java.lang.OutOfMemoryError: unable to create new native thread"
> errors in an HDFS cluster. Most often this happens when a DataNode creates DataXceiver threads,
> or when the balancer creates threads for moving blocks around.
> In most cases, the "OOM" is a symptom of the number of threads reaching a system limit,
> rather than of actually running out of memory, and the current logging of this message is usually
> misleading (suggesting this is due to insufficient memory).
> How about catching the OOM and, if it is due to "unable to create new native thread",
> printing a more helpful message like "bump your ulimit" or "take a jstack of the process"?
> Even better, surface this error to make it more visible. It usually takes a while for
> an in-depth investigation to start after users notice that some job failed, by which time the
> evidence may already be gone (like jstack output).
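
The idea in the description could look roughly like the following Java sketch. It is not the actual HDFS-13393 patch: the class OomHint and its methods are hypothetical, System.err stands in for the real logger, and the match relies on the message text quoted in this issue, which may be worded differently in other JVM versions.

```java
// Hypothetical illustration, not the HDFS-13393 patch.
class OomHint {
    // Message text as quoted in this issue; other JVMs may word it differently.
    static final String NATIVE_THREAD_MSG = "unable to create new native thread";

    /** True if the throwable is an OOM raised because a thread could not be created. */
    static boolean isThreadCreationFailure(Throwable t) {
        return t instanceof OutOfMemoryError
            && t.getMessage() != null
            && t.getMessage().contains(NATIVE_THREAD_MSG);
    }

    /** Builds a log message that points at the likely cause instead of the heap. */
    static String describe(OutOfMemoryError e) {
        if (isThreadCreationFailure(e)) {
            return "Failed to create a new thread (" + e.getMessage() + "). "
                + "This usually means a thread/process limit was reached rather than "
                + "the heap being exhausted; check 'ulimit -u' and take a jstack of the process.";
        }
        return "Out of memory: " + e.getMessage();
    }

    /** Starts a thread and logs a targeted hint if the JVM cannot create it. */
    static void startWorker(Runnable task) {
        try {
            new Thread(task, "worker").start();
        } catch (OutOfMemoryError e) {
            System.err.println(describe(e)); // stand-in for the real logger
            throw e; // do not swallow the error
        }
    }

    public static void main(String[] args) {
        startWorker(() -> System.out.println("worker started"));
        System.out.println(describe(new OutOfMemoryError(NATIVE_THREAD_MSG)));
    }
}
```

Rethrowing after logging keeps the existing failure behavior and only changes what ends up in the log.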



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

