hadoop-common-dev mailing list archives

From "Richard Lee (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-2108) NullPointerException in JVMMetrics for OOM killed task
Date Fri, 26 Oct 2007 16:38:50 GMT
NullPointerException in JVMMetrics for OOM killed task
------------------------------------------------------

                 Key: HADOOP-2108
                 URL: https://issues.apache.org/jira/browse/HADOOP-2108
             Project: Hadoop
          Issue Type: Bug
          Components: metrics
    Affects Versions: 0.14.2
         Environment: Centos5 jdk1.6.0_02
            Reporter: Richard Lee
            Priority: Minor


I had a reduce task run out of memory and die in such a way that JvmMetrics.doThreadUpdates()
throws a NullPointerException.

The apparent cause is that the call to threadMXBean.getThreadInfo() at JvmMetrics.java:119
returns an array of ThreadInfo whose elements may be null.

Here's a relevant quote from the javadoc:
    This method returns an array of the ThreadInfo objects,
    each is the thread information about the thread with the same index
    as in the ids array.
    If a thread of the given ID is not alive or does not exist,
    null will be set in the corresponding element
    in the returned array.  A thread is alive if
    it has been started and has not yet died.
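
The javadoc behaviour quoted above can be probed with a small standalone program. This is only a sketch, not code from Hadoop: the class name DeadThreadInfoDemo and the helper infoForFinishedThread() are made up for illustration, and it uses current Java syntax rather than what the 0.14 code base targets. Per the javadoc, the element for a thread that has already died is expected to be null, but the exact result may depend on the JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadThreadInfoDemo {

    // Hypothetical helper: starts a short-lived thread, waits for it to die,
    // then asks the ThreadMXBean for that thread's ThreadInfo.
    static ThreadInfo infoForFinishedThread() {
        Thread t = new Thread(() -> { /* exits immediately */ });
        t.start();
        long id = t.getId();
        try {
            t.join(); // once join() returns, t is no longer alive
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for thread", e);
        }

        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // The array form is the one that documents null elements for dead IDs.
        ThreadInfo[] infos = bean.getThreadInfo(new long[] { id });
        return infos[0];
    }

    public static void main(String[] args) {
        System.out.println("ThreadInfo for finished thread: " + infoForFinishedThread());
    }
}
```

In JvmMetrics the same situation can presumably arise when a thread dies between the call that collects the thread IDs and the call to getThreadInfo().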

My stacktrace looks like this:
java.lang.NullPointerException
	at org.apache.hadoop.metrics.jvm.JvmMetrics.doThreadUpdates(JvmMetrics.java:129)
	at org.apache.hadoop.metrics.jvm.JvmMetrics.doUpdates(JvmMetrics.java:79)
	at org.apache.hadoop.metrics.spi.AbstractMetricsContext.timerEvent(AbstractMetricsContext.java:284)
	at org.apache.hadoop.metrics.spi.AbstractMetricsContext.access$000(AbstractMetricsContext.java:50)
	at org.apache.hadoop.metrics.spi.AbstractMetricsContext$1.run(AbstractMetricsContext.java:249)
	at java.util.TimerThread.mainLoop(Timer.java:512)
	at java.util.TimerThread.run(Timer.java:462)

On line 129, there's an attempt to dereference the potentially null threadInfo value to
get its current state.

The naive solution here is to check for null and count null values as "terminated"... but
it seems clear that a thread state of TERMINATED and a null ThreadInfo value are distinct
cases and may need special treatment.
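
A minimal sketch of the null check, assuming current Java rather than the Hadoop 0.14 source. The class NullSafeThreadCounts, the Counts holder and its "missing" field are hypothetical names for illustration and are not part of JvmMetrics. Instead of folding null entries into TERMINATED, it tallies them separately so the two cases stay distinct:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class NullSafeThreadCounts {

    // Hypothetical holder: per-state counts plus a separate tally of null entries.
    static final class Counts {
        final Map<Thread.State, Integer> byState = new EnumMap<>(Thread.State.class);
        int missing; // thread IDs whose ThreadInfo came back null
    }

    // Counts thread states, skipping null elements instead of dereferencing them.
    static Counts count(ThreadInfo[] infos) {
        Counts c = new Counts();
        for (ThreadInfo info : infos) {
            if (info == null) {
                c.missing++; // thread died or never existed; not the same as TERMINATED
                continue;
            }
            c.byState.merge(info.getThreadState(), 1, Integer::sum);
        }
        return c;
    }

    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = bean.getThreadInfo(bean.getAllThreadIds());
        Counts c = count(infos);
        System.out.println("by state: " + c.byState + ", missing: " + c.missing);
    }
}
```

Whether the "missing" tally should be reported as its own metric, added to the terminated count, or dropped is a design decision this sketch leaves open.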

I'm guessing that this is a "minor" issue because it seems more cosmetic than mission-critical.
I'm not sure what the upstream effects are of this method throwing the NPE, so I didn't set
it to "trivial".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

