hadoop-common-dev mailing list archives

From "Michael Bieniosek (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1186) deadlock in Abstract Metrics Context
Date Fri, 30 Mar 2007 18:33:25 GMT
deadlock in Abstract Metrics Context
------------------------------------

                 Key: HADOOP-1186
                 URL: https://issues.apache.org/jira/browse/HADOOP-1186
             Project: Hadoop
          Issue Type: Bug
          Components: metrics
    Affects Versions: 0.12.1
         Environment: using ganglia metrics
            Reporter: Michael Bieniosek
            Priority: Critical


There appears to be a lock-inversion deadlock in AbstractMetricsContext.

When Ganglia metrics are in use, the JobTracker sometimes starts timing out requests. The
logs then show:

2007-03-30 13:59:50,942 WARN org.apache.hadoop.ipc.Server: Call queue overflow discarding oldest call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@1c19919, false, true, 407) from 10.255.62.129:50215
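The warning means new calls kept arriving while no IPC handler was removing them from the queue. Here is a hypothetical, simplified model of a bounded call queue that drops its oldest entry when full. The class and method names are made up for illustration and are not Hadoop's actual ipc.Server code. It shows why, once every handler is stuck, queued tasktracker heartbeats get discarded and clients time out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for an RPC call queue; NOT Hadoop's ipc.Server implementation.
public class DiscardOldestQueue {
    private final Deque<String> calls = new ArrayDeque<>();
    private final int capacity;
    private int discarded = 0;

    public DiscardOldestQueue(int capacity) {
        this.capacity = capacity;
    }

    // Enqueue a call; if the queue is already full, drop the oldest entry first.
    public synchronized void offer(String call) {
        if (calls.size() >= capacity) {
            String oldest = calls.pollFirst();
            discarded++;
            System.out.println("Call queue overflow discarding oldest call " + oldest);
        }
        calls.addLast(call);
    }

    public synchronized int size() { return calls.size(); }
    public synchronized int discarded() { return discarded; }
    public synchronized String peekOldest() { return calls.peekFirst(); }

    public static void main(String[] args) {
        // Nothing ever takes from the queue, mimicking handlers that are all blocked.
        DiscardOldestQueue q = new DiscardOldestQueue(3);
        for (int i = 1; i <= 5; i++) {
            q.offer("heartbeat-" + i);
        }
        System.out.println("size=" + q.size() + " discarded=" + q.discarded()
                + " oldest=" + q.peekOldest());
    }
}
```

With a capacity of 3 and five offers, the first two heartbeats are dropped and the queue ends up holding heartbeat-3 through heartbeat-5.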

A thread dump taken with kill -QUIT shows:

"IPC Server handler 6 on 10001" daemon prio=1 tid=0x08515c08 nid=0x526a waiting for monitor entry [0x4e6f4000..0x4e6f4f40]
        at org.apache.hadoop.metrics.spi.AbstractMetricsContext.createRecord(AbstractMetricsContext.java:192)
        - waiting to lock <0x5a562c98> (a org.apache.hadoop.metrics.ganglia.GangliaContext)
        at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:130)
        at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1384)
        - locked <0x5a446330> (a org.apache.hadoop.mapred.JobTracker)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
...
"Timer-0" prio=1 tid=0x08664040 nid=0x5274 waiting for monitor entry [0x4e36d000..0x4e36df40]
        at org.apache.hadoop.mapred.JobTracker.getRunningJobs(JobTracker.java:944)
        - waiting to lock <0x5a446330> (a org.apache.hadoop.mapred.JobTracker)
        at org.apache.hadoop.mapred.JobTracker$JobTrackerMetrics.doUpdates(JobTracker.java:429)
        at org.apache.hadoop.metrics.spi.AbstractMetricsContext.timerEvent(AbstractMetricsContext.java:275)
        - locked <0x5a562c98> (a org.apache.hadoop.metrics.ganglia.GangliaContext)
        at org.apache.hadoop.metrics.spi.AbstractMetricsContext.access$000(AbstractMetricsContext.java:48)
        at org.apache.hadoop.metrics.spi.AbstractMetricsContext$1.run(AbstractMetricsContext.java:242)
        at java.util.TimerThread.mainLoop(Unknown Source)
        at java.util.TimerThread.run(Unknown Source)
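The two stacks form a cycle. "IPC Server handler 6" holds the JobTracker monitor <0x5a446330> (taken in submitJob) and waits for the GangliaContext monitor <0x5a562c98> in createRecord. "Timer-0" holds the GangliaContext monitor (taken in timerEvent) and waits for the JobTracker monitor in getRunningJobs, reached through JobTrackerMetrics.doUpdates. Neither thread can proceed. The minimal sketch below reproduces that acquisition order with two plain Java objects standing in for the JobTracker and the GangliaContext. The class name, lock names, and thread names are hypothetical. It then asks the JVM's ThreadMXBean to confirm the deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical reproduction of the lock ordering in the dump; not Hadoop code.
public class LockInversionDemo {

    /** Starts two threads that lock in opposite orders; returns how many the JVM reports as deadlocked. */
    static int reproduce(long timeoutMillis) throws InterruptedException {
        Object trackerLock = new Object(); // stands in for the JobTracker (0x5a446330)
        Object contextLock = new Object(); // stands in for the GangliaContext (0x5a562c98)

        // Each thread takes its first lock, then waits here so the bad interleaving is guaranteed.
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like the IPC handler: JobTracker first (submitJob), then context (createRecord).
        Thread handler = new Thread(() -> {
            synchronized (trackerLock) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (contextLock) {
                    // never reached: the timer thread holds contextLock
                }
            }
        }, "handler");

        // Like Timer-0: context first (timerEvent), then JobTracker (getRunningJobs).
        Thread timer = new Thread(() -> {
            synchronized (contextLock) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (trackerLock) {
                    // never reached: the handler thread holds trackerLock
                }
            }
        }, "timer");

        // Daemon threads, so the JVM can still exit after they deadlock.
        handler.setDaemon(true);
        timer.setDaemon(true);
        handler.start();
        timer.start();

        // Poll until the JVM detects the monitor cycle or the timeout expires.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + reproduce(5000));
    }
}
```

The usual general remedy for this kind of inversion is to make every code path acquire the two locks in the same order, or to avoid calling back into the JobTracker while the metrics context lock is held. This report does not say which approach the eventual fix took.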


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

