hadoop-mapreduce-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2954) Deadlock in NM with threads racing for ApplicationAttemptId
Date Fri, 09 Sep 2011 06:49:09 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101035#comment-13101035 ]

Siddharth Seth commented on MAPREDUCE-2954:
-------------------------------------------

Looks OK, but I'm not sure about the large prime: it will almost certainly cause the hash code
to wrap around the integer range, which is likely not a problem in itself. We could revert to
the Eclipse-generated default of 31.
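A minimal sketch of the Eclipse-style hash with the prime 31, assuming a simplified, hypothetical `SimpleAttemptId` holding a `long` clusterTimestamp and two `int` counters (this is not the actual YARN record class). Java `int` multiplication overflows and wraps around silently, which does not break the `hashCode` contract: equal objects still hash equally.

```java
/**
 * Hypothetical, simplified stand-in for an application attempt ID.
 * Not the YARN ApplicationAttemptIdPBImpl class.
 */
final class SimpleAttemptId {
    private final long clusterTimestamp; // cluster start time, in ms
    private final int appId;
    private final int attemptId;

    SimpleAttemptId(long clusterTimestamp, int appId, int attemptId) {
        this.clusterTimestamp = clusterTimestamp;
        this.appId = appId;
        this.attemptId = attemptId;
    }

    @Override
    public int hashCode() {
        // Eclipse-style: fold each field into the running result with the prime 31.
        // int overflow wraps around; that is harmless for hashing.
        final int prime = 31;
        int result = 1;
        // Fold the 64-bit timestamp into 32 bits the same way Long.hashCode does.
        result = prime * result + (int) (clusterTimestamp ^ (clusterTimestamp >>> 32));
        result = prime * result + appId;
        result = prime * result + attemptId;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof SimpleAttemptId)) {
            return false;
        }
        SimpleAttemptId other = (SimpleAttemptId) obj;
        return clusterTimestamp == other.clusterTimestamp
            && appId == other.appId
            && attemptId == other.attemptId;
    }
}
```

This layout produces the same value as `java.util.Objects.hash(clusterTimestamp, appId, attemptId)`, so either form could be used.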

bq. We should be able to do better if we analyse more on our IDs, but this should work for now.
Completely agree with this, though: clusterTimestamp is in ms, and there are unlikely to be
very many attemptIds and containers per app.

> Deadlock in NM with threads racing for ApplicationAttemptId
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2954
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2954
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>            Priority: Critical
>             Fix For: 0.23.0, 0.24.0
>
>         Attachments: MAPREDUCE-2954-20110909.txt, MR2954_1.patch
>
>
> Found this:
> {code}
> Java stack information for the threads listed above:
> ===================================================
> "Thread-45":
>         at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.getApplicationId(ApplicationAttemptIdPBImpl.java:101)
>         - waiting to lock <0xb6a43ba0> (a org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl)
>         at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.compareTo(ApplicationAttemptIdPBImpl.java:144)
>         - locked <0xb6a443a0> (a org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl)
>         at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.compareTo(ApplicationAttemptIdPBImpl.java:31)
>         at org.apache.hadoop.yarn.api.records.impl.pb.ContainerIdPBImpl.compareTo(ContainerIdPBImpl.java:215)
>         at org.apache.hadoop.yarn.api.records.impl.pb.ContainerIdPBImpl.compareTo(ContainerIdPBImpl.java:34)
>         at java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:797)
>         at java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1640)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:360)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:355)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:113)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>         at java.lang.Thread.run(Thread.java:619)
> "Thread-30":
>         at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.getApplicationId(ApplicationAttemptIdPBImpl.java:101)
>         - waiting to lock <0xb6a443a0> (a org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl)
>         at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.compareTo(ApplicationAttemptIdPBImpl.java:144)
>         - locked <0xb6a43ba0> (a org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl)
>         at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.compareTo(ApplicationAttemptIdPBImpl.java:31)
>         at org.apache.hadoop.yarn.api.records.impl.pb.ContainerIdPBImpl.compareTo(ContainerIdPBImpl.java:215)
>         at org.apache.hadoop.yarn.api.records.impl.pb.ContainerIdPBImpl.compareTo(ContainerIdPBImpl.java:34)
>         at java.util.concurrent.ConcurrentSkipListMap.doRemove(ConcurrentSkipListMap.java:1078)
>         at java.util.concurrent.ConcurrentSkipListMap.remove(ConcurrentSkipListMap.java:1673)
>         at java.util.concurrent.ConcurrentSkipListMap$Iter.remove(ConcurrentSkipListMap.java:2256)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.getNodeStatus(NodeStatusUpdaterImpl.java:223)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$300(NodeStatusUpdaterImpl.java:62)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:262)
> Found 1 deadlock.
> {code}
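In the trace, each thread is inside a synchronized `compareTo` on one attempt ID (holding its monitor) and then calls the other ID's synchronized `getApplicationId`, so Thread-45 holds 0xb6a443a0 and waits for 0xb6a43ba0 while Thread-30 holds 0xb6a43ba0 and waits for 0xb6a443a0. A minimal sketch of one way to break that cycle, assuming a hypothetical `LockSafeAttemptId` whose fields are guarded by its own monitor (this is not necessarily what the attached patch does): leave `compareTo` unsynchronized so a thread holds at most one monitor at a time.

```java
/**
 * Hypothetical ID whose state is guarded by its own monitor, loosely
 * mirroring the lazily-initialized PB record classes in the trace.
 */
final class LockSafeAttemptId implements Comparable<LockSafeAttemptId> {
    private int appId;     // guarded by "this"
    private int attemptId; // guarded by "this"

    LockSafeAttemptId(int appId, int attemptId) {
        this.appId = appId;
        this.attemptId = attemptId;
    }

    synchronized int getAppId() {
        return appId;
    }

    synchronized int getAttemptId() {
        return attemptId;
    }

    // Deliberately NOT synchronized: each getter acquires and releases a
    // single monitor, so no thread ever holds this object's lock while
    // waiting for the other object's lock. Without nested locking there is
    // no lock-order cycle, hence no deadlock between two comparing threads.
    @Override
    public int compareTo(LockSafeAttemptId other) {
        int byApp = Integer.compare(this.getAppId(), other.getAppId());
        if (byApp != 0) {
            return byApp;
        }
        return Integer.compare(this.getAttemptId(), other.getAttemptId());
    }
}
```

The trade-off is that the two fields of one object are read under separate lock acquisitions, so this suits IDs that are effectively immutable once built; an alternative is to always acquire the two monitors in a fixed global order.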

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
