hadoop-yarn-issues mailing list archives

From "Tsuyoshi OZAWA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException
Date Wed, 10 Dec 2014 02:44:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240559#comment-14240559 ]

Tsuyoshi OZAWA commented on YARN-2910:
--------------------------------------

[~wilfreds], [~rchiang], [~kasha] Sorry for the delay and for misreading the assignee's history log.
I have one question about the fix:

{code}
+    writeLock.lock();
+    try {
+      Collections.sort(runnableApps, comparator);
+    } finally {
+      writeLock.unlock();
+    }
+    readLock.lock();
+    try {
+      for (FSAppAttempt sched : runnableApps) {
+        if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
+          continue;
+        }
+
+        assigned = sched.assignContainer(node);
+        if (!assigned.equals(Resources.none())) {
+          break;
+        }
       }
+    } finally {
+      readLock.unlock();
+    }
{code}

Is it really safe to call writeLock.unlock() before iterating over runnableApps? Between the unlock and readLock.lock(), another thread can modify or re-sort runnableApps, so the order seen during the iteration may no longer be the order produced by sort(). This can break fair scheduling. I think the following code flow is the correct one. What do you think?

{code}
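    // Hold the write lock across both sort() and the iteration so that no other
    // thread can reorder or modify runnableApps in between.
    writeLock.lock();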
    try {
      Collections.sort(runnableApps, comparator);
      for (FSAppAttempt sched : runnableApps) {
        if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
          continue;
        }

        assigned = sched.assignContainer(node);
        if (!assigned.equals(Resources.none())) {
          break;
        }
      }
    } finally {
      writeLock.unlock();
    }
{code}
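A self-contained sketch of the proposed flow, under stated assumptions: `App`, `LeafQueueSketch`, `addApp`, `assignNext` and `totalUsage` are hypothetical stand-ins for FSAppAttempt, FSLeafQueue, the container-assignment loop and getResourceUsage(), not the real YARN classes. Blacklisting is modelled as a boolean flag, the comparator simply orders by a usage number, and "assignment" returns the first non-blacklisted app in sorted order. Because the write lock is held across both the sort and the scan, the loop always sees exactly the order sort() produced.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LeafQueueSketch {
    // Hypothetical stand-in for FSAppAttempt: a name, a usage figure used for
    // ordering, and a flag standing in for SchedulerAppUtils.isBlacklisted(...).
    static final class App {
        final String name;
        final int usage;
        final boolean blacklisted;

        App(String name, int usage, boolean blacklisted) {
            this.name = name;
            this.usage = usage;
            this.blacklisted = blacklisted;
        }
    }

    private final List<App> runnableApps = new ArrayList<>();
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock readLock = rwl.readLock();
    private final Lock writeLock = rwl.writeLock();
    // Lowest usage first: a simplified stand-in for the fair-share comparator.
    private final Comparator<App> comparator = Comparator.comparingInt(a -> a.usage);

    public void addApp(App app) {
        writeLock.lock();
        try {
            runnableApps.add(app);
        } finally {
            writeLock.unlock();
        }
    }

    // Sort and scan under one write lock: no other thread can modify or
    // re-sort the list between Collections.sort() and the loop.
    public String assignNext() {
        writeLock.lock();
        try {
            Collections.sort(runnableApps, comparator);
            for (App app : runnableApps) {
                if (app.blacklisted) {
                    continue;
                }
                return app.name; // stands in for a successful assignContainer(node)
            }
            return null; // stands in for Resources.none()
        } finally {
            writeLock.unlock();
        }
    }

    // A getResourceUsage()-style reader only needs the shared read lock.
    public int totalUsage() {
        readLock.lock();
        try {
            int sum = 0;
            for (App app : runnableApps) {
                sum += app.usage;
            }
            return sum;
        } finally {
            readLock.unlock();
        }
    }

    public static void main(String[] args) {
        LeafQueueSketch queue = new LeafQueueSketch();
        queue.addApp(new App("app-a", 30, false));
        queue.addApp(new App("app-b", 10, true));
        queue.addApp(new App("app-c", 20, false));
        System.out.println(queue.assignNext()); // prints "app-c" (app-b is blacklisted)
        System.out.println(queue.totalUsage()); // prints "60"
    }
}
```

The trade-off of this choice is that assignNext() holds the exclusive lock for the whole scan, so readers such as totalUsage() wait until it finishes. ReentrantReadWriteLock also permits downgrading (acquiring the read lock while still holding the write lock, then releasing the write lock), which would keep writers out during the scan while letting other readers proceed; the sketch does not do this.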

> FSLeafQueue can throw ConcurrentModificationException
> -----------------------------------------------------
>
>                 Key: YARN-2910
>                 URL: https://issues.apache.org/jira/browse/YARN-2910
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>             Fix For: 2.7.0
>
>         Attachments: FSLeafQueue_concurrent_exception.txt, YARN-2910.004.patch, YARN-2910.1.patch, YARN-2910.2.patch, YARN-2910.3.patch, YARN-2910.4.patch, YARN-2910.5.patch, YARN-2910.6.patch, YARN-2910.7.patch, YARN-2910.8.patch, YARN-2910.patch
>
>
> The lists that maintain the runnable and the non-runnable apps are standard ArrayLists, but there is no guarantee that they will only be manipulated by one thread in the system. This can lead to the following exception:
> {noformat}
> 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM.
> java.util.ConcurrentModificationException: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
> at java.util.ArrayList$Itr.next(ArrayList.java:831)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923)
> at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516)
> {noformat}
> Full stack trace in the attached file.
> We should guard against this by using a thread-safe list such as java.util.concurrent.CopyOnWriteArrayList.
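The failure in the quoted trace can be reproduced deterministically in a single thread, because ArrayList's fail-fast iterator throws on a structural change made during iteration regardless of which thread makes it. This is an illustrative sketch only: `ComodificationDemo` and `iterateWhileAdding` are hypothetical names, and the `add` call stands in for another thread adding an app while FSLeafQueue.getResourceUsage() iterates. With a CopyOnWriteArrayList the iterator works on a snapshot taken when it was created, so the same loop completes but does not see the late addition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ComodificationDemo {
    // Iterates over the list and appends an element right after reading the
    // first one, mimicking a concurrent addApp during getResourceUsage().
    // Returns how many elements the loop saw, or -1 if iteration failed with
    // ConcurrentModificationException.
    static int iterateWhileAdding(List<String> apps) {
        int seen = 0;
        try {
            for (String app : apps) {
                seen++;
                if (seen == 1) {
                    apps.add("late-app"); // structural modification mid-iteration
                }
            }
            return seen;
        } catch (ConcurrentModificationException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        List<String> arrayList = new ArrayList<>(Arrays.asList("app-1", "app-2"));
        List<String> cowList = new CopyOnWriteArrayList<>(Arrays.asList("app-1", "app-2"));
        System.out.println(iterateWhileAdding(arrayList)); // prints "-1"
        System.out.println(iterateWhileAdding(cowList));   // prints "2" (snapshot iteration)
    }
}
```

The cost of this design is that every write to a CopyOnWriteArrayList copies the backing array, which is cheap only when reads greatly outnumber writes, and an iteration in progress may not reflect apps added after it started.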



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
