hadoop-yarn-issues mailing list archives

From "Wilfred Spiegelenburg (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException
Date Tue, 09 Dec 2014 04:05:13 GMT

     [ https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg updated YARN-2910:
----------------------------------------
    Attachment: YARN-2910.5.patch

OK, a completely new approach. The other approaches either did not work or did not fix the
problem, so I went back to a simple lock and unlock around the read and write actions.

The lock is set up with the fair ordering policy, which is close to a FIFO setup. This is not
the default option; it was chosen to make sure that no thread is starved of the lock.
Multiple readers can hold the lock at the same time, while a writer holds it exclusively,
with no readers active.
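
As an illustration of the locking described above, here is a minimal sketch and not the
contents of the attached patch. It assumes a java.util.concurrent.locks.ReentrantReadWriteLock
created in fair mode; the class GuardedAppList and its methods are hypothetical stand-ins for
the queue, and each app is reduced to a single integer usage value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a leaf queue; NOT the actual FSLeafQueue code.
class GuardedAppList {
    // Plain ArrayList; every access goes through the lock below.
    private final List<Integer> runnableApps = new ArrayList<>();

    // 'true' selects the fair ordering policy (the default constructor is non-fair).
    private final ReadWriteLock rwLock = new ReentrantReadWriteLock(true);

    // Write path: exclusive access, no readers or other writers.
    void addApp(int usage) {
        rwLock.writeLock().lock();
        try {
            runnableApps.add(usage);
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    // Read path: several threads may iterate at the same time.
    long getResourceUsage() {
        rwLock.readLock().lock();
        try {
            long total = 0;
            for (int usage : runnableApps) {
                total += usage;
            }
            return total;
        } finally {
            rwLock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedAppList queue = new GuardedAppList();
        // One thread mutates the list while another iterates over it.
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                queue.addApp(1);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                queue.getResourceUsage();
            }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println("total usage: " + queue.getResourceUsage());
    }
}
```

Because the iteration happens entirely under the read lock and every mutation under the write
lock, the iterator never observes a structural change part-way through.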

All JUnit tests pass in my local environment, including the ones that failed before.
As an extra change, the {{synchronized}} keyword has been removed from FSAppAttempt#getHeadroom,
as discussed with [~kasha].

> FSLeafQueue can throw ConcurrentModificationException
> -----------------------------------------------------
>
>                 Key: YARN-2910
>                 URL: https://issues.apache.org/jira/browse/YARN-2910
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>         Attachments: FSLeafQueue_concurrent_exception.txt, YARN-2910.004.patch, YARN-2910.1.patch, YARN-2910.2.patch, YARN-2910.3.patch, YARN-2910.4.patch, YARN-2910.5.patch, YARN-2910.patch
>
>
> The lists that maintain the runnable and the non-runnable apps are standard ArrayLists, but there is no guarantee that they will only be manipulated by one thread in the system. This can lead to the following exception:
> {noformat}
> 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM.
> java.util.ConcurrentModificationException: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
> at java.util.ArrayList$Itr.next(ArrayList.java:831)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923)
> at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516)
> {noformat}
> Full stack trace in the attached file.
> We should guard against that by using a thread-safe list such as java.util.concurrent.CopyOnWriteArrayList.
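
To illustrate the failure mode in the description above, here is a small sketch of the
fail-fast iterator check behind the exception, and of how a CopyOnWriteArrayList avoids it.
It is single-threaded on purpose so that the behaviour is deterministic; in the reported
issue the modification comes from another thread. The class IterationDemo and the app names
are hypothetical, and this shows the originally proposed alternative, not the locking used in
the latest patch.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; not part of Hadoop.
class IterationDemo {
    // Adds an element while iterating; returns true if the iterator
    // detected the change and threw ConcurrentModificationException.
    static boolean throwsOnModification(List<String> apps) {
        try {
            for (String app : apps) {
                if (app.equals("app-1")) {
                    apps.add("app-3"); // structural change during iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Fills the given list with two placeholder app names.
    static List<String> seed(List<String> list) {
        list.add("app-1");
        list.add("app-2");
        return list;
    }

    public static void main(String[] args) {
        // ArrayList iterators are fail-fast: the next call to next() throws.
        System.out.println("ArrayList throws: "
                + throwsOnModification(seed(new ArrayList<>())));
        // CopyOnWriteArrayList iterators walk a snapshot and never throw.
        System.out.println("CopyOnWriteArrayList throws: "
                + throwsOnModification(seed(new CopyOnWriteArrayList<>())));
    }
}
```

The trade-off is that every write to a CopyOnWriteArrayList copies the backing array, so it
suits lists that are read far more often than they are modified.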



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
