hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
Date Tue, 13 Aug 2013 23:24:49 GMT

    [ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739011#comment-13739011 ]

Zhijie Shen commented on YARN-292:
----------------------------------

Did more investigation on this issue:

{code}
2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_000001
{code}
This log indicates that the ArrayIndexOutOfBoundsException happens because the application is
not found. There are three possible reasons the application is not found, examined in turn below.
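
As a minimal sketch of how a missing application could surface as this particular exception:
the stack trace shows the failure inside Arrays$ArrayList.get, which is the list type returned
by Arrays.asList. Assuming (this is not confirmed by the log itself) that the scheduler hands
back an empty container list for an unknown attempt and the transition reads element 0 without
checking, the result is the exception seen. The class and method names below are hypothetical
and are not part of Hadoop.

```java
import java.util.Arrays;
import java.util.List;

public class EmptyAllocationDemo {
    // Hypothetical stand-in for an "empty allocation" returned for an unknown attempt.
    public static final List<String> EMPTY_CONTAINER_LIST = Arrays.asList();

    // Reads element 0 unconditionally, as an unguarded transition might.
    public static boolean throwsOnFirst(List<String> containers) {
        try {
            containers.get(0);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            // Arrays$ArrayList.get indexes its backing array directly.
            return true;
        }
    }

    // Defensive variant: check for an empty list before reading.
    public static String firstOrNull(List<String> containers) {
        return containers.isEmpty() ? null : containers.get(0);
    }

    public static void main(String[] args) {
        System.out.println(throwsOnFirst(EMPTY_CONTAINER_LIST)); // true
        System.out.println(firstOrNull(EMPTY_CONTAINER_LIST));   // null
    }
}
```

A guard like firstOrNull only hides the symptom; the question of why the application is
missing remains.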

1. The application hasn't been added to FifoScheduler#applications. In that case, FifoScheduler
will not send the APP_ACCEPTED event to the corresponding RMAppAttemptImpl. Without the APP_ACCEPTED
event, RMAppAttemptImpl will not enter the SCHEDULED state and consequently will not go through
AMContainerAllocatedTransition to ALLOCATED_SAVING. Therefore, this case is impossible.

2. The application has already been removed from FifoScheduler#applications. To trigger the
removal, the corresponding RMAppAttemptImpl needs to go through BaseFinalTransition.


It is worth mentioning first that RMAppAttemptImpl's transitions are executed on the
AsyncDispatcher thread, while YarnScheduler#handle is invoked on the SchedulerEventDispatcher
thread. The two threads execute in parallel, so the processing of an RMAppAttemptEvent
and that of a SchedulerEvent may interleave. However, the processing of two RMAppAttemptEvents
or of two SchedulerEvents will not.

Therefore, AMContainerAllocatedTransition cannot overlap with BaseFinalTransition: it can start
only after RMAppAttemptImpl has finished BaseFinalTransition. However, when RMAppAttemptImpl goes
through BaseFinalTransition it also enters a final state, so AMContainerAllocatedTransition will
not happen at all. In conclusion, this case is impossible as well.

3. The application is in FifoScheduler#applications, but RMAppAttemptImpl doesn't get it.
First, FifoScheduler#applications is a TreeMap, which is not thread-safe (FairScheduler#applications
is a HashMap, while CapacityScheduler#applications is a ConcurrentHashMap). Second, the methods
that access the map are not consistently synchronized, so a read and a write on the same map
can run simultaneously. RMAppAttemptImpl, on the AsyncDispatcher thread, will eventually
call FifoScheduler#applications#get in AMContainerAllocatedTransition, while FifoScheduler,
on the SchedulerEventDispatcher thread, will call FifoScheduler#applications#add|remove. Therefore,
getting null even though the application exists can happen under a large number of concurrent
operations.
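
To illustrate the kind of guard that case 3 calls for, here is a minimal sketch, not the actual
patch: a TreeMap whose reads and writes all go through the same lock, with one thread churning
entries (standing in for the SchedulerEventDispatcher thread) while another repeatedly looks up
an entry that is never removed (standing in for the AsyncDispatcher thread). The class, method,
and key names are hypothetical. With every access synchronized, the lookup never returns
null for the existing key. The sketch deliberately does not try to reproduce the unsynchronized
failure, since that behavior is timing-dependent.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedApplicationsDemo {
    // Hypothetical stand-in for the scheduler's applications map.
    private final Map<String, String> applications = new TreeMap<>();

    // Every reader and writer takes the same monitor, so a get never
    // observes the tree while a put or remove is restructuring it.
    public synchronized void add(String id, String app) { applications.put(id, app); }
    public synchronized void remove(String id) { applications.remove(id); }
    public synchronized String get(String id) { return applications.get(id); }

    // Counts how often a lookup of an always-present key returns null.
    public static int countMisses(int churn) {
        GuardedApplicationsDemo registry = new GuardedApplicationsDemo();
        registry.add("attempt_stable", "app");
        AtomicInteger misses = new AtomicInteger();

        Thread writer = new Thread(() -> {
            for (int i = 0; i < churn; i++) {
                registry.add("attempt_" + i, "app");
                registry.remove("attempt_" + i);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < churn; i++) {
                if (registry.get("attempt_stable") == null) {
                    misses.incrementAndGet();
                }
            }
        });

        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return misses.get();
    }

    public static void main(String[] args) {
        System.out.println("misses: " + countMisses(100_000));
    }
}
```

Switching to a concurrent map would be another option; which one suits FifoScheduler depends
on whether the map's sorted ordering is relied upon elsewhere.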

Please feel free to correct me if you think anything in the analysis is wrong or missing.
I'm going to work on a patch to fix the problem.
                
> ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-292
>                 URL: https://issues.apache.org/jira/browse/YARN-292
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.0.1-alpha
>            Reporter: Devaraj K
>            Assignee: Zhijie Shen
>
> {code:xml}
> 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_000001
> 2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525
> java.lang.ArrayIndexOutOfBoundsException: 0
> 	at java.util.Arrays$ArrayList.get(Arrays.java:3381)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> 	at java.lang.Thread.run(Thread.java:662)
>  {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira