hadoop-yarn-issues mailing list archives

From "tangshangwen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4324) AM hang more than 10 min was kill by RM
Date Tue, 15 Dec 2015 09:20:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057690#comment-15057690 ]

tangshangwen commented on YARN-4324:
------------------------------------

In the AM logs I found the last time the RMContainerAllocator contacted the RM. At that point it had not requested any containers for reduces:

2015-12-15 02:57:39,893 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:732 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5773 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:8995 ContRel:3222 HostLocal:5310 RackLocal:338

The AM then received a kill signal:

2015-12-15 03:01:29,383 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1449835724839_219910_m_001345_1 TaskAttempt Transitioned from NEW to UNASSIGNED
2015-12-15 03:08:54,073 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler.

My guess is that the AM did not send a heartbeat to the RM for 10 minutes. The RM logs roll over too quickly for me to check this right now; I will try to get the RM logs and post an update.
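
As a quick check of the 10-minute hypothesis, the sketch below computes the gap between the last allocator log line above (treated here as the AM's last contact, which is an assumption: the "After Scheduling" line is not necessarily logged on every heartbeat) and the kill signal. It compares that gap with 600000 ms, assumed here to be the default of yarn.am.liveness-monitor.expiry-interval-ms on this cluster. The helper gap_ms and the constant AM_EXPIRY_MS are hypothetical names for illustration only.

```python
from datetime import datetime, timedelta

# Timestamp format of the log4j lines above, e.g. "2015-12-15 02:57:39,893"
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# Assumption: default yarn.am.liveness-monitor.expiry-interval-ms (10 minutes)
AM_EXPIRY_MS = 600_000


def gap_ms(start: str, end: str) -> int:
    """Hypothetical helper: whole milliseconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    # Integer division of timedeltas avoids float rounding error
    return (t1 - t0) // timedelta(milliseconds=1)


last_contact = "2015-12-15 02:57:39,893"  # last RMContainerAllocator line
kill_signal = "2015-12-15 03:08:54,073"   # MRAppMaster received a signal
gap = gap_ms(last_contact, kill_signal)
print(gap, gap > AM_EXPIRY_MS)  # 674180 True
```

The gap is about 11 min 14 s, longer than the assumed 10-minute expiry. That is consistent with the RM expiring the AM, but only the RM logs can confirm it.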

> AM hang more than 10 min was kill by RM
> ---------------------------------------
>
>                 Key: YARN-4324
>                 URL: https://issues.apache.org/jira/browse/YARN-4324
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: tangshangwen
>         Attachments: logs.rar, yarn-nodemanager-dumpam.log
>
>
> These are my logs:
> 2015-11-02 01:14:54,175 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2865
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1446203652278_135526Job Transitioned from RUNNING to COMMITTING
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1446203652278_135526_m_001777_1 TaskAttempt Transitioned from UNASSIGNED to KILLED
> 2015-11-02 01:14:54,176 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
> 2015-11-02 01:24:15,851 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler.
> 2015-11-02 01:24:15,851 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator notified that iSignalled is: true
> 2015-11-02 01:24:15,851 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator isAMLastRetry: true
> The Hive map phase reached 100%, then dropped back to 0%, and the job failed!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
