hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-2095) Large MapReduce Job stops responding
Date Fri, 23 May 2014 23:14:02 GMT

    [ https://issues.apache.org/jira/browse/YARN-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007779#comment-14007779 ]

Vinod Kumar Vavilapalli edited comment on YARN-2095 at 5/23/14 11:13 PM:
-------------------------------------------------------------------------

Vinod, could you read the email below? Would you agree that there should be a log entry from
YARN in this case?

Clay,

What I noticed is that your reducers were overloaded and were on the brink of running out
of memory. The Java heaps were running at 99% and continuously GC’ing while the app was
reading from disk. So it was trying its best to process the job with limited resources.
I agree with you that it would be helpful if the container could put out a log message that
there were GC issues to help with debugging.

Thanks,


was (Author: stuart.mcdonald@bateswhite.com):
Vinod, could you read Eric's email below? Would you agree that there should be a log entry
from YARN in this case?

Clay McDonald 
Cell: 202.560.4101 
Direct: 202.747.5962 


From: Eric Mizell [mailto:emizell@hortonworks.com] 
Sent: Friday, May 23, 2014 4:18 PM
To: Clay McDonald
Subject: Re: [jira] [Created] (YARN-2095) Large MapReduce Job stops responding

Clay,

What I noticed is that your reducers were overloaded and were on the brink of running out
of memory. The Java heaps were running at 99% and continuously GC’ing while the app was
reading from disk. So it was trying its best to process the job with limited resources.
I agree with you that it would be helpful if the container could put out a log message that
there were GC issues to help with debugging.

Thanks,

Eric Mizell  Director Solution Engineering, Hortonworks
Mobile: 678-761-7623
Email: emizell@hortonworks.com
Website: http://www.hortonworks.com/





> Large MapReduce Job stops responding
> ------------------------------------
>
>                 Key: YARN-2095
>                 URL: https://issues.apache.org/jira/browse/YARN-2095
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>         Environment: CentOS 6.3 (x86_64) on vmware 10 running HDP-2.0.6
>            Reporter: Clay McDonald
>            Priority: Blocker
>
> Very large jobs (7,455 Mappers and 999 Reducers) hang. Jobs run well, but logging to the
> container logs stops after running 33 hours. The job appears to be hung. The status of the
> job is "RUNNING". No error messages are found in the logs.
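
The comment above asks for a log message when a container's JVM is close to
running out of heap and is spending most of its time in GC. As a rough
illustration only, the check could look something like the sketch below, which
uses the standard java.lang.management beans. This is not an existing YARN or
MapReduce feature: the class name GcPressureCheck, the method names, and both
thresholds are hypothetical and chosen only for the example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Hadoop: flags a JVM whose heap is nearly
// full while a large share of wall-clock time is spent in garbage collection.
public class GcPressureCheck {
    // Assumed thresholds for illustration; real values would need tuning.
    public static final double HEAP_WARN_FRACTION = 0.95;
    public static final double GC_TIME_WARN_FRACTION = 0.50;

    // True when heap usage and the GC time share both reach their thresholds.
    public static boolean underPressure(long usedBytes, long maxBytes,
                                        long gcMillis, long wallMillis) {
        if (maxBytes <= 0 || wallMillis <= 0) {
            return false; // max heap undefined (-1) or no elapsed time
        }
        double heapFraction = (double) usedBytes / maxBytes;
        double gcFraction = (double) gcMillis / wallMillis;
        return heapFraction >= HEAP_WARN_FRACTION
                && gcFraction >= GC_TIME_WARN_FRACTION;
    }

    // Sums the accumulated collection time over all collectors in this JVM.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc
                : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Single sample measured since JVM start; a real monitor would
        // sample periodically and compare deltas between samples.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcMillis = totalGcMillis();
        if (underPressure(heap.getUsed(), heap.getMax(), gcMillis, uptimeMillis)) {
            System.err.println("WARN: heap nearly full and JVM is mostly in GC");
        } else {
            System.out.println("GC pressure check: OK");
        }
    }
}
```

A periodic version of this check, run inside each task JVM, would have left a
warning in the container log for the reducers described above instead of the
logs simply going quiet.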



--
This message was sent by Atlassian JIRA
(v6.2#6252)
