hadoop-yarn-dev mailing list archives

From Prabhu Joseph <prabhujose.ga...@gmail.com>
Subject Re: NodeManagers Localization does not work
Date Wed, 13 Jan 2016 04:55:58 GMT
Thanks, Zhihai, for your comment.

The actual issue is that a container failed during localization because
/tmp/nm-local-dir had been removed by tmpwatch. As a result, the subsequent
containers of that job running on that node hang in the LOCALIZING state. In
hadoop-2.7.0, a fix removes the unnecessary files created by the failed
container, so the subsequent containers work fine. I want to find the YARN
JIRA that fixed this. There are many related YARN JIRAs for localization, but
I could not find the exact one.

Thanks,
Prabhu Joseph

On Tue, Jan 12, 2016 at 10:01 PM, Zhihai Xu <zhihaixu2012@gmail.com> wrote:

> Hi Prabhu,
>
> I saw a similar localization timeout issue, and I found that it was caused
> by HDFS, not YARN. In my case, HDFS-7005
> <https://issues.apache.org/jira/browse/HDFS-7005> fixed the issue. HDFS-7005
> is only in the 2.6 or later releases.
> The root cause was that all public localizer threads were stuck reading
> file data from HDFS.
> Maybe you can try HDFS-7005 to see whether it fixes your issue.
>
> Regards
> zhihai
>
> On Tue, Jan 12, 2016 at 2:41 AM, Prabhu Joseph <prabhujose.gates@gmail.com> wrote:
>
> > Hi Experts,
> >
> > On hadoop-2.5.1, when localization fails for a container of a job in a
> > NodeManager at
> > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer,
> > the subsequent containers of that job submitted to that NodeManager
> > hang in the Localizing state until the task times out.
> >
> > On hadoop-2.7.0, the above behavior is fixed by creating another Localizer
> > for the job in the NodeManager when the previous container fails at
> > localization.
> >
> > Can someone point me to the YARN JIRA that fixed the above issue in
> > hadoop-2.7.0?
> >
> >
> > Thanks,
> > Prabhu Joseph
> >
>
