hadoop-yarn-dev mailing list archives

From Zhihai Xu <zhihaixu2...@gmail.com>
Subject Re: NodeManagers Localization does not work
Date Wed, 13 Jan 2016 06:27:04 GMT
Hi Prabhu,

Thanks for the clarification. It looks like this is a configuration issue.
Why do you configure "yarn.nodemanager.local-dirs" as /tmp/nm-local-dir?
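
As a rough illustration of the check behind that question, here is a minimal sketch that scans a yarn-site.xml for "yarn.nodemanager.local-dirs" entries under /tmp, where a cleaner such as tmpwatch may delete them. The sample XML, the /data/1/yarn/local path, and the helper name local_dirs_under_tmp are made up for this example. The sketch does not expand variables such as ${hadoop.tmp.dir}, so a value left at its default would not be flagged.

```python
import xml.etree.ElementTree as ET

# Hypothetical yarn-site.xml content, for illustration only.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/tmp/nm-local-dir,/data/1/yarn/local</value>
  </property>
</configuration>"""


def local_dirs_under_tmp(yarn_site_xml):
    """Return the yarn.nodemanager.local-dirs entries that live under /tmp."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "yarn.nodemanager.local-dirs":
            # The value is a comma-separated list of directories.
            dirs = [d.strip() for d in prop.findtext("value", "").split(",")]
            return [d for d in dirs if d == "/tmp" or d.startswith("/tmp/")]
    # Property not set explicitly; variables/defaults are not resolved here.
    return []


print(local_dirs_under_tmp(SAMPLE_YARN_SITE))
```

Any directory this reports is a candidate for moving to a location that periodic /tmp cleanup does not touch.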

thanks
zhihai


On Tue, Jan 12, 2016 at 8:55 PM, Prabhu Joseph <prabhujose.gates@gmail.com>
wrote:

> Thanks Zhihai for your comment.
>
> The actual issue is that a container failed during localization because
> /tmp/nm-local-dir was removed by tmpwatch, and hence the subsequent
> containers of that job running on that node hang in the LOCALIZING state.
> In hadoop-2.7.0, a fix was made that removes the unnecessary files
> created by the failed container, so the subsequent containers work
> fine. I want to find the YARN JIRA that fixed this. There are many
> localization-related YARN JIRAs, but I could not find the exact one.
>
> Thanks,
> Prabhu Joseph
>
> On Tue, Jan 12, 2016 at 10:01 PM, Zhihai Xu <zhihaixu2012@gmail.com>
> wrote:
>
> > Hi Prabhu,
> >
> > I saw a similar localization timeout issue. I found that it was due
> > to HDFS, not YARN.
> > In my case, HDFS-7005 <https://issues.apache.org/jira/browse/HDFS-7005>
> > fixed the issue. HDFS-7005 is only in the 2.6 and later releases.
> > The root cause was that all public localizer threads were stuck reading
> > file data from HDFS.
> > Maybe you can try HDFS-7005 to see whether it fixes your issue.
> >
> > Regards
> > zhihai
> >
> > On Tue, Jan 12, 2016 at 2:41 AM, Prabhu Joseph <prabhujose.gates@gmail.com>
> > wrote:
> >
> > > Hi Experts,
> > >
> > >    On hadoop-2.5.1, when localization fails for a container of a job
> > > in a NodeManager at
> > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer,
> > > the subsequent containers of that job submitted to that NodeManager
> > > hang in the LOCALIZING state until the task times out.
> > >
> > > On hadoop-2.7.0, the above behavior is fixed by creating another
> > > localizer for the job in the NodeManager when the previous container
> > > fails at localization.
> > >
> > > Can someone share the YARN JIRA that fixed the above issue in
> > > hadoop-2.7.0?
> > >
> > >
> > > Thanks,
> > > Prabhu Joseph
> > >
> >
>
