hadoop-yarn-issues mailing list archives

From "tangshangwen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4530) LocalizedResource trigger a NPE Cause the NodeManager exit
Date Thu, 31 Dec 2015 09:14:49 GMT

    [ https://issues.apache.org/jira/browse/YARN-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075855#comment-15075855 ]

tangshangwen commented on YARN-4530:
------------------------------------

Hi [Rohith Sharma K S | https://issues.apache.org/jira/secure/ViewProfile.jspa?name=rohithsharma],
In this patch, if assoc is null the handler returns immediately, before completed.get() is called. So whenever completed.get() throws an ExecutionException, assoc is guaranteed to be non-null. I think this patch does not need a new test case.
{code:title=ResourceLocalizationService.java|borderStyle=solid}
            try {
              if (null == assoc) {
                LOG.error("Localized unknown resource to " + completed);
                // TODO delete
                return;
              }
              Path local = completed.get();
              LocalResourceRequest key = assoc.getResource().getRequest();
              publicRsrc.handle(new ResourceLocalizedEvent(key, local, FileUtil
                .getDU(new File(local.toUri()))));
              assoc.getResource().unlock();
            } catch (ExecutionException e) {
              LOG.info("Failed to download resource " + assoc.getResource(),
                  e.getCause());
              LocalResourceRequest req = assoc.getResource().getRequest();
              publicRsrc.handle(new ResourceFailedLocalizationEvent(req,
                  e.getMessage()));
              assoc.getResource().unlock();
            }
{code}
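To illustrate why the early return removes the null dereference in the catch block, here is a minimal, self-contained sketch. It is not the Hadoop code: Assoc, pending and handleCompleted are hypothetical stand-ins for the patch's assoc, its bookkeeping map and the body of PublicLocalizer.run(), and CompletableFuture stands in for the real download futures.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class GuardedLocalizerSketch {

  // Hypothetical stand-in for the "assoc" bookkeeping object in the patch.
  static final class Assoc {
    final String resource;
    Assoc(String resource) { this.resource = resource; }
  }

  // Hypothetical map from in-flight downloads to their bookkeeping entries.
  static final Map<Future<String>, Assoc> pending = new HashMap<>();

  // Mirrors the shape of the patched block: null check first, then get().
  static String handleCompleted(Future<String> completed) {
    Assoc assoc = pending.remove(completed);
    try {
      if (null == assoc) {
        // Unknown future: return before get() has a chance to throw.
        return "unknown";
      }
      String local = completed.get();
      return "localized " + assoc.resource + " to " + local;
    } catch (ExecutionException e) {
      // Only reachable after the guard, so assoc is non-null here.
      return "failed " + assoc.resource + ": " + e.getCause().getMessage();
    } catch (InterruptedException e) {
      // Restore the interrupt flag and give up on this future.
      Thread.currentThread().interrupt();
      return "interrupted";
    }
  }

  public static void main(String[] args) {
    CompletableFuture<String> ok = CompletableFuture.completedFuture("/local/hive-site.xml");
    CompletableFuture<String> bad = new CompletableFuture<>();
    bad.completeExceptionally(new IOException("changed on src filesystem"));
    CompletableFuture<String> orphan = new CompletableFuture<>();
    orphan.completeExceptionally(new IOException("no bookkeeping entry"));

    pending.put(ok, new Assoc("hive-site.xml"));
    pending.put(bad, new Assoc("udf.jar"));
    // "orphan" is deliberately not registered, so its assoc lookup yields null.

    System.out.println(handleCompleted(ok));     // localized hive-site.xml to /local/hive-site.xml
    System.out.println(handleCompleted(bad));    // failed udf.jar: changed on src filesystem
    System.out.println(handleCompleted(orphan)); // unknown
  }
}
```

The "orphan" case is the one that matters: without the guard, its failed get() would land in the catch block with a null assoc and throw a NullPointerException, which is the same shape of failure as the one in the log.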

> LocalizedResource trigger a NPE Cause the NodeManager exit
> ----------------------------------------------------------
>
>                 Key: YARN-4530
>                 URL: https://issues.apache.org/jira/browse/YARN-4530
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: tangshangwen
>         Attachments: YARN-4530.1.patch
>
>
> In our cluster, I found that a failed LocalizedResource download triggered an NPE that caused the NodeManager to shut down.
> {noformat}
> 2015-12-29 17:18:33,706 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://ns3:8020/user/username/projects/user_insight/lookalike/oozie/workflow/conf/hive-site.xml transitioned from DOWNLOADING to FAILED
> 2015-12-29 17:18:33,708 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://ns3/user/username/projects/user_insight/lookalike/oozie/workflow/lib/user_insight_pig_udf-0.0.1-SNAPSHOT-jar-with-dependencies.jar, 1451380519635, FILE, null }
> 2015-12-29 17:18:33,710 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { hdfs://ns3/user/username/projects/user_insight/lookalike/oozie/workflow/lib/unilever_support_udf-0.0.1-SNAPSHOT.jar, 1451380519452, FILE, null },pending,[(container_1451039893865_261670_01_000578)],42332661980495938,DOWNLOADING}
> java.io.IOException: Resource hdfs://ns3/user/username/projects/user_insight/lookalike/oozie/workflow/lib/unilever_support_udf-0.0.1-SNAPSHOT.jar changed on src filesystem (expected 1451380519452, was 1451380611793
> 	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:176)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:276)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-12-29 17:18:33,710 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://ns3/user/username/projects/user_insight/lookalike/oozie/workflow/lib/unilever_support_udf-0.0.1-SNAPSHOT.jar transitioned from DOWNLOADING to FAILED
> 2015-12-29 17:18:33,710 FATAL org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Error: Shutting down
> java.lang.NullPointerException
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:712)
> 2015-12-29 17:18:33,710 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
> {noformat}
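The IOException in the log comes from a freshness check: the jar's modification time on the source filesystem (1451380611793) no longer matched the timestamp recorded for the resource (1451380519452), which suggests the jar was replaced on HDFS after that timestamp was taken. The sketch below illustrates that kind of check on the local filesystem with java.nio. It is not Hadoop's FSDownload code; verifyUnchanged is a hypothetical helper, and its message is only modelled on the one in the log.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class TimestampCheckSketch {

  // Hypothetical helper: fail if the file's modification time differs from
  // the timestamp recorded when the resource was requested.
  static void verifyUnchanged(Path resource, long expectedMillis) throws IOException {
    long actual = Files.getLastModifiedTime(resource).toMillis();
    if (actual != expectedMillis) {
      throw new IOException("Resource " + resource + " changed on src filesystem (expected "
          + expectedMillis + ", was " + actual + ")");
    }
  }

  public static void main(String[] args) throws IOException {
    Path jar = Files.createTempFile("udf", ".jar");
    try {
      // Whole seconds, so filesystems with coarse mtime resolution behave the same.
      Files.setLastModifiedTime(jar, FileTime.fromMillis(1451380519000L));
      long recorded = Files.getLastModifiedTime(jar).toMillis();
      verifyUnchanged(jar, recorded); // passes: nothing has changed yet

      // Simulate the jar being overwritten after its timestamp was recorded.
      Files.setLastModifiedTime(jar, FileTime.fromMillis(1451380611000L));
      try {
        verifyUnchanged(jar, recorded);
      } catch (IOException e) {
        System.out.println(e.getMessage());
      }
    } finally {
      Files.deleteIfExists(jar);
    }
  }
}
```

In the reported case, a failure like this is expected and recoverable; it was the unguarded catch block in PublicLocalizer.run() that turned it into a NodeManager shutdown.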



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
