kylin-user mailing list archives

From Li Yang <liy...@apache.org>
Subject Re: File not found error at step 2 in yarn logs
Date Thu, 29 Jun 2017 06:32:27 GMT
Kylin ships its metadata to the MR job via the distributed cache. The missing file
"file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta"
should be present on machines B and D before YARN kicks off the mappers.

As to why the files were not there... I don't know.
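
For reference, here is a minimal sketch of how a file gets attached to an MR
job's distributed cache (illustrative only, not Kylin's actual code; the class
name, job name, and the hdfs: path are made up, and the file: path is copied
from your log):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CacheFileSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cache-file-demo");

        // A file: URI is resolved against each NodeManager's LOCAL
        // filesystem during localization. Unless the same path exists on
        // every node (e.g. via a shared mount), localization fails with
        // the FileNotFoundException shown in the log below.
        job.addCacheFile(new URI(
            "file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta"));

        // An hdfs: URI, by contrast, can be fetched by any node.
        job.addCacheFile(new URI("hdfs:///tmp/kylin_job_meta/meta"));
    }
}

That matches the log: the NodeManagers on B and D try to open the file: path
on their own disks and fail.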

On Wed, Jun 14, 2017 at 12:12 PM, Gavin_Chou <zhou.guo.qiao@163.com> wrote:

> Hi, all:
> I have a problem while building a cube at step 2.
>
> The error appears in the YARN log:
>
> 2017-06-14 11:21:08,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1497364689294_0018 transitioned from NEW to INITING
> 2017-06-14 11:21:08,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1497364689294_0018_01_000001 to application application_1497364689294_0018
> 2017-06-14 11:21:08,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1497364689294_0018 transitioned from INITING to RUNNING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0018_01_000001 transitioned from NEW to LOCALIZING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1497364689294_0018
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1497364689294_0018/job.jar transitioned from INIT to DOWNLOADING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1497364689294_0018/job.splitmetainfo transitioned from INIT to DOWNLOADING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1497364689294_0018/job.split transitioned from INIT to DOWNLOADING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1497364689294_0018/job.xml transitioned from INIT to DOWNLOADING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta transitioned from INIT to DOWNLOADING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1497364689294_0018_01_000001
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta, 1497410467000, FILE, null }
> 2017-06-14 11:21:08,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /home/q/hadoop/hadoop/tmp/nm-local-dir/nmPrivate/container_1497364689294_0018_01_000001.tokens. Credentials list:
> 2017-06-14 11:21:08,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta, 1497410467000, FILE, null },pending,[(container_1497364689294_0018_01_000001)],781495827608056,DOWNLOADING}
> java.io.FileNotFoundException: File file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta does not exist
> at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
> at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:250)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:353)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> 2017-06-14 11:21:08,796 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop
> 2017-06-14 11:21:08,797 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta(->/home/q/hadoop/hadoop/tmp/nm-local-dir/filecache/18/meta) transitioned from DOWNLOADING to FAILED
> 2017-06-14 11:21:08,797 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0018_01_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
> 2017-06-14 11:21:08,797 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Container container_1497364689294_0018_01_000001 sent RELEASE event on a resource request { file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta, 1497410467000, FILE, null } not present in cache.
> 2017-06-14 11:21:08,797 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED APPID=application_1497364689294_0018 CONTAINERID=container_1497364689294_0018_01_000001
> 2017-06-14 11:21:08,797 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0018_01_000001 transitioned from LOCALIZATION_FAILED to DONE
>
> This error appears in the yarn-nodemanager logs of machines B and D. Before
> it, I found a warning in the yarn-nodemanager log on machine C (Kylin is
> installed only on machine A):
>
> 2017-06-14 11:21:01,131 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from LOCALIZING to LOCALIZED
> 2017-06-14 11:21:01,146 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from LOCALIZED to RUNNING
> 2017-06-14 11:21:01,146 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Neither virutal-memory nor physical-memory monitoring is needed. Not running the monitor-thread
> 2017-06-14 11:21:01,149 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [nice, -n, 0, bash, /home/q/hadoop/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1497364689294_0017/container_1497364689294_0017_01_000002/default_container_executor.sh]
> 2017-06-14 11:21:05,024 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1497364689294_0017_01_000002
> 2017-06-14 11:21:05,025 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop IP=10.90.181.160 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1497364689294_0017 CONTAINERID=container_1497364689294_0017_01_000002
> 2017-06-14 11:21:05,025 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from RUNNING to KILLING
> 2017-06-14 11:21:05,025 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1497364689294_0017_01_000002
> 2017-06-14 11:21:05,028 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1497364689294_0017_01_000002 is : 143
> 2017-06-14 11:21:05,040 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
> 2017-06-14 11:21:05,041 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1497364689294_0017 CONTAINERID=container_1497364689294_0017_01_000002
> 2017-06-14 11:21:05,041 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
>
> It puzzles me why, at step 2, Kylin asks applications on other nodes to load
> a local file. How can I solve this?
>
> Here is some additional information (it may be helpful for analyzing the
> problem):
> The cluster has 4 machines: A, B, C and D.
> Hadoop version 2.5.0, with Snappy support
>       Namenode: A (standby), B (active)
>       Datanode: all
> Hive version 0.13.1, recompiled for Hadoop 2
> HBase version 0.98.6, recompiled for Hadoop 2.5.0
>      Master: A (active) and B
> When I set "hbase.rootdir" in hbase-site.xml to the exact IP address of the
> active namenode, step 2 is OK, but the build fails at the last 5 steps. So I
> changed the setting to the cluster name, and there is no problem in the HBase
> logs.
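> The setting now looks roughly like this ("mycluster" is just a placeholder
> for our real cluster name):
>
> <property>
>   <name>hbase.rootdir</name>
>   <value>hdfs://mycluster/hbase</value>
> </property>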
>
> Thank you
>
> Best regards
>
