predictionio-user mailing list archives

From Ambuj Sharma <am...@getamplify.com>
Subject Re: PIO train error on Spark/Hbase remote cluster
Date Mon, 03 Apr 2017 04:17:19 GMT
Hi,
I have used --master yarn-client and it works great.
But before doing this you need to copy the Hadoop, YARN, and HBase configs to
the PIO machine and set the Hadoop conf dir path in pio-env.sh.
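
For example, here is a minimal sketch of the relevant pio-env.sh lines
(HADOOP_CONF_DIR and HBASE_CONF_DIR are the variables from the stock
pio-env.sh.template; the /opt/cluster-conf paths are placeholders for
wherever you copied the cluster config files on the PIO machine):

  # pio-env.sh (sketch)
  SPARK_HOME=$PIO_HOME/vendors/spark
  # dir with core-site.xml, hdfs-site.xml, yarn-site.xml copied from the cluster
  HADOOP_CONF_DIR=/opt/cluster-conf/hadoop
  # dir with hbase-site.xml copied from the cluster
  HBASE_CONF_DIR=/opt/cluster-conf/hbase

After that you can train with something like "pio train -- --master
yarn-client" (arguments after the "--" are passed through to spark-submit).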


Thanks and Regards
Ambuj Sharma
Sunrise may be late, but morning is sure.....
Team ML
Betaout

On Fri, Mar 31, 2017 at 11:01 PM, Malay Tripathi <malaytripathi3@gmail.com>
wrote:

> 2017-03-31 13:28:57,084 INFO  org.apache.predictionio.tools.console.Console$
> [main] - Using existing engine manifest JSON at /home/da_mcom_milan/
> PredictionIO/personalized-complementary/manifest.json
>
> 2017-03-31 13:28:58,938 INFO  org.apache.predictionio.tools.Runner$
> [main] - Submission command: /home/da_mcom_milan/
> PredictionIO/vendors/spark/bin/spark-submit --master yarn-cluster --class
> org.apache.predictionio.workflow.CreateWorkflow --jars
> file:/home/da_mcom_milan/PredictionIO/personalized-
> complementary/target/scala-2.10/template-scala-parallel-
> universal-recommendation-assembly-0.5.0-deps.jar,file:/home/da_mcom_milan/
> PredictionIO/personalized-complementary/target/scala-2.
> 10/template-scala-parallel-universal-recommendation_2.10-0.5.0.jar
> --files file:/home/da_mcom_milan/PredictionIO/conf/log4j.
> properties,file:/home/da_mcom_milan/PredictionIO/vendors/hbase/conf/hbase-site.xml
> --driver-class-path /home/da_mcom_milan/PredictionIO/conf:/home/da_
> mcom_milan/PredictionIO/vendors/hbase/conf file:/home/da_mcom_milan/
> PredictionIO/lib/pio-assembly-0.10.0-incubating.jar --engine-id
> 7mVUx7nKCRXWPHAdk46GQOJRtH6VDnqA --engine-version
> dc0573e7ddab8588f6ae287d7386c2d6827fec86 --engine-variant
> file:/home/da_mcom_milan/PredictionIO/personalized-complementary/engine.json
> --verbosity 0 --json-extractor Both --env PIO_STORAGE_SOURCES_HBASE_
> TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_
> METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/da_mcom_milan/.
> pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=mdc2vra176,PIO_STORAGE_
> SOURCES_HBASE_HOME=/home/da_mcom_milan/PredictionIO/
> vendors/hbase,PIO_HOME=/home/da_mcom_milan/PredictionIO,
> PIO_FS_ENGINESDIR=/home/da_mcom_milan/.pio_store/engines,
> PIO_STORAGE_SOURCES_LOCALFS_PATH=/home/da_mcom_milan/.pio_
> store/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=
> elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=
> ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=
> LOCALFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=
> pio_event,PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=
> pros-prod,PIO_FS_TMPDIR=/home/da_mcom_milan/.pio_store/tmp,
> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_
> STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_
> CONF_DIR=/home/da_mcom_milan/PredictionIO/conf,PIO_STORAGE_
> SOURCES_ELASTICSEARCH_PORTS=9300,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
>
> 17/03/31 13:29:00 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
>
> 17/03/31 13:29:00 INFO TimelineClientImpl: Timeline service address:
> http://mdc2vra180.federated.fds:8188/ws/v1/timeline/
>
> 17/03/31 13:29:00 INFO RMProxy: Connecting to ResourceManager at
> mdc2vra180.federated.fds/11.126.100.180:8050
>
> 17/03/31 13:29:00 INFO AHSProxy: Connecting to Application History server
> at mdc2vra180.federated.fds/11.126.100.180:10200
>
> 17/03/31 13:29:01 WARN DomainSocketFactory: The short-circuit local reads
> feature cannot be used because libhadoop cannot be loaded.
>
> 17/03/31 13:29:01 INFO Client: Requesting a new application from cluster
> with 8 NodeManagers
>
> 17/03/31 13:29:01 INFO Client: Verifying our application has not requested
> more than the maximum memory capability of the cluster (47104 MB per
> container)
>
> 17/03/31 13:29:01 INFO Client: Will allocate AM container, with 1408 MB
> memory including 384 MB overhead
>
> 17/03/31 13:29:01 INFO Client: Setting up container launch context for our
> AM
>
> 17/03/31 13:29:01 INFO Client: Setting up the launch environment for our
> AM container
>
> 17/03/31 13:29:01 INFO Client: Using the spark assembly jar on HDFS
> because you are using HDP, defaultSparkAssembly:hdfs://
> mdc2vra179.federated.fds:8020/hdp/apps/2.5.3.0-37/spark/
> spark-hdp-assembly.jar
>
> 17/03/31 13:29:01 INFO Client: Preparing resources for our AM container
>
> 17/03/31 13:29:01 INFO Client: Using the spark assembly jar on HDFS
> because you are using HDP, defaultSparkAssembly:hdfs://
> mdc2vra179.federated.fds:8020/hdp/apps/2.5.3.0-37/spark/
> spark-hdp-assembly.jar
>
> 17/03/31 13:29:01 INFO Client: Source and destination file systems are the
> same. Not copying hdfs://mdc2vra179.federated.
> fds:8020/hdp/apps/2.5.3.0-37/spark/spark-hdp-assembly.jar
>
> 17/03/31 13:29:01 INFO Client: Uploading resource file:/home/da_mcom_milan/
> PredictionIO/lib/pio-assembly-0.10.0-incubating.jar ->
> hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.
> sparkStaging/application_1489598450058_0028/pio-
> assembly-0.10.0-incubating.jar
>
> 17/03/31 13:29:02 INFO Client: Uploading resource file:/home/da_mcom_milan/
> PredictionIO/personalized-complementary/target/scala-2.
> 10/template-scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar
> -> hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.
> sparkStaging/application_1489598450058_0028/template-
> scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar
>
> 17/03/31 13:29:02 INFO Client: Uploading resource file:/home/da_mcom_milan/
> PredictionIO/personalized-complementary/target/scala-2.
> 10/template-scala-parallel-universal-recommendation_2.10-0.5.0.jar ->
> hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.
> sparkStaging/application_1489598450058_0028/template-
> scala-parallel-universal-recommendation_2.10-0.5.0.jar
>
> 17/03/31 13:29:02 INFO Client: Uploading resource file:/home/da_mcom_milan/
> PredictionIO/conf/log4j.properties -> hdfs://mdc2vra179.federated.
> fds:8020/user/da_mcom_milan/.sparkStaging/application_
> 1489598450058_0028/log4j.properties
>
> 17/03/31 13:29:03 INFO Client: Uploading resource file:/home/da_mcom_milan/
> PredictionIO/vendors/hbase/conf/hbase-site.xml ->
> hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.
> sparkStaging/application_1489598450058_0028/hbase-site.xml
>
> 17/03/31 13:29:03 INFO Client: Uploading resource
> file:/tmp/spark-9edc270b-3291-4913-8324-5f9e3ec4810f/__spark_conf__2400158678974980853.zip
> -> hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.
> sparkStaging/application_1489598450058_0028/__spark_
> conf__2400158678974980853.zip
>
> 17/03/31 13:29:03 INFO SecurityManager: Changing view acls to:
> da_mcom_milan
>
> 17/03/31 13:29:03 INFO SecurityManager: Changing modify acls to:
> da_mcom_milan
>
> 17/03/31 13:29:03 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions:
> Set(da_mcom_milan); users with modify permissions: Set(da_mcom_milan)
>
> 17/03/31 13:29:04 INFO Client: Submitting application 28 to ResourceManager
>
> 17/03/31 13:29:04 INFO YarnClientImpl: Submitted application
> application_1489598450058_0028
>
> 17/03/31 13:29:05 INFO Client: Application report for
> application_1489598450058_0028 (state: ACCEPTED)
>
> 17/03/31 13:29:05 INFO Client:
>
> client token: N/A
>
> diagnostics: AM container is launched, waiting for AM container to
> Register with RM
>
> ApplicationMaster host: N/A
>
> ApplicationMaster RPC port: -1
>
> queue: default
>
> start time: 1490981344043
>
> final status: UNDEFINED
>
> tracking URL: http://mdc2vra180.federated.fds:8088/proxy/application_
> 1489598450058_0028/
>
> user: da_mcom_milan
>
> 17/03/31 13:29:06 INFO Client: Application report for
> application_1489598450058_0028 (state: ACCEPTED)
>
> 17/03/31 13:29:07 INFO Client: Application report for
> application_1489598450058_0028 (state: ACCEPTED)
>
> 17/03/31 13:29:08 INFO Client: Application report for
> application_1489598450058_0028 (state: ACCEPTED)
>
> 17/03/31 13:29:09 INFO Client: Application report for
> application_1489598450058_0028 (state: ACCEPTED)
>
> 17/03/31 13:29:10 INFO Client: Application report for
> application_1489598450058_0028 (state: ACCEPTED)
>
> 17/03/31 13:29:11 INFO Client: Application report for
> application_1489598450058_0028 (state: FAILED)
>
> 17/03/31 13:29:11 INFO Client:
>
> client token: N/A
>
> diagnostics: Application application_1489598450058_0028 failed 2 times due
> to AM Container for appattempt_1489598450058_0028_000002 exited with
> exitCode: -1000
>
> For more detailed output, check the application tracking page:
> http://mdc2vra180.federated.fds:8088/cluster/app/
> application_1489598450058_0028 Then click on links to logs of each
> attempt.
>
> Diagnostics: File does not exist: hdfs://mdc2vra179.federated.
> fds:8020/user/da_mcom_milan/.sparkStaging/application_
> 1489598450058_0028/template-scala-parallel-universal-
> recommendation-assembly-0.5.0-deps.jar
>
> java.io.FileNotFoundException: File does not exist:
> hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.
> sparkStaging/application_1489598450058_0028/template-
> scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar
>
> at org.apache.hadoop.hdfs.DistributedFileSystem$25.
> doCall(DistributedFileSystem.java:1427)
>
> at org.apache.hadoop.hdfs.DistributedFileSystem$25.
> doCall(DistributedFileSystem.java:1419)
>
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(
> FileSystemLinkResolver.java:81)
>
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(
> DistributedFileSystem.java:1419)
>
> at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
>
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
>
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
>
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:422)
>
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1724)
>
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
>
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
>
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
>
> at java.lang.Thread.run(Thread.java:745)
>
>
> Failing this attempt. Failing the application.
>
> ApplicationMaster host: N/A
>
> ApplicationMaster RPC port: -1
>
> queue: default
>
> start time: 1490981344043
>
> final status: FAILED
>
> tracking URL: http://mdc2vra180.federated.fds:8088/cluster/app/
> application_1489598450058_0028
>
> user: da_mcom_milan
>
> Exception in thread "main" org.apache.spark.SparkException: Application
> application_1489598450058_0028 finished with failed status
>
> at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
>
> at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1169)
>
> at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
>
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:498)
>
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
>
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> 17/03/31 13:29:11 INFO ShutdownHookManager: Shutdown hook called
>
> 17/03/31 13:29:11 INFO ShutdownHookManager: Deleting directory
> /tmp/spark-9edc270b-3291-4913-8324-5f9e3ec4810f
>
> On Fri, Mar 31, 2017 at 9:22 AM, Donald Szeto <donald@apache.org> wrote:
>
>> Can you show the relevant parts from pio.log, please? If you don't care
>> about existing log messages, the easiest way would be to delete pio.log
>> from where you run the pio command and start fresh.
>>
>> On Fri, Mar 31, 2017 at 8:46 AM, Malay Tripathi <malaytripathi3@gmail.com
>> > wrote:
>>
>>> I think it's YARN-based, set up through Ambari.
>>>
>>>
>>> On Mar 31, 2017, at 6:29 AM, Donald Szeto <donald@apache.org> wrote:
>>>
>>> Hi Malay,
>>>
>>> Is your Spark cluster a standalone deployment or based on YARN?
>>>
>>> Regards,
>>> Donald
>>>
>>> On Thu, Mar 30, 2017 at 11:48 PM Malay Tripathi <
>>> malaytripathi3@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am running pio train on an edge node of a distributed 8-node Spark
>>>> cluster and a 3-node HBase cluster.
>>>> When I run "pio train", the job runs, but it runs on local Spark and is
>>>> not submitted to the cluster.
>>>> If I run "pio train --master spark://localhost:7077" or "pio train --master
>>>> yarn-cluster", I get the error below:
>>>>
>>>> File does not exist:
>>>> hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0024/template-scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar
>>>>
>>>> java.io.FileNotFoundException: File does not exist:
>>>> hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0024/template-scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar
>>>>
>>>> at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1427)
>>>>
>>>> at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
>>>>
>>>>
>>>> mdc2vra179 is my HBase cluster node, which also runs the NameNode. I am not
>>>> sure why Spark expects a jar file on the HBase/NameNode host.
>>>> $PIO_HOME/conf/pio-env.sh:
>>>>
>>>> SPARK_HOME=$PIO_HOME/vendors/spark
>>>>
>>>> HBASE_CONF_DIR=$PIO_HOME/vendors/hbase/conf
>>>>
>>>> PIO_FS_BASEDIR=$HOME/.pio_store
>>>>
>>>> PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
>>>>
>>>> PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
>>>>
>>>> PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
>>>>
>>>> PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
>>>>
>>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
>>>>
>>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
>>>>
>>>> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
>>>>
>>>> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
>>>>
>>>> PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
>>>>
>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>>>>
>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=pros-prod
>>>>
>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=mdc2vra176
>>>>
>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
>>>>
>>>> PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
>>>>
>>>> PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
>>>>
>>>> PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Malay
>>>>
>>>
>>
>
