predictionio-user mailing list archives

From: Malay Tripathi <malaytripat...@gmail.com>
Subject: Re: PIO train error on Spark/Hbase remote cluster
Date: Fri, 31 Mar 2017 17:31:25 GMT
2017-03-31 13:28:57,084 INFO
org.apache.predictionio.tools.console.Console$ [main] - Using existing
engine manifest JSON at
/home/da_mcom_milan/PredictionIO/personalized-complementary/manifest.json

2017-03-31 13:28:58,938 INFO  org.apache.predictionio.tools.Runner$ [main]
- Submission command:
/home/da_mcom_milan/PredictionIO/vendors/spark/bin/spark-submit --master
yarn-cluster --class org.apache.predictionio.workflow.CreateWorkflow --jars
file:/home/da_mcom_milan/PredictionIO/personalized-complementary/target/scala-2.10/template-scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar,file:/home/da_mcom_milan/PredictionIO/personalized-complementary/target/scala-2.10/template-scala-parallel-universal-recommendation_2.10-0.5.0.jar
--files
file:/home/da_mcom_milan/PredictionIO/conf/log4j.properties,file:/home/da_mcom_milan/PredictionIO/vendors/hbase/conf/hbase-site.xml
--driver-class-path
/home/da_mcom_milan/PredictionIO/conf:/home/da_mcom_milan/PredictionIO/vendors/hbase/conf
file:/home/da_mcom_milan/PredictionIO/lib/pio-assembly-0.10.0-incubating.jar
--engine-id 7mVUx7nKCRXWPHAdk46GQOJRtH6VDnqA --engine-version
dc0573e7ddab8588f6ae287d7386c2d6827fec86 --engine-variant
file:/home/da_mcom_milan/PredictionIO/personalized-complementary/engine.json
--verbosity 0 --json-extractor Both --env
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/da_mcom_milan/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=mdc2vra176,PIO_STORAGE_SOURCES_HBASE_HOME=/home/da_mcom_milan/PredictionIO/vendors/hbase,PIO_HOME=/home/da_mcom_milan/PredictionIO,PIO_FS_ENGINESDIR=/home/da_mcom_milan/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/home/da_mcom_milan/.pio_store/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=pros-prod,PIO_FS_TMPDIR=/home/da_mcom_milan/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/home/da_mcom_milan/PredictionIO/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs

17/03/31 13:29:00 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable

17/03/31 13:29:00 INFO TimelineClientImpl: Timeline service address:
http://mdc2vra180.federated.fds:8188/ws/v1/timeline/

17/03/31 13:29:00 INFO RMProxy: Connecting to ResourceManager at
mdc2vra180.federated.fds/11.126.100.180:8050

17/03/31 13:29:00 INFO AHSProxy: Connecting to Application History server
at mdc2vra180.federated.fds/11.126.100.180:10200

17/03/31 13:29:01 WARN DomainSocketFactory: The short-circuit local reads
feature cannot be used because libhadoop cannot be loaded.

17/03/31 13:29:01 INFO Client: Requesting a new application from cluster
with 8 NodeManagers

17/03/31 13:29:01 INFO Client: Verifying our application has not requested
more than the maximum memory capability of the cluster (47104 MB per
container)

17/03/31 13:29:01 INFO Client: Will allocate AM container, with 1408 MB
memory including 384 MB overhead

17/03/31 13:29:01 INFO Client: Setting up container launch context for our
AM

17/03/31 13:29:01 INFO Client: Setting up the launch environment for our AM
container

17/03/31 13:29:01 INFO Client: Using the spark assembly jar on HDFS because
you are using HDP,
defaultSparkAssembly:hdfs://mdc2vra179.federated.fds:8020/hdp/apps/2.5.3.0-37/spark/spark-hdp-assembly.jar

17/03/31 13:29:01 INFO Client: Preparing resources for our AM container

17/03/31 13:29:01 INFO Client: Using the spark assembly jar on HDFS because
you are using HDP,
defaultSparkAssembly:hdfs://mdc2vra179.federated.fds:8020/hdp/apps/2.5.3.0-37/spark/spark-hdp-assembly.jar

17/03/31 13:29:01 INFO Client: Source and destination file systems are the
same. Not copying
hdfs://mdc2vra179.federated.fds:8020/hdp/apps/2.5.3.0-37/spark/spark-hdp-assembly.jar

17/03/31 13:29:01 INFO Client: Uploading resource
file:/home/da_mcom_milan/PredictionIO/lib/pio-assembly-0.10.0-incubating.jar
->
hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0028/pio-assembly-0.10.0-incubating.jar

17/03/31 13:29:02 INFO Client: Uploading resource
file:/home/da_mcom_milan/PredictionIO/personalized-complementary/target/scala-2.10/template-scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar
->
hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0028/template-scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar

17/03/31 13:29:02 INFO Client: Uploading resource
file:/home/da_mcom_milan/PredictionIO/personalized-complementary/target/scala-2.10/template-scala-parallel-universal-recommendation_2.10-0.5.0.jar
->
hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0028/template-scala-parallel-universal-recommendation_2.10-0.5.0.jar

17/03/31 13:29:02 INFO Client: Uploading resource
file:/home/da_mcom_milan/PredictionIO/conf/log4j.properties ->
hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0028/log4j.properties

17/03/31 13:29:03 INFO Client: Uploading resource
file:/home/da_mcom_milan/PredictionIO/vendors/hbase/conf/hbase-site.xml ->
hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0028/hbase-site.xml

17/03/31 13:29:03 INFO Client: Uploading resource
file:/tmp/spark-9edc270b-3291-4913-8324-5f9e3ec4810f/__spark_conf__2400158678974980853.zip
->
hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0028/__spark_conf__2400158678974980853.zip

17/03/31 13:29:03 INFO SecurityManager: Changing view acls to: da_mcom_milan

17/03/31 13:29:03 INFO SecurityManager: Changing modify acls to:
da_mcom_milan

17/03/31 13:29:03 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions:
Set(da_mcom_milan); users with modify permissions: Set(da_mcom_milan)

17/03/31 13:29:04 INFO Client: Submitting application 28 to ResourceManager

17/03/31 13:29:04 INFO YarnClientImpl: Submitted application
application_1489598450058_0028

17/03/31 13:29:05 INFO Client: Application report for
application_1489598450058_0028 (state: ACCEPTED)

17/03/31 13:29:05 INFO Client:
     client token: N/A
     diagnostics: AM container is launched, waiting for AM container to
       Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1490981344043
     final status: UNDEFINED
     tracking URL: http://mdc2vra180.federated.fds:8088/proxy/application_1489598450058_0028/
     user: da_mcom_milan

17/03/31 13:29:06 INFO Client: Application report for
application_1489598450058_0028 (state: ACCEPTED)

17/03/31 13:29:07 INFO Client: Application report for
application_1489598450058_0028 (state: ACCEPTED)

17/03/31 13:29:08 INFO Client: Application report for
application_1489598450058_0028 (state: ACCEPTED)

17/03/31 13:29:09 INFO Client: Application report for
application_1489598450058_0028 (state: ACCEPTED)

17/03/31 13:29:10 INFO Client: Application report for
application_1489598450058_0028 (state: ACCEPTED)

17/03/31 13:29:11 INFO Client: Application report for
application_1489598450058_0028 (state: FAILED)

17/03/31 13:29:11 INFO Client:
     client token: N/A
     diagnostics: Application application_1489598450058_0028 failed 2 times
       due to AM Container for appattempt_1489598450058_0028_000002 exited
       with exitCode: -1000
For more detailed output, check the application tracking page:
http://mdc2vra180.federated.fds:8088/cluster/app/application_1489598450058_0028
Then click on links to logs of each attempt.

Diagnostics: File does not exist:
hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0028/template-scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar

java.io.FileNotFoundException: File does not exist:
hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0028/template-scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1427)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1419)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1490981344043
     final status: FAILED
     tracking URL: http://mdc2vra180.federated.fds:8088/cluster/app/application_1489598450058_0028
     user: da_mcom_milan

Exception in thread "main" org.apache.spark.SparkException: Application
application_1489598450058_0028 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1169)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

17/03/31 13:29:11 INFO ShutdownHookManager: Shutdown hook called

17/03/31 13:29:11 INFO ShutdownHookManager: Deleting directory
/tmp/spark-9edc270b-3291-4913-8324-5f9e3ec4810f
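
A note on reading the failure above: spark-submit stages the local jars
under /user/da_mcom_milan/.sparkStaging/ on the cluster's default
filesystem, and the NodeManagers then download them when localizing the
AM container; exitCode -1000 with "File does not exist" means that
localization step failed. A quick sanity check is to confirm the staging
files actually land on HDFS and that the client reads the same Hadoop
configuration as the cluster. A minimal sketch (paths taken from the log
above; /etc/hadoop/conf is only a typical HDP location, so adjust for
your setup):

    # list the staging directory while the application is still running
    # (YARN deletes it once the application finishes)
    hdfs dfs -ls /user/da_mcom_milan/.sparkStaging/

    # confirm the client resolves the same default filesystem as the cluster
    hdfs getconf -confKey fs.defaultFS

    # spark-submit on YARN reads the cluster configuration from here
    echo $HADOOP_CONF_DIR
    export HADOOP_CONF_DIR=/etc/hadoop/conf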

On Fri, Mar 31, 2017 at 9:22 AM, Donald Szeto <donald@apache.org> wrote:

> Can you show the relevant parts from pio.log, please? If you don't care
> about existing log messages, the easiest way would be to delete pio.log
> from where you run the pio command and start fresh.
>
> On Fri, Mar 31, 2017 at 8:46 AM, Malay Tripathi <malaytripathi3@gmail.com>
> wrote:
>
>> I think it's YARN-based, set up through Ambari.
>>
>>
>> On Mar 31, 2017, at 6:29 AM, Donald Szeto <donald@apache.org> wrote:
>>
>> Hi Malay,
>>
>> Is your Spark cluster a standalone deployment or based on YARN?
>>
>> Regards,
>> Donald
>>
>> On Thu, Mar 30, 2017 at 11:48 PM Malay Tripathi <malaytripathi3@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I am running pio train on an edge node of a distributed 8-node Spark
>>> cluster and a 3-node HBase cluster.
>>> When I run "pio train", the job runs, but it runs on local Spark and is
>>> not submitted to the cluster.
>>> If I run "pio train --master spark://localhost:7077" or "pio train
>>> --master yarn-cluster", I get the error below:
>>>
>>> File does not exist:
>>> hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0024/template-scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar
>>>
>>> java.io.FileNotFoundException: File does not exist:
>>> hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0024/template-scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar
>>>     at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1427)
>>>     at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
>>>
>>>
>>> mdc2vra179 is one of my HBase cluster nodes and also runs the NameNode.
>>> I am not sure why Spark is expecting a jar file on the HBase/NameNode
>>> host.
>>> $PIO_HOME/conf/pio-env.sh:
>>>
>>> SPARK_HOME=$PIO_HOME/vendors/spark
>>> HBASE_CONF_DIR=$PIO_HOME/vendors/hbase/conf
>>> PIO_FS_BASEDIR=$HOME/.pio_store
>>> PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
>>> PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
>>> PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
>>> PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
>>> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
>>> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
>>> PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=pros-prod
>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=mdc2vra176
>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
>>> PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
>>> PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
>>> PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase
>>>
>>>
>>> Thanks,
>>>
>>> Malay
>>>
>>
>
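
A side note on the question above: the staging path pointing at
mdc2vra179 is expected, since spark-submit uploads local resources to
.sparkStaging on whatever fs.defaultFS resolves to, and that host runs
the NameNode; the jar is on HDFS, not on the HBase node itself. For
reference, a minimal sketch of passing Spark arguments through pio train
(assuming PredictionIO 0.10's convention of forwarding everything after
"--" to spark-submit; the memory values are illustrative only):

    pio train -- --master yarn-cluster --driver-memory 4g --executor-memory 4g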
