predictionio-user mailing list archives

From Mars Hall <mars.h...@salesforce.com>
Subject Re: pio train on Amazon EMR
Date Mon, 05 Feb 2018 19:45:46 GMT
Hi Malik,

This is a topic I've been investigating as well.

Given how EMR manages its clusters & their runtime, I don't think hacking
configs to make the PredictionIO host act like a cluster member will be a
simple or sustainable approach.

PredictionIO already operates Spark by building `spark-submit` commands.

https://github.com/apache/predictionio/blob/df406bf92463da4a79c8d84ec0ca439feaa0ec7f/tools/src/main/scala/org/apache/predictionio/tools/Runner.scala#L313

Implementing a new AWS EMR command runner in PredictionIO, so that we can
switch `pio train` from the existing, plain `spark-submit` command to the
AWS CLI's `aws emr add-steps --steps Args=spark-submit`, would likely solve
a big part of this problem.
  https://docs.aws.amazon.com/cli/latest/reference/emr/add-steps.html
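A rough sketch of the command such a runner might assemble, written in
Python for brevity rather than in PredictionIO's Scala. It is not existing
PredictionIO code: the function name, the step name, and the example
arguments are hypothetical, and the use of `command-runner.jar` with the
`Type=CUSTOM_JAR` shorthand is an assumption about how the step would be
submitted on a cluster that has Spark installed.

```python
def build_emr_add_steps_command(cluster_id, jar_uri, main_class, app_args,
                                step_name="pio-train"):
    """Return an argv list for `aws emr add-steps` that wraps spark-submit.

    Hypothetical helper for illustration; nothing here is PredictionIO API.
    """
    spark_submit = ["spark-submit", "--deploy-mode", "cluster",
                    "--class", main_class, jar_uri, *app_args]

    # The CLI shorthand separates list items with commas, so this sketch
    # refuses values that would need extra quoting instead of guessing.
    for arg in spark_submit:
        if "," in arg or any(ch.isspace() for ch in arg):
            raise ValueError(f"argument needs quoting, not handled here: {arg!r}")

    step = ",".join([
        "Type=CUSTOM_JAR",
        f"Name={step_name}",
        "ActionOnFailure=CONTINUE",
        "Jar=command-runner.jar",  # assumed way to run spark-submit as a step
        "Args=[" + ",".join(spark_submit) + "]",
    ])
    return ["aws", "emr", "add-steps", "--cluster-id", cluster_id,
            "--steps", step]


if __name__ == "__main__":
    # Placeholder cluster ID, S3 URI, and application arguments.
    command = build_emr_add_steps_command(
        cluster_id="j-EXAMPLE",
        jar_uri="s3://example-bucket/engines/engine-assembly.jar",
        main_class="org.apache.predictionio.workflow.CreateWorkflow",
        app_args=["--example-flag", "example-value"],
    )
    print(" ".join(command))
```

The function only builds the argument list; actually submitting it (for
example with `subprocess.run`) and polling the step for completion are left
out of this sketch.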

Also, uploading the engine assembly JARs (the job code to run on Spark) to
the cluster members or S3 for access from the EMR Spark runtime will be
another part of this challenge.
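For the S3 variant of that upload step, a minimal sketch (in Python, for
illustration only) could pair each local assembly JAR with an `aws s3 cp`
command and the resulting S3 URI. The bucket name, key prefix, local path,
and function name below are all hypothetical placeholders.

```python
from pathlib import Path


def build_s3_upload_commands(jar_paths, bucket, prefix):
    """Return (argv, s3_uri) pairs, one per local JAR to upload.

    Hypothetical helper; it only builds commands and does not run them.
    """
    uploads = []
    for jar_path in jar_paths:
        file_name = Path(jar_path).name
        s3_uri = f"s3://{bucket}/{prefix.strip('/')}/{file_name}"
        uploads.append((["aws", "s3", "cp", str(jar_path), s3_uri], s3_uri))
    return uploads


if __name__ == "__main__":
    # Placeholder path, bucket, and prefix.
    for argv, uri in build_s3_upload_commands(
        ["target/scala-2.11/my-engine-assembly.jar"],
        bucket="example-bucket",
        prefix="engines/",
    ):
        print(" ".join(argv), "->", uri)
```

The returned S3 URIs are what the EMR-side `spark-submit` would then need
to reference in place of local file paths.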

On Mon, Feb 5, 2018 at 5:29 AM, Malik Twain <chacha273@gmail.com> wrote:

> I'm trying to run pio train with Amazon EMR. I copied core-site.xml and
> yarn-site.xml from EMR to my training machine, and configured
> HADOOP_CONF_DIR in pio-env.sh accordingly.
>
> I'm running pio train as below:
>
> pio train -- --master yarn --deploy-mode cluster
>
> It's failing with the following errors:
>
> 18/02/05 11:56:15 INFO Client:
>    client token: N/A
>    diagnostics: Application application_1517819705059_0007 failed 2 times
> due to AM Container for appattempt_1517819705059_0007_000002 exited with
> exitCode: 1
> Diagnostics: Exception from container-launch.
>
> And below are the errors from EMR stdout and stderr respectively:
>
> java.io.FileNotFoundException: /root/pio.log (Permission denied)
>
> [ERROR] [CreateWorkflow$] Error reading from file: File file:/quickstartapp/MyExample/engine.json
> does not exist. Aborting workflow.
>
>
> Thank you.
>



-- 
Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California
