predictionio-user mailing list archives

From Pat Ferrel
Subject Re: PredictionIO with remote Spark and Elasticsearch
Date Thu, 02 Mar 2017 22:43:30 GMT
I think it will be included in the upcoming release. We are still deciding whether and how to
modify the sbt build, so I’d wait if you can. It’s in the feature/es5 branch, but the config is
also still a bit in flux.

On Mar 2, 2017, at 2:15 PM, Miller, Clifford wrote:

I probably should have asked whether the Elasticsearch 5.x compatible branch is in a state
where I could clone and build it.  If it is, where can I find it?

On Thu, Mar 2, 2017 at 5:06 PM, Miller, Clifford wrote:
Actually, AWS currently offers three Elasticsearch versions: 1.5, 2.3, and 5.1.  So a 5.x
compatible version should work.  When will this 5.x compatible version be available?

On Thu, Mar 2, 2017 at 5:02 PM, Pat Ferrel wrote:
Yes, PIO uses the TransportClient and this is being deprecated by ES. PIO has a feature branch
that adds support for ES5 using only the REST client. Not sure this will help though since
I suspect AWS is not on ES5 yet.

On Mar 2, 2017, at 1:10 PM, Miller, Clifford wrote:

I found some old references to folks having the same issue as me.  They indicated that the
AWS Elasticsearch Service only supports HTTP and not TCP.  If this is true, then it means that
AWS Elasticsearch has very limited usefulness.  Has anyone else run into this?

On Thu, Mar 2, 2017 at 1:26 PM, Miller, Clifford wrote:
I'm able to run pio train, although pio train -- --master spark://your_master_url
did not work.  I'm using Spark on YARN, so I was able to get pio train -- --master yarn://URL
to work after I copied the Elasticsearch configuration from my CDH cluster.

I'm still struggling with integrating this with AWS Elasticsearch.  Does anyone have an example
of how this should be configured?

FYI, the EC2 instance that I'm running PredictionIO on can reach the AWS Elasticsearch endpoint
from the command line: "curl -X GET <AWS Elasticsearch endpoint URL>".

On Wed, Mar 1, 2017 at 11:44 AM, Donald Szeto wrote:
Hi Clifford,

To use a remote Spark cluster, use passthrough command line arguments on the CLI, e.g.

pio train -- --master spark://your_master_url

Anything after a lone -- will be passed to spark-submit verbatim. For more information try
"pio help".
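
As a rough illustration of the passthrough, the sketch below only assembles and prints the
command line rather than running it. The master URL is the placeholder from the example above,
and --executor-memory 4g is just an illustrative spark-submit option, not something this thread
prescribes.

```shell
# Dry run: assemble the command and print it instead of executing it.
MASTER_URL="spark://your_master_url"   # placeholder, replace with your Spark master
PIO_ARGS="train"                       # arguments consumed by pio itself
SPARK_ARGS="--master ${MASTER_URL} --executor-memory 4g"  # illustrative spark-submit options

# Everything after the lone "--" is handed to spark-submit unchanged.
CMD="pio ${PIO_ARGS} -- ${SPARK_ARGS}"
echo "${CMD}"
```

Printing the command first is a cheap way to confirm which arguments end up on each side of
the lone -- before launching a real training job.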

To use a remote Elasticsearch cluster, please refer to examples in "conf/" where
you could find a variable to set the remote host name or IP of your ES cluster.
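
A minimal sketch of what such settings might look like in the PredictionIO environment file
follows. The variable names are assumed to match the template shipped with PredictionIO at the
time, so verify them against your own copy; the host name is hypothetical, and 9300 is
Elasticsearch's default transport (TCP) port.

```shell
# Assumed variable names; verify against the template in your conf/ directory.
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es.example.internal   # hypothetical remote host
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300                  # default ES transport (TCP) port
```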


On Tue, Feb 28, 2017 at 12:57 PM, Miller, Clifford wrote:
I currently have a Cloudera cluster (Hadoop, Spark, HBase...) set up on AWS.  I have PredictionIO
installed on a different EC2 instance.  I've been able to successfully configure it to use
HDFS for model storage and to store events in HBase on the cluster.  Spark and Elasticsearch
are installed locally on the PredictionIO EC2 instance.  I have the following questions:

How can I configure PredictionIO to utilize the Spark on the Cloudera cluster?  
How can I configure PredictionIO to utilize a remote Elasticsearch domain?  I'd like to use
the AWS Elasticsearch service if possible.


Clifford Miller
Mobile | 321.431.9089
