predictionio-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: PredictionIO with remote Spark and Elasticsearch
Date Thu, 02 Mar 2017 22:45:13 GMT
1) circumvent what?
2) transportclient port to what?

On Mar 2, 2017, at 2:04 PM, Paul-Armand Verhaegen <paularmand.verhaegen@gmail.com> wrote:

We went to elastic.co to circumvent that. They are also on AWS but have the transportclient port.


> On 2 Mar 2017, at 23:02, Pat Ferrel <pat@occamsmachete.com> wrote:
> 
> Yes, PIO uses the TransportClient and this is being deprecated by ES. PIO has a feature branch that adds support for ES5 using only the REST client. Not sure this will help, though, since I suspect AWS is not on ES5 yet.
> 
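A REST-only client talks to Elasticsearch over HTTP(S) instead of the transport port. If that feature branch follows the layout of the Elasticsearch 5 example in PredictionIO's later pio-env.sh template (an assumption; the branch may use different names), the storage source could be pointed at an HTTPS endpoint roughly like this. The host name is a hypothetical placeholder, and whether this works against AWS still depends on the domain's Elasticsearch version, as noted above.

```shell
# Sketch of the Elasticsearch storage source in conf/pio-env.sh for an
# HTTP(S)-only (REST client) connection.
# ASSUMPTION: variable names follow the ES5 example in later pio-env.sh
# templates; the feature branch discussed here may differ.
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
# Hypothetical AWS Elasticsearch domain endpoint (host name only, no scheme)
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=search-example-domain.us-east-1.es.amazonaws.com
# HTTPS on 443 instead of the native transport port 9300
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=443
PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=https
```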
> 
> On Mar 2, 2017, at 1:10 PM, Miller, Clifford <clifford.miller@phoenix-opsgroup.com> wrote:
> 
> I found some old references from folks having the same issue as me. They indicated that the AWS Elasticsearch Service only supports HTTP (the REST API) and not the native TCP transport protocol. If this is true, then AWS Elasticsearch has very limited usefulness here. Has anyone else run into this?
> 
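Whether a host can use the TransportClient at all comes down to whether TCP port 9300 (the default Elasticsearch transport port) is reachable from it. AWS Elasticsearch Service domains are generally documented as exposing only HTTP(S) endpoints, so a probe like the one below would be expected to report 443 reachable and 9300 not. This is a minimal bash sketch; `port_open` and `ES_HOST` are hypothetical names, and it relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if HOST accepts TCP connections on PORT.
# Relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
port_open() {
  host="$1"; port="$2"; secs="${3:-3}"
  # bash attempts a TCP connection when /dev/tcp/HOST/PORT is opened;
  # a zero exit status means the connection was accepted.
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Set ES_HOST to the domain endpoint (host name only, no https://) to probe it,
# e.g. ES_HOST=search-mydomain-xxxx.us-east-1.es.amazonaws.com (hypothetical).
if [ -n "${ES_HOST:-}" ]; then
  for port in 443 9300; do
    if port_open "$ES_HOST" "$port"; then
      echo "${ES_HOST}:${port} reachable"
    else
      echo "${ES_HOST}:${port} not reachable"
    fi
  done
fi
```

Run it from the PredictionIO EC2 instance, since security groups and VPC settings decide what that particular host can reach.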
> 
> On Thu, Mar 2, 2017 at 1:26 PM, Miller, Clifford <clifford.miller@phoenix-opsgroup.com> wrote:
> I'm able to run pio train, although pio train -- --master spark://your_master_url did not work. I'm using Spark on YARN, so I was able to get pio train -- --master yarn://URL to work after I copied the Elasticsearch configuration from my CDH cluster.
> 
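On the YARN point: spark-submit normally takes `--master yarn` and locates the ResourceManager through the Hadoop client configuration in `HADOOP_CONF_DIR` or `YARN_CONF_DIR`, rather than through a `yarn://` URL. A minimal sketch of that setup, assuming the CDH client configuration lives in `/etc/hadoop/conf` (adjust for your install); `pio_train_on_yarn` and `DRY_RUN` are hypothetical names used only for illustration.

```shell
#!/bin/sh
# ASSUMPTION: /etc/hadoop/conf holds the cluster's Hadoop/YARN client config
# (a common CDH location). spark-submit reads it to find the ResourceManager.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
export YARN_CONF_DIR="${YARN_CONF_DIR:-$HADOOP_CONF_DIR}"

# Hypothetical helper: everything after the lone `--` goes to spark-submit.
# Extra arguments given to the helper are appended to the spark-submit options.
pio_train_on_yarn() {
  set -- pio train -- --master yarn --deploy-mode client "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Only show the command that would be run.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

With `DRY_RUN=1` the helper only prints the command; without it, extra options such as `--executor-memory 2g` reach spark-submit unchanged.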
> I'm still struggling with integrating this with AWS Elasticsearch. Does anyone have an example of how this should be configured?
> 
> FYI, the EC2 instance that I'm running PredictionIO on can access it from the command
line: "curl -X GET <AWS Elasticsearch endpoint URL>". 
>  
> 
> On Wed, Mar 1, 2017 at 11:44 AM, Donald Szeto <donald@apache.org> wrote:
> Hi Clifford,
> 
> To use a remote Spark cluster, use passthrough command line arguments on the CLI, e.g.
> 
> pio train -- --master spark://your_master_url
> 
> Anything after a lone -- will be passed to spark-submit verbatim. For more information, try "pio help".
> 
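The "lone --" convention can be illustrated with a small POSIX shell sketch. This is NOT pio's actual implementation; `split_at_double_dash` is a hypothetical function that shows the general technique of stopping option parsing at the first lone `--` and forwarding every remaining word untouched.

```shell
#!/bin/sh
# Illustrative sketch only, NOT pio's source: split arguments at the first
# lone `--`. Words before it belong to the wrapper itself; every word after
# it (including any later `--`) is handed on untouched.
split_at_double_dash() {
  own_args=""
  while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
    own_args="${own_args:+$own_args }$1"
    shift
  done
  # Drop the separator itself, if one was given.
  if [ "$#" -gt 0 ]; then
    shift
  fi
  printf 'own: %s\n' "$own_args"
  # A real wrapper would run e.g. spark-submit "$@" here so argument
  # boundaries survive; printing "$*" is only for display.
  printf 'passthrough: %s\n' "$*"
}
```

For example, `split_at_double_dash --own-flag -- --master spark://host:7077 -- extra` prints `own: --own-flag` followed by `passthrough: --master spark://host:7077 -- extra`; only the first `--` is consumed.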
> To use a remote Elasticsearch cluster, please refer to examples in "conf/pio-env.sh" where you could find a variable to set the remote host name or IP of your ES cluster.
> 
> Regards,
> Donald
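For the TransportClient-based Elasticsearch support described in this thread, a remote cluster is set through the Elasticsearch storage-source variables in conf/pio-env.sh. A minimal sketch, assuming the variable names from the Elasticsearch 1.x example in the pio-env.sh template; the cluster name and host name are hypothetical and must be replaced with your own. The transport port (9300 by default) has to be reachable from the PredictionIO host, which is the part Clifford reports AWS Elasticsearch does not offer.

```shell
# Sketch: Elasticsearch section of conf/pio-env.sh for a remote cluster reached
# through the TransportClient (native transport protocol, default port 9300).
# ASSUMPTION: names follow the ES 1.x example in the pio-env.sh template;
# the cluster and host names below are hypothetical placeholders.
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
# Must match cluster.name on the remote Elasticsearch cluster
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=my-es-cluster
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es-node-1.example.internal
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
```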
> 
> On Tue, Feb 28, 2017 at 12:57 PM Miller, Clifford <clifford.miller@phoenix-opsgroup.com> wrote:
> I currently have a Cloudera cluster (Hadoop, Spark, HBase...) set up on AWS. I have PredictionIO installed on a different EC2 instance. I've been able to successfully configure it to use HDFS for model storage and to store events in HBase from the cluster. Spark and Elasticsearch are installed locally on the PredictionIO EC2 instance. I have the following questions:
> 
> How can I configure PredictionIO to utilize Spark on the Cloudera cluster?
> How can I configure PredictionIO to utilize a remote Elasticsearch domain? I'd like to use the AWS Elasticsearch service if possible.
> 
> Thanks
> 
> 
> -- 
> Clifford Miller
> Mobile | 321.431.9089
> 


