predictionio-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Unclear problem with using S3 as a storage data source
Date Wed, 28 Mar 2018 22:01:23 GMT
So you need to have more Spark nodes and this is the problem?

If so, set up HBase on pseudo-clustered HDFS so you have a master node
address even though all storage is on one machine. Then you use that
HDFS instance to tell Spark where to look for the model. It gives the
model a URI.
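
As a rough sketch of what that could look like in pio-env.sh, assuming the
HDFS source keys from the stock pio-env.sh template and a hypothetical
/models path (resolved against whatever default file system the Hadoop
configuration on each machine points to), the model repository might be
switched over like this:

```shell
# Sketch only: store the stub models in HDFS instead of the local file system.
# The source name "HDFS" and the /models path are assumptions, not values
# taken from this thread.
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS

PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs
PIO_STORAGE_SOURCES_HDFS_PATH=/models
```

Both the training and the prediction machine would need the same settings
and a Hadoop configuration that points at the same master node.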

I have never used the raw S3 support. HDFS can also be backed by S3, but you
still use the HDFS APIs; using S3 is an HDFS config setting.

It is a rather unfortunate side effect of PIO but there are 2 ways to solve
this with no extra servers.

Maybe someone else knows how to use S3 natively for the model stub?


From: Dave Novelli <dave@ultravioletanalytics.com>
Date: March 28, 2018 at 12:13:12 PM
To: Pat Ferrel <pat@occamsmachete.com>
Cc: user@predictionio.apache.org
Subject: Re: Unclear problem with using S3 as a storage data source

Well, it looks like the local file system isn't an option in a multi-server
configuration without manually setting up a process to transfer those stub
model files.

I trained models on one heavy-weight temporary instance, and then when I
went to deploy from the prediction server instance it failed due to missing
files. I copied the .pio_store/models directory from the training server
over to the prediction server and then was able to deploy.
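
The manual copy described above could be scripted roughly as follows. This
is only a sketch: the host name prediction-server is hypothetical, the
models directory is assumed to live under the home directory on both
machines, and the function only prints the rsync command unless DRY_RUN=0
is set.

```shell
# Mirror the stub model directory from the training server to the
# prediction server. Prints the command by default; set DRY_RUN=0 to run it.
sync_models() {
  src_dir=$1   # local models directory, e.g. $HOME/.pio_store/models
  dest=$2      # rsync destination, e.g. host:.pio_store/models
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "rsync -az $src_dir/ $dest/"
  else
    rsync -az "$src_dir/" "$dest/"
  fi
}

# Run on the training server after `pio train` (host name is hypothetical).
sync_models "$HOME/.pio_store/models" "prediction-server:.pio_store/models"
```

This would have to be rerun after every training run, before deploying on
the prediction server.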

So, in a dual-instance configuration what's the best way to store the
files? I'm using pseudo-distributed HBase with standard file system storage
instead of HDFS (my current aim is keeping down cost and complexity for a
pilot project).

Is S3 back on the table as an option?

On Fri, Mar 23, 2018 at 11:03 AM, Dave Novelli <
dave@ultravioletanalytics.com> wrote:

> Ahhh ok, thanks Pat!
>
>
> Dave Novelli
> Founder/Principal Consultant, Ultraviolet Analytics
> www.ultravioletanalytics.com | 919.210.0948 |
> dave@ultravioletanalytics.com
>
> On Fri, Mar 23, 2018 at 8:08 AM, Pat Ferrel <pat@occamsmachete.com> wrote:
>
>> There is no need to put Universal Recommender models in S3; they are
>> not used and only exist (in stub form) because PIO requires them. The
>> actual model lives in Elasticsearch and uses special features of ES to
>> perform the last phase of the algorithm and so cannot be replaced.
>>
>> The stub PIO models have no data and will be tiny. Putting them in HDFS
>> or the local file system is recommended.
>>
>>
>> From: Dave Novelli <dave@ultravioletanalytics.com>
>> Reply: user@predictionio.apache.org
>> Date: March 22, 2018 at 6:17:32 PM
>> To: user@predictionio.apache.org
>> Subject: Unclear problem with using S3 as a storage data source
>>
>> Hi all,
>>
>> I'm using the Universal Recommender template and I'm trying to switch
>> storage data sources from local file to S3 for the model repository. I've
>> read the page at https://predictionio.apache.org/system/anotherdatastore/
>> to try to understand the configuration requirements, but when I run pio
>> train it reports an error and nothing shows up in the S3 bucket:
>>
>> [ERROR] [S3Models] Failed to insert a model to
>> s3://pio-model/pio_modelAWJPjTYM0wNJe2iKBl0d
>>
>> I created a new bucket named "pio-model" and granted full public
>> permissions.
>>
>> Seemingly relevant settings from pio-env.sh:
>>
>> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
>> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=S3
>> ...
>>
>> PIO_STORAGE_SOURCES_S3_TYPE=s3
>> PIO_STORAGE_SOURCES_S3_REGION=us-west-2
>> PIO_STORAGE_SOURCES_S3_BUCKET_NAME=pio-model
>>
>> # I've tried with and without this
>> #PIO_STORAGE_SOURCES_S3_ENDPOINT=http://s3.us-west-2.amazonaws.com
>>
>> # I've tried with and without this
>> #PIO_STORAGE_SOURCES_S3_BASE_PATH=pio-model
>>
>>
>> Any suggestions where I can start troubleshooting my configuration?
>>
>> Thanks,
>> Dave
>>
>>
>


--
Dave Novelli
Founder/Principal Consultant, Ultraviolet Analytics
www.ultravioletanalytics.com | 919.210.0948 | dave@ultravioletanalytics.com
