predictionio-user mailing list archives

From Clifford Miller <clifford.mil...@phoenix-opsgroup.com>
Subject Re: Data import, HBase requirements, and cost savings ?
Date Tue, 10 Apr 2018 22:15:32 GMT
Also, we are currently using the Similar Product recommender and the
E-Commerce recommender.

-- Cliff.

On Tue, Apr 10, 2018, 18:12 Clifford Miller <
clifford.miller@phoenix-opsgroup.com> wrote:

> Thanks for the responses; they are very helpful.  We are currently using
> the Event Server for event ingestion.
>
> --Cliff.
>
> On Tue, Apr 10, 2018, 16:52 Donald Szeto <donald@apache.org> wrote:
>
>> Hey Cliff, how are you collecting your events? Is it through PIO's Event
>> Server, or generated somehow by another ETL process?
>>
>> Regards,
>> Donald
>>
>> On Tue, Apr 10, 2018 at 1:12 PM, Pat Ferrel <pat@occamsmachete.com>
>> wrote:
>>
>>> It depends on which templates you are using. For instance, the
>>> recommenders require queries to the EventStore to get user history, so
>>> shutting it down will not work for them. Some templates (the Universal
>>> Recommender, for instance) do not require Spark to be running at scale
>>> except during the training phase, so for those templates it is much more
>>> cost-effective to stop Spark when it is not in use.
>>>
>>> Every template uses the PIO framework in different ways. Dropping the DB
>>> is not likely to work, especially if you are using it to store engine
>>> metadata.
>>>
>>> We’d need to know what templates you are using to advise cost savings.
>>>
>>> From: Miller, Clifford <clifford.miller@phoenix-opsgroup.com>
>>> Reply: user@predictionio.apache.org
>>> Date: April 10, 2018 at 11:22:04 AM
>>> To: user@predictionio.apache.org
>>> Subject: Data import, HBase requirements, and cost savings?
>>>
>>> I'm exploring cost-saving options for a customer that wants to use
>>> PredictionIO.  We plan on running multiple engines/templates, everything
>>> in AWS, and we are hoping not to have data loaded for all templates at
>>> once.  The plan is to:
>>>
>>>    1. Start up the HBase cluster.
>>>    2. Import the events.
>>>    3. Train the model.
>>>    4. Store the model in S3.
>>>    5. Shut down the HBase cluster.
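The five steps above can be sketched as a shell script. This is a minimal sketch under assumptions, not a tested procedure: `pio import`, `pio build`, and `pio train` are standard PredictionIO CLI commands, but the cluster start/stop scripts, the S3 bucket, the app id, and the model path are all placeholders for whatever AWS tooling and layout you actually use.

```shell
#!/bin/sh
# Sketch of the batch import/train/shutdown workflow described above.
# Cluster-management commands and paths are placeholders, not real APIs.

# 1. Start the HBase cluster (placeholder -- e.g. an EMR or
#    CloudFormation invocation in practice).
./start-hbase-cluster.sh

# 2. Import events previously exported to S3 (staged locally first;
#    bucket and app id are hypothetical).
aws s3 cp s3://my-bucket/events/events.json /tmp/events.json
pio import --appid 1 --input /tmp/events.json

# 3. Build and train the engine.
pio build
pio train

# 4. Store the trained model artifacts in S3 (model path is an assumption
#    and depends on the template's model storage configuration).
aws s3 sync /opt/pio/models s3://my-bucket/models

# 5. Shut down the HBase cluster (placeholder).
./stop-hbase-cluster.sh
```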
>>>
>>> We have some general questions.
>>>
>>>    1. Is this approach even feasible?
>>>    2. Does PredictionIO require the Event Store (HBase) to be up and
>>>    running constantly, or can we turn it off when not training?  If it
>>>    requires HBase constantly, can we do the training on a separate HBase
>>>    cluster and then have separate PIO Event/Engine servers deploy the
>>>    applications using the model generated by the larger HBase cluster?
>>>    3. Can the events be stored in S3 and then imported (pio import) when
>>>    needed for training, or will we have to copy them out of S3 to our PIO
>>>    Event/Engine server?
>>>    4. Have any import benchmarks been done?  Events per second, or MB/GB
>>>    per second?
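On question 3, one hedged possibility: `pio import` hands its `--input` path to Spark, so if the Hadoop S3 connector (s3a) and AWS credentials are configured on the underlying Spark/Hadoop install, it may be able to read from S3 directly; otherwise the events must be staged locally first. The bucket name and app id below are hypothetical.

```shell
# Direct import from S3 -- only works if the s3a connector is on the
# Spark classpath and AWS credentials are configured; verify on your stack.
pio import --appid 1 --input s3a://my-bucket/events/events.json

# Fallback: stage the exported events locally, then import.
aws s3 cp s3://my-bucket/events/events.json /tmp/events.json
pio import --appid 1 --input /tmp/events.json
```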
>>>
>>> Any assistance would be appreciated.
>>>
>>> --Cliff.
>>>
