predictionio-user mailing list archives

From Donald Szeto <don...@apache.org>
Subject Re: Eventserver API in an Engine?
Date Wed, 12 Jul 2017 16:53:21 GMT
Many good discussions. Let me provide my input on these issues.

Multiple installations of PredictionIO should use different database names.
An analogy would be WordPress installations, each of which expects its own
metadata database. I understand the downside is that some users only have
access to one database. We can add database table prefixing support to
alleviate this, as most other projects do. I agree the documentation does
not make it clear that installations of PIO should not be backed by
overlapping data stores.
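
To make the separate-database idea concrete, here is a minimal sketch,
assuming a PostgreSQL backend configured through pio-env.sh style variables.
The helper names (storage_env, shared_databases) and the pio_<name> database
naming are hypothetical, and the variable names should be checked against the
conf/pio-env.sh shipped with your PIO version.

```python
# Key holding the JDBC URL of the storage source (assumed name; verify
# against your own conf/pio-env.sh).
URL_KEY = "PIO_STORAGE_SOURCES_PGSQL_URL"


def storage_env(install_name, host="localhost"):
    """Build pio-env.sh style settings with a database unique to one install."""
    database = f"pio_{install_name}"  # hypothetical naming convention
    return {
        "PIO_STORAGE_SOURCES_PGSQL_TYPE": "jdbc",
        URL_KEY: f"jdbc:postgresql://{host}/{database}",
        "PIO_STORAGE_REPOSITORIES_METADATA_SOURCE": "PGSQL",
        "PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE": "PGSQL",
        "PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE": "PGSQL",
    }


def shared_databases(envs):
    """Return the database URLs used by more than one installation."""
    seen, shared = set(), set()
    for env in envs:
        url = env[URL_KEY]
        if url in seen:
            shared.add(url)
        seen.add(url)
    return shared
```

A check like shared_databases([storage_env("release"), storage_env("dev")])
comes back empty when every installation has its own database, which is the
property the paragraph above asks for.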

Regarding the discussion of data and engines, it seems to me that there
are two directions of data science development.

One perspective is that data collection and processing are independent
from data science development. Data are collected and organized ("apps" in
PIO terms). Developers look at what's available, explore, and develop
(engines).

The other is to provide turnkey solutions. Well-crafted engines expect
certain inputs and expose knobs for tuning.

PIO supports both styles today. Apps provide the grouping of data, and the
engine is the abstraction that defines the concern of the data. These have
been well defined since day 1.

Side track: one source of confusion, I feel, is that templates have
different degrees of sophistication. The universal recommender is definitely
much more sophisticated and turnkey than the skeleton template, for example.
We should label this in our template gallery.

Going back to Mars's suggestion: if the use case is such that the engine
server also collects data used only by that engine, it feels like the right
abstraction would be embedding a subset of the event server that collects
data going to a single app. Recall that the app name is configured in
engine.json.

I think to resolve Mars's immediate need, we can implement an embedded
event server in a couple of phases. Roughly, it would mean wiring the
existing event server in (with some refactoring) and marking it
experimental, then continuing toward a clean, app-specific event server.
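
As a rough illustration of where the app-specific version could end up, the
sketch below routes both query and event requests through one object bound to
a single app. It is not PIO code: EmbeddedEngineServer, its handle method,
and the in-memory event list are all hypothetical stand-ins. Only the
/queries.json and /events.json paths and the required event fields (event,
entityType, entityId) mirror the existing servers.

```python
import json


class EmbeddedEngineServer:
    """Hypothetical single-process router for queries and events."""

    REQUIRED_EVENT_FIELDS = ("event", "entityType", "entityId")

    def __init__(self, app_name, predict):
        self.app_name = app_name  # fixed at deploy time, like the app name in engine.json
        self.predict = predict    # callable standing in for the engine's query logic
        self.events = []          # in-memory stand-in for the real event store

    def handle(self, method, path, body):
        """Dispatch one request and return (status_code, response_dict)."""
        if method != "POST":
            return 405, {"message": "only POST is sketched here"}
        try:
            payload = json.loads(body)
        except ValueError:
            return 400, {"message": "body is not valid JSON"}
        if not isinstance(payload, dict):
            return 400, {"message": "body must be a JSON object"}

        if path == "/queries.json":
            return 200, self.predict(payload)

        if path == "/events.json":
            missing = [f for f in self.REQUIRED_EVENT_FIELDS if f not in payload]
            if missing:
                return 400, {"message": "missing fields: " + ", ".join(missing)}
            # Every event goes to the one app this server was deployed for,
            # so no access key is needed to select an app.
            self.events.append({**payload, "app": self.app_name})
            return 201, {"eventId": str(len(self.events))}

        return 404, {"message": "unknown path"}
```

The point of the design is that the app is chosen once, when the server is
created, so clients of the embedded events endpoint cannot write to any
other app's data.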

Let me know how these sound.

On Tue, Jul 11, 2017 at 1:39 PM Kenneth Chan <kenneth@apache.org> wrote:

> re:
> "
> when deploying multiple engines with different versions of PIO and
> different storage configurations ....
>
> needing separate PIO installs regularly when testing the next release or
> development builds of PIO and when evaluating engine templates or
> algorithms that require new, different storage configs. Also, those in the
> consulting world are frequently required to keep client data separated for
> all kinds of privacy & legal reasons; with the storage corruption bug I
> reported, one client's data could become visible to or intermingled with
> another client's app.
> "
>
> When installing multiple PIOs separately, could you set each PIO database
> config to use different table names so they don't conflict?
> Or bring up another VM to isolate PIO?
>
> Donald, do you have best practices or advice for users who want to install
> multiple PIO versions and run them on the same machine?
>
>
>
> On Tue, Jul 11, 2017 at 12:49 PM, Kenneth Chan <kenneth@apache.org> wrote:
>
>> I think we have the wrong impression that all templates are supposed
>> to work together out of the box.
>>
>> The templates are meant to be examples and demonstrations - that's why
>> they are called templates! They are never meant to fit into any user
>> application right away. Each application is unique. A template
>> only assumes a specific use case for demonstration purposes.
>>
>> Users can start with a template for a simple case, but they need to modify
>> it for their final needs.
>>
>> For example, the PIO classification template is only meant for
>> demonstrating simple classification. At the end, how to use classification
>> is application specific. For example, one can modify the classification to
>> train a classifier on the same set of data used by recommendation.
>>
>>
>>
>>
>> On Tue, Jul 11, 2017 at 10:31 AM, Pat Ferrel <pat@occamsmachete.com>
>> wrote:
>>
>>> Understood, you have immediate practical reasons for 1 integrated
>>> deployment with the 2 endpoints. But Apache is a do-ology, meaning those
>>> who do something win the argument as long as they have enough consensus. I
>>> have enough experience with PIO that I have chosen to fix a lot of issues
>>> with the prototype design, having already gone down the “quick hack” path
>>> once. You may want to do something else if you have the resources.
>>>
>>> I fear that my deeper changes will not get enough consensus and we may
>>> end up with a competing ML/AI server framework some day. That is another
>>> ASF tendency. Innovations happen before going into ASF, often not under ASF
>>> rules.
>>>
>>> In any case—how much of your problem is workflow vs installation vs
>>> bundling of APIs? Can you explain it more?
>>>
>>>
>>> On Jul 11, 2017, at 9:37 AM, Mars Hall <mars@heroku.com> wrote:
>>>
>>> > On Jul 10, 2017, at 18:03, Kenneth Chan <kenneth@apache.org> wrote:
>>> >
>>> > it's all the same set of events collected for my application and I can
>>> > create multiple engines to use these data for different purposes.
>>>
>>>
>>> Clear to me, ⬆️ this is the prevailing reasoning behind the
>>> "separateness" of the Eventserver. I do not forsake this design goal, but
>>> ask that we consider the usability & durability of PredictionIO when
>>> deploying multiple engines with different versions of PIO and different
>>> storage configurations. This will probably happen for anyone who uses
>>> PredictionIO long-term in production, as their new projects come on-line
>>> with newer & better versions & configurations.
>>>
>>> I encounter this situation of needing separate PIO installs regularly
>>> when testing the next release or development builds of PIO and when
>>> evaluating engine templates or algorithms that require new, different
>>> storage configs. Also, those in the consulting world are frequently
>>> required to keep client data separated for all kinds of privacy & legal
>>> reasons; with the storage corruption bug I reported, one client's data
>>> could become visible to or intermingled with another client's app.
>>>
>>> In starting this thread, I was hoping to find some traction with the
>>> idea of making it possible to completely self-contain a PredictionIO app by
>>> adding the Events API to the process started with `pio deploy`.
>>>
>>> Goal: Queries & Events APIs in the same process.
>>>
>>> When considering the architecture of apps, sharing a database between
>>> two or more apps is considered a very naughty way to get around having
>>> clear, clean, inter-process APIs. My team at Salesforce/Heroku has been
>>> struck by this exact issue with PredictionIO. So, I am seeking a way to fix
>>> this without requiring a rewrite of PredictionIO. I am excited to hear
>>> about the new architecture prototypes, yet our reality is that this is an
>>> issue now.
>>>
>>> *Mars
>>>
>>> ( <> .. <> )
>>>
>>>
>>>
>>
>
