predictionio-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Eventserver API in an Engine?
Date Tue, 11 Jul 2017 17:46:36 GMT
One of my biggest work pieces is scaling deployments, including all services, to fit the data,
then, once that is accomplished, changing it as the data changes or fine-tuning to reduce costs.
Upgrading has not been difficult for the PIO part, but it certainly is when upgrading storage
backends or, heaven forbid, changing the type of storage backend.

I’d love to see this addressed. We have Chef recipes and a collection of Docker containers,
all OSS, as well as some Terraform scripts for spinning up on AWS that are closed source. Would
some form of this help? I know Heroku has its own install system.


On Jul 11, 2017, at 10:31 AM, Pat Ferrel <pat@occamsmachete.com> wrote:

Understood, you have immediate practical reasons for one integrated deployment with the two
endpoints. But Apache is a do-ology, meaning those who do something win the argument as long
as they have enough consensus. I have enough experience with PIO that I have chosen to fix a
lot of issues with the prototype design, having already gone down the “quick hack” path once.
You may want to do something else if you have the resources.

I fear that my deeper changes will not get enough consensus and we may end up with a competing
ML/AI server framework some day. That is another ASF tendency: innovation happens before a
project enters the ASF, often not under ASF rules.

In any case, how much of your problem is workflow vs. installation vs. bundling of APIs? Can
you explain it more?


On Jul 11, 2017, at 9:37 AM, Mars Hall <mars@heroku.com> wrote:

> On Jul 10, 2017, at 18:03, Kenneth Chan <kenneth@apache.org> wrote:
> 
> it's all same set of events collected for my application and i can create multiple engine
> to use these data for different purpose.


Clear to me, ⬆️ this is the prevailing reasoning behind the "separateness" of the Eventserver.
I do not forsake this design goal, but I ask that we consider the usability & durability
of PredictionIO when deploying multiple engines with different versions of PIO and different
storage configurations. This will probably happen for anyone who uses PredictionIO long-term
in production, as their new projects come on-line with newer & better versions & configurations.

I encounter this situation of needing separate PIO installs regularly when testing the next
release or development builds of PIO, and when evaluating engine templates or algorithms that
require new, different storage configs. Also, those in the consulting world are frequently
required to keep client data separated for all kinds of privacy & legal reasons; with
the storage corruption bug I reported, one client's data could become visible to, or intermingled
with, another client's app.
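For what it's worth, the isolation I rely on today looks roughly like this (a sketch; the paths
and database names are made up for illustration): each install carries its own `conf/pio-env.sh`
pointing at its own database, so nothing is shared between clients or experiments.

```shell
# Sketch: two side-by-side PIO installs, each with its own conf/pio-env.sh.
# Paths and database names below are illustrative, not prescriptive.

# --- /opt/pio-client-a/conf/pio-env.sh ---
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio_client_a

# --- /opt/pio-client-b/conf/pio-env.sh ---
# Identical repository settings, but a completely separate database,
# so client B's events can never intermingle with client A's.
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio_client_b
```

Every `pio` command run from a given install reads that install's `pio-env.sh`, which is the
only thing keeping the data apart today.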

In starting this thread, I was hoping to find some traction with the idea of making it possible
to completely self-contain a PredictionIO app by adding the Events API to the process started
with `pio deploy`.

Goal: Queries & Events APIs in the same process.
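Concretely, here is a sketch of what I mean (ports are illustrative, and the "proposed" half
does not exist today; it is the thing I am asking for):

```shell
# Today: two separate processes, coupled only through a shared database.
pio eventserver --ip 0.0.0.0 --port 7070 &   # Events API  (POST /events.json)
pio deploy --ip 0.0.0.0 --port 8000          # Queries API (POST /queries.json)

# Proposed: a single process started by `pio deploy` that serves both
# the Events API and the Queries API, making the app self-contained.
```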

When considering the architecture of apps, sharing a database between two or more apps is
considered a very naughty way to get around having clear, clean inter-process APIs. My team
at Salesforce/Heroku has been struck by this exact issue with PredictionIO. So, I am seeking
a way to fix this without requiring a rewrite of PredictionIO. I am excited to hear about
the new architecture prototypes, yet our reality is that this is an issue now.

*Mars

( <> .. <> )



