predictionio-user mailing list archives

From Mars Hall <m...@heroku.com>
Subject Re: Eventserver API in an Engine?
Date Thu, 29 Jun 2017 16:57:22 GMT
Donald, Pat, great to hear that this is a well-pondered design challenge of PIO 😄 The prototype,
composable, all-in-one server sounds promising.

I'm wondering if there's a more immediate possibility to address adding the `/events` REST
API to Engine? Would it make sense to try invoking an `EventServiceActor` in the tools.commands.Engine#deploy
method? If that would be a distasteful hack, just say so. I'm trying to understand the possibility
of solving this in the current codebase vs. a visionary new version of PIO.

*Mars


> On Jun 28, 2017, at 18:01, Pat Ferrel <pat@occamsmachete.com> wrote:
> 
> Ah, one of my favorite subjects.
> 
> I’m working on a prototype server that handles online learning as well as Lambda style.
There is only one server with everything going through REST. There are 2 resource types, Engines
and Commands. Engines have REST APIs with endpoints for Events and Queries. So something like
POST /engines/resource-id/events would send an event to what is like a PIO app and POST /engines/resource-id/queries
does the PIO query equivalent. Note that this is fully multi-tenant and has only one important
id. It’s based on akka-http in a fully microservice type architecture. While the Server
is running you can add completely new Templates for any algorithm, thereby adding new endpoints
for Events and Queries. Each “tenant” is super lightweight since it’s just an Actor
not a new JVM. The CLI is actually Python that hits the REST API with a Python SDK, and there
is a Java SDK too. We support SSL and OAuth2 so having those baked into an SDK is really important.
Though a prototype, it can support multi-tenant SaaS.
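
A minimal sketch of the routing described above, assuming nothing about the actual prototype: the `Engine` and `Server` classes, the handler names, and the response bodies are all hypothetical, and plain Python stands in for akka-http. It only illustrates one server dispatching `POST /engines/<resource-id>/events` and `POST /engines/<resource-id>/queries` to lightweight per-tenant objects rather than separate JVMs.

```python
import re
from typing import Dict, Tuple


class Engine:
    """Hypothetical per-tenant engine: a plain object, not a separate process."""

    def __init__(self) -> None:
        self.events = []

    def handle_event(self, body: dict) -> Tuple[int, dict]:
        # Accept the event and keep it in memory for this sketch.
        self.events.append(body)
        return 201, {"accepted": True}

    def handle_query(self, body: dict) -> Tuple[int, dict]:
        # Placeholder "prediction": report how many events this engine has seen.
        return 200, {"eventsSeen": len(self.events), "query": body}


# One route pattern covers both endpoint kinds; resource-id is the only id.
ROUTE = re.compile(r"^/engines/(?P<resource_id>[^/]+)/(?P<kind>events|queries)$")


class Server:
    """Hypothetical single server holding every engine, keyed by resource-id."""

    def __init__(self) -> None:
        self.engines: Dict[str, Engine] = {}

    def add_engine(self, resource_id: str) -> None:
        # Engines can be added while the server is running.
        self.engines[resource_id] = Engine()

    def post(self, path: str, body: dict) -> Tuple[int, dict]:
        match = ROUTE.match(path)
        if match is None:
            return 404, {"error": "unknown path"}
        engine = self.engines.get(match["resource_id"])
        if engine is None:
            return 404, {"error": "unknown engine"}
        if match["kind"] == "events":
            return engine.handle_event(body)
        return engine.handle_query(body)
```

With this shape, adding a tenant is a dictionary insert (`server.add_engine("demo")`), and both endpoints for that tenant become reachable without starting another listener.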
> 
> We have a prototype online learner Template which does not save events at all, though
it ingests events exactly like PIO, in the same format. In fact, we have the same template for
both servers, taking identical input. Instead of an EventServer, it mirrors received events
before validation (yes, we have full event validation that is template specific). This
allows some events to affect mutable data in a database and some to just be an immutable stream
or even be thrown away for Kappa learners. For an online learner, each event updates the model,
which is stored periodically as a watermark. If you want to change algo params you destroy
the engine instance and replay the mirrored events. For a Lambda learner the Events may be
stored like PIO. 
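
The mirror-then-replay idea above can be sketched as follows. This is an illustration under stated assumptions, not the prototype's code: the `OnlineEngine` class, the `rebuild` helper, the `decay` parameter, and the toy "model" (decayed item counts) are all hypothetical, and the periodic watermark is not modeled. It shows raw events being mirrored before validation, each valid event updating the model, and a parameter change handled by discarding the engine and replaying the mirror.

```python
import json
from typing import Dict, List


class OnlineEngine:
    """Hypothetical online learner: mirrors raw events, updates a model per event."""

    def __init__(self, mirror: List[str], decay: float = 1.0) -> None:
        self.mirror = mirror          # append-only log of raw input
        self.decay = decay            # hypothetical algorithm parameter
        self.model: Dict[str, float] = {}

    def receive(self, raw: str) -> bool:
        # Mirror first, so even events that fail validation can be replayed later.
        self.mirror.append(raw)
        return self.apply(raw)

    def apply(self, raw: str) -> bool:
        # Template-specific validation: here, a JSON object with an "item" field.
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            return False
        if not isinstance(event, dict) or "item" not in event:
            return False
        # Online update: decay existing scores, then credit the observed item.
        for key in self.model:
            self.model[key] *= self.decay
        item = event["item"]
        self.model[item] = self.model.get(item, 0.0) + 1.0
        return True


def rebuild(mirror: List[str], decay: float) -> OnlineEngine:
    """Destroy-and-replay: build a fresh engine with new params from the mirror."""
    engine = OnlineEngine(mirror=list(mirror), decay=decay)
    for raw in mirror:
        engine.apply(raw)  # replay without mirroring the same events again
    return engine
```

Because the mirror holds input as received, the replay re-runs validation too, so a stricter or looser validator can be applied to old events along with the new parameters.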
> 
> This is very much along the lines of the proposal I put up for future PIO but the philosophy
internally is so different that I’m now not sure how it would fit. I’d love to talk about
it sometime and once we do a Lambda Template we’ll at least have some nice comparisons to
make. We migrated the Kappa style Template to it so we have a good idea that it’s not that
hard. I’d love to donate it to PIO but only if it makes sense.
> 
> 
> On Jun 28, 2017, at 4:27 PM, Donald Szeto <donald@apache.org> wrote:
> 
> Hey Mars,
> 
> Thanks for the suggestion, and I agree with your point on the metadata part. Essentially,
I think the app and channel concepts should instead be logically grouped together with events,
not metadata.
> 
> I think in some advanced use cases, event storage should not even be a hard requirement
as engine templates can source data differently. In the long run, it might be cleaner to have
the event server (and all relevant concepts such as its API, access keys, apps, etc.) as a separable
package that is turned on by default and embedded in the engine server. Advanced users can either
make it standalone or even turn it off completely.
> 
> I imagine this kind of refactoring would echo Pat's proposal on making a clean and separate
engine and metadata management system down the road.
> 
> Regards,
> Donald
> 
> On Wed, Jun 28, 2017 at 3:29 PM Mars Hall <mars@heroku.com> wrote:
> One of the ongoing challenges we face with PredictionIO is the separation of Engine &
Eventserver APIs. This separation leads to several problems:
> 
> 1. Deploying a complete PredictionIO app requires multiple processes, each with its own
network listener
> 2. Eventserver & Engine must be configured to share exactly the same storage backends
(same `pio-env.sh`)
> 3. Confusion between "Eventserver" (an optional REST API) & "event storage" (a required
database)
> 
> These challenges are exacerbated by the fact that PredictionIO's docs & `pio app`
CLI make it appear that sharing an Eventserver between Engines is a good idea. I recently
filed a JIRA issue about this topic. TL;DR sharing an eventserver between engines with different
Meta Storage config will cause data corruption:
>   https://issues.apache.org/jira/browse/PIO-96
> 
> 
> I believe a lot of these issues could be alleviated with one change to PredictionIO core:
> 
> By default, expose the Eventserver API from the `pio deploy` Engine process, so that
it is not necessary to deploy a second Eventserver-only process. A separate `pio eventserver`
could remain an option if you need the separation of concerns for scalability.
> 
> 
> I'd love to hear what you folks think. I will file a JIRA enhancement issue if this seems
like an acceptable approach.
> 
> *Mars Hall
> Customer Facing Architect
> Salesforce Platform / Heroku
> San Francisco, California
> 
> 

