predictionio-user mailing list archives

From Pat Ferrel <>
Subject Re: Multitenancy on the Universal Product Recommender
Date Fri, 09 Sep 2016 18:44:34 GMT
As I said below, it’s best to send me a private message; the feature is not in the Apache version.
Or make a feature request by creating a Jira ticket for PIO.

On Sep 9, 2016, at 10:03 AM, Dipen Patel <> wrote:

Could you please provide links to resources on the version of PIO that supports multi-tenancy with
lightweight Actors, one per tenant?

On Thu, Sep 8, 2016 at 7:52 PM, Pat Ferrel <> wrote:
I’m the maintainer of the Universal Recommender. We have OSS support at!forum/actionml-user

Do you wish to take advantage of the same user being in multiple datasets/tenants? The answer
below assumes not.

There are several ways to do this. First, the PIO EventServer is multi-tenant: just keep data
in separate “apps” (which really should be called “datasets”). Each app is identified by an access key
generated when you run `pio app new <your-app-name>`.
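For example, each tenant’s events can be POSTed to the EventServer’s `events.json` endpoint with that tenant’s access key, which is what routes them into the right app. A minimal sketch in Python that builds the request; the address `localhost:7070` (the EventServer default) and the example IDs are assumptions, and `send` is only a convenience wrapper that is not called here.

```python
import json
import urllib.request
from urllib.parse import urlencode

EVENT_SERVER = "http://localhost:7070"  # assumption: default EventServer address


def build_event_request(access_key, event, entity_id, target_id,
                        entity_type="user", target_type="item"):
    """Return (url, json_body) for one event destined for a tenant's app.

    The accessKey query parameter selects the app (dataset), so events
    sent with tenant A's key never land in tenant B's app.
    """
    url = f"{EVENT_SERVER}/events.json?{urlencode({'accessKey': access_key})}"
    body = {
        "event": event,
        "entityType": entity_type,
        "entityId": entity_id,
        "targetEntityType": target_type,
        "targetEntityId": target_id,
    }
    return url, json.dumps(body)


def send(url, body):
    """POST the event and return the EventServer's JSON reply."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Usage: `build_event_request("<tenant-12-key>", "purchase", "u1", "i9")`, where the key is whatever `pio app new` printed for that tenant.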

The PredictionServer is not multi-tenant, but you can run a separate process for each tenant on its own port.
You would train each tenant from a different directory containing the UR and the correct engine.json
for that tenant/dataset, then deploy it on a port specific to that tenant/model.
This creates a somewhat heavyweight process for each port.
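The per-tenant directories can be generated from one engine.json template, so only configuration differs between tenants. A minimal sketch in Python, assuming the UR engine.json layout of this period (`appName` under `datasource.params` and under each algorithm’s `params`, `indexName` under the algorithm’s `params`); `BASE_PORT`, the `tenants/` layout and the `urindex_<app>` naming are hypothetical choices, not part of the UR.

```python
import copy
import json
from pathlib import Path

BASE_PORT = 8000  # assumption: start at PIO's default PredictionServer port


def tenant_engine_json(template, app_name):
    """Return a copy of a UR engine.json dict pointed at one tenant's app."""
    cfg = copy.deepcopy(template)  # never mutate the shared template
    cfg["datasource"]["params"]["appName"] = app_name
    for algo in cfg.get("algorithms", []):
        algo["params"]["appName"] = app_name
        # Hypothetical convention: one Elasticsearch index per tenant so
        # one tenant's model never overwrites another's.
        algo["params"]["indexName"] = f"urindex_{app_name}"
    return cfg


def plan_tenants(template, app_names, root="tenants"):
    """Plan a directory, engine.json, port and commands for each tenant.

    Ports follow the order of app_names, so append new tenants at the end
    (or persist the mapping) to keep existing tenants on stable ports.
    """
    plans = []
    for offset, app in enumerate(app_names):
        port = BASE_PORT + offset
        plans.append({
            "app": app,
            "dir": Path(root) / app,
            "port": port,
            "engine_json": tenant_engine_json(template, app),
            # Run from the tenant's dir, assumed to hold a copy of the UR source.
            "commands": ["pio build", "pio train", f"pio deploy --port {port}"],
            "query_url": f"http://localhost:{port}/queries.json",
        })
    return plans


def write_engine_json(plan):
    """Write one tenant's engine.json into its engine directory."""
    plan["dir"].mkdir(parents=True, exist_ok=True)
    with open(plan["dir"] / "engine.json", "w") as f:
        json.dump(plan["engine_json"], f, indent=2)
```

Queries for a tenant then go to that plan’s `query_url`; the query body itself is unchanged.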

We have a version of PIO that supports multi-tenancy with lightweight Actors, one per tenant.
You deploy with a resource-id, and when you make queries you include that REST resource-id in the
URI. All engines are on the same port, running in the same process, so it’s very lightweight
and performant. Otherwise the query works the same. Private message me to hear more.

I would not advise the item property method: unless you know there is no overlap in user-ids,
it may produce undesired results in the model, and these may leak into recommendations. You
can solve that with a filter (instead of the boost below), but there are better ways to solve this.
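The filter and the boost in the question below differ only in the sign of the bias. A minimal sketch in Python, assuming the UR convention that a negative bias filters results to items with the given field value while a positive bias only boosts them; the `tenant_id` field comes from the question, and `num` is assumed to be the UR’s result-count parameter.

```python
def ur_tenant_query(user, tenant_id, num=10, as_filter=True):
    """Build a UR query that filters (or merely boosts) one tenant's items.

    Assumption: bias < 0 means "only items with this value",
    bias > 0 means "prefer items with this value".
    """
    field = {
        "name": "tenant_id",
        "values": [str(tenant_id)],  # property values are sent as strings
        "bias": -1 if as_filter else 10,
    }
    return {"user": user, "num": num, "fields": [field]}
```

Either way the field only restricts which items come back; if user-ids overlap across tenants, other tenants’ behavior can still shape the model itself.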

On Sep 8, 2016, at 4:08 PM, David Jones <> wrote:

Hi All,

I have a use case where events come in from many separate tenants, and I want to use
the Universal Product Recommender engine. The challenge is keeping each tenant’s data separate
throughout the PIO process.

I can think of three possible ways to solve this issue, but they all have tradeoffs:

1) Create Multiple Apps

You have one app per tenant. When you create events, you use the access key specific to that
tenant. Then you query for recommendations using that same access key to get recommendations
for just that app.

Issue: each engine has to specify an “appName” in engine.json. So now you need an engine
per tenant (AKA app), each with exactly the same source code except for a different “appName”.

This’ll result in a bunch of duplicated code and you’ll have to train and deploy each
one individually.

There is also no API for creating apps, so something would have to be built to bridge that gap
and allow new tenants to be onboarded.

2) Use Channels

You create one app, but create a channel per tenant. When you create an event, you specify
the channel.

Issue: the Universal Recommender engine can be modified to read data from a single channel,
but that channel name cannot be chosen dynamically at query time; it will be hardcoded into DataSource.scala.
So you’re in the same situation again: one engine per tenant, each with exactly the same
source code except for a one-line change in DataSource.scala.
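On the event side, a channel is just an extra `channel` query parameter on the EventServer’s `events.json` endpoint, and channels are created per app with `pio app channel-new`. A minimal sketch in Python, assuming the default EventServer address; the app and tenant names are illustrative only.

```python
from urllib.parse import urlencode

EVENT_SERVER = "http://localhost:7070"  # assumption: default EventServer address


def channel_event_url(access_key, channel):
    """URL for POSTing an event into one tenant's channel of a shared app."""
    query = urlencode({"accessKey": access_key, "channel": channel})
    return f"{EVENT_SERVER}/events.json?{query}"


def channel_setup_commands(app_name, tenants):
    """One `pio app channel-new` command per tenant channel."""
    return [f"pio app channel-new {app_name} {t}" for t in tenants]
```

All tenants share one access key here, so the channel name is the only thing keeping their events apart.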

3) Use Product Properties

Provided your user ids are unique over all tenants, you could set a property on each product
with a tenant id.

This way you can use one app, one engine, and simply query for recommendations and supply
a significant bias to products that contain the tenant id property.

Example: give me the top recommendations for user xyz, who is on tenant_id 12.

  {
    "user": "xyz",
    "fields": [
      {
        "name": "tenant_id",
        "values": ["12"],
        "bias": 10
      }
    ]
  }
Issues: since all the data for all tenants is in one place, you’ll have to train
over every tenant’s data each time. There is also a risk of deleting data from
the wrong tenant when a tenant leaves.

I was wondering if anyone has tried any of these options. Perhaps there are other
options, or better ones? I’m thinking option 3) might be the best for our needs.

