predictionio-user mailing list archives

From David Jones <>
Subject Multitenancy on the Universal Product Recommender
Date Thu, 08 Sep 2016 23:08:07 GMT
Hi All,

I have a use case where I have events coming in from many separate tenants
and I want to use the Universal Product Recommender engine. The challenge
is separating data from each tenant throughout the PIO process.

I can think of three possible ways to solve this, but they all have drawbacks.

*1) Create Multiple Apps*

You have one app per tenant. When you create events, you use the access key
specific to that tenant. Then you query for recommendations using that same
access key to get recommendations for just that app.

Issue: each engine has to specify an “appName” in engine.json, so you need
one engine per tenant (AKA app), with every engine sharing the same source
code except for a different “appName”.

This’ll result in a bunch of duplicated code and you’ll have to train and
deploy each one individually.
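The duplication can at least be scripted. Below is a minimal Python sketch that stamps out one engine.json per tenant from a shared template. The template contents, the `tenant_<id>` app naming and the function name are hypothetical, and it assumes `appName` sits under `datasource.params` as in the Universal Recommender's example engine.json.

```python
import copy
import json

# Hypothetical, trimmed-down template; a real UR engine.json has many more keys.
# Assumption: appName lives under datasource.params.
TEMPLATE = {
    "datasource": {
        "params": {"appName": "PLACEHOLDER", "eventNames": ["purchase", "view"]}
    }
}


def engine_json_for_tenant(template: dict, app_name: str) -> str:
    """Return engine.json text for one tenant; only appName differs."""
    config = copy.deepcopy(template)  # never mutate the shared template
    config["datasource"]["params"]["appName"] = app_name
    return json.dumps(config, indent=2)


# One config per tenant id (hypothetical tenant ids).
configs = {t: engine_json_for_tenant(TEMPLATE, f"tenant_{t}") for t in ["12", "34"]}
```

This only removes the copy and paste. Each generated engine still has to be trained and deployed on its own.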

There is also no API for creating apps, so something will need to be
built to bridge that gap before a new tenant can be onboarded.

*2) Use Channels*

You create one app, but create a channel per tenant. When you create an
event, you specify the channel.
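To make the channel approach concrete, here is a minimal Python sketch that only builds (does not send) the request for posting one event into a tenant's channel. It assumes the Event Server's default address `http://localhost:7070` and that `/events.json` takes `accessKey` and `channel` as query parameters, as I understand the Event Server REST API. The function name and the `tenant_12` channel name are hypothetical.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode

EVENT_SERVER = "http://localhost:7070"  # assumed default Event Server address


def event_request(access_key: str, tenant_channel: str, event: dict) -> tuple[str, str]:
    """Build the URL and JSON body for posting one event into a tenant's channel."""
    query = urlencode({"accessKey": access_key, "channel": tenant_channel})
    return f"{EVENT_SERVER}/events.json?{query}", json.dumps(event)


url, body = event_request(
    "ACCESS_KEY",  # placeholder: the single app's access key
    "tenant_12",   # hypothetical per-tenant channel name
    {
        "event": "purchase",
        "entityType": "user",
        "entityId": "xyz",
        "targetEntityType": "item",
        "targetEntityId": "p1",
    },
)
```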

Issue: the Universal Recommender engine can be modified to read data from
a single channel, but that channel name cannot be chosen at query time; it
is hardcoded into DataSource.scala. So you are back in the same situation:
you need one engine per tenant, where each engine has exactly the same
source code except for a one-line change in DataSource.scala.

*3) Use Product Properties*

Provided your user ids are unique over all tenants, you could set a
property on each product with a tenant id.
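Tagging a product could look like the following minimal Python sketch, which builds a `$set` event on the item. The property name `tenant_id` matches the query example. Storing its value as a list of strings is an assumption, made so that it lines up with the `"values"` list in the query. The function name is hypothetical.

```python
def tenant_property_event(item_id: str, tenant_id: str) -> dict:
    """Build a $set event that tags an item (product) with its tenant id."""
    return {
        "event": "$set",
        "entityType": "item",
        "entityId": item_id,
        # Assumption: the recommender reads this property as a list of strings,
        # matching the "values" list used in the query's "fields" entry.
        "properties": {"tenant_id": [tenant_id]},
    }


event = tenant_property_event("p1", "12")
```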

This way you can use one app, one engine, and simply query for
recommendations and supply a significant bias to products that contain the
tenant id property.

For example, to get the top recommendations for user xyz, who belongs to
tenant_id 12:

  {
    "user": "xyz",
    "fields": [
      {
        "name": "tenant_id",
        "values": ["12"],
        "bias": 10
      }
    ]
  }
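The per-tenant query can be generated rather than hand-written. This minimal Python sketch builds the query JSON for a given user and tenant. The function name and the default bias of 10 (taken from the example) are illustrative only.

```python
import json


def tenant_query(user_id: str, tenant_id: str, bias: float = 10.0) -> str:
    """Build a query that boosts items whose tenant_id matches the user's tenant."""
    query = {
        "user": user_id,
        "fields": [{"name": "tenant_id", "values": [tenant_id], "bias": bias}],
    }
    return json.dumps(query)


q = tenant_query("xyz", "12")
```

A positive bias only boosts, so other tenants' products could still show up. As I understand the UR documentation, a negative bias turns the field into a filter instead, which may suit strict tenant isolation better. That is worth verifying against the UR version in use.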

Issues: since all the data for all tenants is in one place, you’re going to
have to train over every tenant’s data each time. There is also a risk of
deleting data from the wrong tenant when a tenant leaves.
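One way to reduce that risk is to compute exactly which products belong to the departing tenant before deleting anything. This minimal Python sketch scans item `$set` events (assumed to be in chronological order) and separates products tagged only with that tenant from products shared with other tenants, which would need re-tagging rather than deletion. It ignores `$unset`/`$delete`, and the function name is hypothetical.

```python
from __future__ import annotations


def partition_tenant_items(events: list[dict], tenant_id: str) -> tuple[set[str], set[str]]:
    """Split a tenant's items into (exclusive, shared) based on tenant_id tags."""
    tags_by_item: dict[str, list[str]] = {}
    for e in events:  # assumed chronological; the latest tenant_id tag wins
        if e.get("event") == "$set" and e.get("entityType") == "item":
            tags = e.get("properties", {}).get("tenant_id")
            if tags is not None:
                tags_by_item[e["entityId"]] = list(tags)
    exclusive = {i for i, t in tags_by_item.items() if set(t) == {tenant_id}}
    shared = {i for i, t in tags_by_item.items() if tenant_id in t and set(t) != {tenant_id}}
    return exclusive, shared
```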

I was wondering if anyone has tried any of these options. Perhaps there are
other options? Are there any better ones? I’m thinking option 3) might be
the best for our needs.

