Reply-To: user@predictionio.incubator.apache.org
From: Pat Ferrel <pat@occamsmachete.com>
Message-Id: <F3DF6E50-8D35-4EE2-B87D-AC9977D25BF6@occamsmachete.com>
Subject: Re: Multitenancy on the Universal Product Recommender
Date: Fri, 9 Sep 2016 11:44:34 -0700
References: <CAMw1vkQ5ohgwAzht=oAJh-Cd+P2x-x542jnW3+t3u4P1b9ZDvQ@mail.gmail.com> <3E10BD45-295E-4852-A795-556AA7675A50@occamsmachete.com> <CADPiWbVbf=kSJyS7UTw6D3FZ5TX1w3ngf85ZPvYjzPphN-L5_A@mail.gmail.com>
To: user@predictionio.incubator.apache.org
In-Reply-To: <CADPiWbVbf=kSJyS7UTw6D3FZ5TX1w3ngf85ZPvYjzPphN-L5_A@mail.gmail.com>
archived-at: Fri, 09 Sep 2016 18:44:44 -0000



As I said below, it's best to send me a private message; the feature is not in the Apache version. Or make a feature request by creating a Jira for PIO.


On Sep 9, 2016, at 10:03 AM, Dipen Patel <patelndipen@gmail.com> wrote:

Could you please provide links to resources on the PIO version that supports multi-tenancy with lightweight Actors, one per tenant?

On Thu, Sep 8, 2016 at 7:52 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
I’m the maintainer of the Universal Recommender. We have OSS support at https://groups.google.com/forum/#!forum/actionml-user

Do you wish to take advantage of the same user being in multiple datasets/tenants? The answer below assumes no.

There are several ways to do this. First, the PIO EventServer is multi-tenant: just keep data in separate “apps” (which really should be named “datasets”). Apps are identified by keys generated when you run `pio app new <your-app-name>`.
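A minimal sketch of the per-app approach. It assumes the EventServer runs at its default `http://localhost:7070` and that you keep a mapping from each tenant to the access key `pio app new` printed for that tenant's app. `TENANT_KEYS`, the tenant names and `event_request` are hypothetical. The code only builds the request for `POST /events.json`; sending it is up to your HTTP client.

```python
import json
from urllib.parse import urlencode

# Hypothetical mapping: tenant name -> access key printed by `pio app new` for that tenant's app.
TENANT_KEYS = {
    "tenant-a": "KEY_FOR_TENANT_A",
    "tenant-b": "KEY_FOR_TENANT_B",
}

EVENT_SERVER = "http://localhost:7070"  # assumed: EventServer on its default port


def event_request(tenant, user_id, item_id, event="purchase"):
    """Return (url, json_body) for one user->item event in the tenant's own app."""
    # Unknown tenants raise KeyError instead of silently writing into another app.
    access_key = TENANT_KEYS[tenant]
    url = f"{EVENT_SERVER}/events.json?{urlencode({'accessKey': access_key})}"
    body = {
        "event": event,
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
    }
    return url, json.dumps(body)


# Usage: POST `body` to `url` with Content-Type: application/json.
url, body = event_request("tenant-a", "u1", "i42")
```

Each tenant's events land in a different app, so a per-tenant engine.json can point at exactly one tenant's data.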

The PredictionServer is not multi-tenant, but you can run a separate process for each tenant on a different port. You would train each tenant from a different directory containing the UR and the correct engine.json for that tenant/dataset, then deploy it on a port specific to that tenant/model. This creates a somewhat heavyweight process for each port.
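A sketch of the bookkeeping this implies. It assumes one engine directory per tenant and a tenant-to-port registry that you persist yourself. The `ports` dict and the helper names are hypothetical. `pio deploy --port` and the `/queries.json` endpoint are standard PIO, and 8000 is the PredictionServer's default port.

```python
# Hypothetical registry of tenant -> PredictionServer port; persist it somewhere durable.
DEFAULT_BASE_PORT = 8000  # PredictionServer's default port


def add_tenant(ports, tenant, base_port=DEFAULT_BASE_PORT):
    """Assign the next unused port to a new tenant; existing tenants keep their port."""
    if tenant in ports:
        return ports[tenant]
    port = max(ports.values(), default=base_port - 1) + 1
    ports[tenant] = port
    return port


def deploy_command(port):
    """Command to run from the tenant's own engine directory after `pio train`."""
    return ["pio", "deploy", "--port", str(port)]


def query_url(ports, tenant, host="localhost"):
    """Endpoint that serves this tenant's queries."""
    return f"http://{host}:{ports[tenant]}/queries.json"


ports = {}
add_tenant(ports, "tenant-a")  # 8000
add_tenant(ports, "tenant-b")  # 8001
```

New ports are always allocated above the current maximum, so onboarding a tenant never moves an existing tenant to a different port.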

We have a version of PIO that supports multi-tenancy with lightweight Actors, one per tenant. You deploy with a resource-id, and when you make queries you include the REST resource id in the URI. All engines are on the same port and run in the same process, so it’s very lightweight and performant. Otherwise the query works the same. Private message me to hear more.

I would not advise the item-property method. Unless you know there is no overlap in user IDs, it may produce undesired results in the model, and these may leak into recommendations. You can solve that with a filter (instead of the boost below), but there are better ways to solve this.


On Sep 8, 2016, at 4:08 PM, David Jones <dave@resolvedigital.com> wrote:

Hi All,

I have a use case where events come in from many separate tenants, and I want to use the Universal Product Recommender engine. The challenge is keeping each tenant's data separate throughout the PIO process.

I can think of three possible ways to solve this issue, but they all have tradeoffs:

1) Create Multiple Apps

You have one app per tenant. When you create events, you use the access key specific to that tenant. Then you query for recommendations using that same access key to get recommendations for just that app.

Issue: each engine has to specify an “appName” in engine.json. So now you have to have an engine per tenant (AKA app), and every engine has the same source code except for a different “appName”.

This’ll result in a lot of duplicated code, and you’ll have to train and deploy each engine individually.
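The duplication can at least be generated rather than hand-maintained. This is a minimal sketch that assumes a UR-style engine.json in which `appName` appears under `datasource.params` and under each entry of `algorithms[].params`. The template below is a trimmed, illustrative excerpt, not a complete UR config. Giving each tenant its own `indexName` is an extra assumption, intended to keep the per-tenant models from colliding. `TEMPLATE` and `engine_json_for` are hypothetical names.

```python
import copy
import json

# Trimmed, illustrative excerpt of a UR-style engine.json (not a complete config).
TEMPLATE = {
    "datasource": {"params": {"appName": "PLACEHOLDER", "eventNames": ["purchase", "view"]}},
    "algorithms": [
        {
            "name": "ur",
            "params": {
                "appName": "PLACEHOLDER",
                "indexName": "PLACEHOLDER",
                "eventNames": ["purchase", "view"],
            },
        }
    ],
}


def engine_json_for(tenant_app, template=TEMPLATE):
    """Return an engine.json string that differs from the template only in tenant names."""
    cfg = copy.deepcopy(template)  # never mutate the shared template
    cfg["datasource"]["params"]["appName"] = tenant_app
    for algo in cfg["algorithms"]:
        algo["params"]["appName"] = tenant_app
        # Assumption: a per-tenant index keeps one tenant's model from overwriting another's.
        algo["params"]["indexName"] = f"urindex_{tenant_app}"
    return json.dumps(cfg, indent=2)
```

You would write each generated file into that tenant's engine directory before running `pio build` and `pio train`. The UR source code itself stays identical across tenants.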

There is also no API for creating apps, so something will need to be built to bridge that gap before a new tenant can be onboarded.
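Until such an API exists, you could script onboarding around the `pio app new` CLI. This is a minimal sketch that assumes `pio` is on the PATH of the machine running the script. The whitespace check and the function names are hypothetical. By default it only returns the command it would run (`dry_run=True`).

```python
import shlex
import subprocess


def new_app_command(app_name):
    """Argument list for `pio app new <app_name>`."""
    # Hypothetical sanity check; pio's own naming rules may be stricter.
    if not app_name or any(c.isspace() for c in app_name):
        raise ValueError(f"invalid app name: {app_name!r}")
    return ["pio", "app", "new", app_name]


def onboard_tenant(app_name, dry_run=True):
    """Create the tenant's app, or just return the shell command when dry_run is set."""
    cmd = new_app_command(app_name)
    if dry_run:
        return shlex.join(cmd)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    # pio reports the new app's ID and access key in its output; which stream it
    # uses is not assumed here, so both are returned for the caller to parse.
    return result.stdout + result.stderr
```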

2) Use Channels

You create one app, but create a channel per tenant. When you create an event, you specify the channel.

Issue: the Universal Recommender engine can be modified to read data from a single channel, but that channel name cannot be chosen dynamically; it’ll be hardcoded into DataSource.scala. So now you’re in the same situation again: you’ll need one engine per tenant, and each engine has exactly the same source code except for a one-line change in DataSource.scala.
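The event-collection side of option 2 is straightforward. All tenants share one app and access key, and each event names its tenant's channel through the `channel` query parameter on `/events.json`. The channel itself is created first with `pio app channel-new <app name> <channel name>`. This is a sketch under those assumptions: the EventServer address, key and channel names are placeholders, and the helper names are hypothetical. It does not remove the DataSource limitation described above.

```python
from urllib.parse import urlencode

EVENT_SERVER = "http://localhost:7070"  # assumed EventServer address
SHARED_ACCESS_KEY = "SHARED_APP_KEY"    # hypothetical key of the one shared app


def channel_event_url(channel):
    """URL for posting an event into one tenant's channel of the shared app."""
    query = urlencode({"accessKey": SHARED_ACCESS_KEY, "channel": channel})
    return f"{EVENT_SERVER}/events.json?{query}"


def channel_new_command(app_name, channel):
    """CLI call that creates the channel before any events are sent to it."""
    return ["pio", "app", "channel-new", app_name, channel]
```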

3) Use Product Properties

Provided your user IDs are unique across all tenants, you could set a property on each product with a tenant id.

This way you can use one app and one engine, and simply query for recommendations while supplying a significant bias toward products that carry the tenant id property.
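Tagging items uses PIO's `$set` event on the item entity. One way to guarantee the proviso that user IDs are unique across tenants is to prefix every ID with its tenant. The `namespaced_id` helper and its `tenant:id` format are a hypothetical convention, not part of PIO or the UR. If you adopt it, queries must use the prefixed user ID too.

```python
def namespaced_id(tenant_id, raw_id):
    """Hypothetical convention: prefix IDs with the tenant so they cannot collide."""
    return f"{tenant_id}:{raw_id}"


def tenant_property_event(item_id, tenant_id):
    """$set event that tags an item with its tenant; POST it to /events.json."""
    return {
        "event": "$set",
        "entityType": "item",
        "entityId": item_id,
        # Field values as a list of strings, matching "values": ["12"] in the query.
        "properties": {"tenant_id": [tenant_id]},
    }
```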

For example, to get the top recommendations for user xyz, who is on tenant_id 12:

{
  "user": "xyz",
  "fields": [
    {
      "name": "tenant_id",
      "values": ["12"],
      "bias": 10
    }
  ]
}
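A sketch that builds a query of this shape. The caller chooses between the boost used here and the filter suggested in the reply. A boost only ranks the tenant's items higher, so other tenants' items can still appear; a filter keeps only matching items. The sketch assumes the UR convention that a negative `bias` turns a field rule into a filter and a positive one into a boost. Confirm this against the UR documentation for your version. `ur_query` and its parameters are hypothetical.

```python
def ur_query(user_id, tenant_id, mode="filter", boost=10.0, num=10):
    """Build a UR query restricted (filter) or skewed (boost) toward one tenant's items."""
    if mode == "filter":
        bias = -1.0  # assumed UR convention: negative bias = keep only matching items
    elif mode == "boost":
        bias = boost  # positive bias: matching items rank higher, others may still appear
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "user": user_id,
        "num": num,
        "fields": [{"name": "tenant_id", "values": [tenant_id], "bias": bias}],
    }
```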

Issues: since all the data for all tenants is in one place, you’re going to have to train over all tenants’ data each time. There is also a risk of deleting data from the wrong tenant when a tenant leaves.

-
I was wondering whether anyone has tried any of these options. Are there other options, perhaps better ones? I’m thinking option 3) might be the best for our needs.

Thanks,
David.

