From: Pat Ferrel
To: user@predictionio.incubator.apache.org
Subject: Re: Multitenancy on the Universal Product Recommender
Date: Fri, 9 Sep 2016 11:44:34 -0700

As I said below, best to send me a private message; the feature is not in the Apache version. Or make a feature request by creating a Jira for PIO.


On Sep 9, 2016, at 10:03 AM, Dipen Patel <patelndipen@gmail.com> wrote:

Could you please provide links to resources on the version of PIO that supports multi-tenancy with lightweight Actors, one per tenant?

On Thu, Sep 8, 2016 at 7:52 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
I’m the maintainer of the Universal Recommender. We have OSS support at https://groups.google.com/forum/#!forum/actionml-user

Do you wish to take advantage of the same user being in multiple datasets/tenants? The answer below is assuming no.

There are several ways to do this. First, the PIO EventServer is multi-tenant: just keep data in separate “apps”, which really should be named “datasets”. They are identified by keys generated when you do `pio app new <your-app-name>`.
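
A minimal sketch of the one-app-per-tenant idea, assuming the EventServer is reachable on its usual port 7070 and accepts events at `/events.json` with an `accessKey` query parameter. The tenant names, the placeholder keys, and the "purchase" event name are hypothetical; real keys are the ones printed by `pio app new`.

```python
from urllib.parse import urlencode

# Hypothetical tenant -> access key table; real keys come from `pio app new`.
TENANT_ACCESS_KEYS = {
    "tenant-a": "ACCESS_KEY_FOR_TENANT_A",
    "tenant-b": "ACCESS_KEY_FOR_TENANT_B",
}


def event_url(tenant, base="http://localhost:7070"):
    """Return the EventServer URL that writes into this tenant's app/dataset."""
    key = TENANT_ACCESS_KEYS[tenant]  # raises KeyError for an unknown tenant
    return f"{base}/events.json?{urlencode({'accessKey': key})}"


def purchase_event(user_id, item_id):
    """Build a minimal event body; the event name is only illustrative."""
    return {
        "event": "purchase",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
    }
```

Because the key selects the app, the same event body can be sent for any tenant; only the URL changes.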

The PredictionServer is not multi-tenant, but you can run a separate process per tenant on different ports. You would train each tenant from a different directory containing the UR and the correct engine.json for that tenant/dataset, then deploy it on a port that is specific to the tenant/model. This will create a somewhat heavyweight process for each port.
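
On the client side, the port-per-tenant setup could be routed with something like the sketch below. It assumes each tenant's engine was deployed on its own port (for example with `pio deploy --port <port>`) and answers queries at `/queries.json`; the tenant names and port numbers are hypothetical.

```python
import json

# Hypothetical assignment of one PredictionServer port per tenant.
TENANT_PORTS = {"tenant-a": 8001, "tenant-b": 8002}


def query_url(tenant, host="localhost"):
    """Return the query endpoint of the process serving this tenant's model."""
    return f"http://{host}:{TENANT_PORTS[tenant]}/queries.json"


def build_request(tenant, user_id):
    """Return (url, json_body) for a query routed to the tenant's own server."""
    return query_url(tenant), json.dumps({"user": user_id})
```

The query body is the same for every tenant; isolation comes entirely from which process receives it.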

We have a version of PIO that supports multi-tenancy with lightweight Actors, one per tenant. You deploy with a resource-id, and when you make queries you include the REST resource id in the URI. All engines are on the same port running in the same process, so it’s very lightweight and performant. Otherwise the query works the same. Private message me to hear more.

I would not advise the item property method. Unless you know there is no overlap in user-ids, it may produce undesired results in the model, and these may leak into recommendations. You can solve that with a filter (instead of the boost below), but there are better ways to solve this.



On Sep 8, 2016, at 4:08 PM, David Jones <dave@resolvedigital.com> wrote:

Hi All,

I have a use case where I have events coming in from many separate tenants and I want to use the Universal Product Recommender engine. The challenge is separating data from each tenant throughout the PIO process.

I can think of three possible ways to solve this issue, but they all have tradeoffs:

1) Create Multiple Apps

You have one app per tenant. When you create events, you use the access key specific to that tenant. Then you query for recommendations using that same access key to get recommendations for just that app.

Issue: each engine has to specify an “appName” in engine.json. So now you have to have an engine per tenant (AKA app) that has all the same source code, except that the “appName” will be different.

This’ll result in a bunch of duplicated code, and you’ll have to train and deploy each one individually.

There is also no API for creating apps, so something will need to be created to bridge that gap and allow a new tenant to be onboarded.
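
One way to limit the duplication is to keep a single template and generate each tenant's engine.json from it. The sketch below replaces every "appName" value wherever it occurs, so it does not depend on the exact layout of the file. The template shown is a hypothetical, heavily trimmed stand-in; a real UR engine.json has many more settings, and other per-tenant values (for example a search index name, if your file has one) may need the same treatment.

```python
import json


def set_app_name(node, app_name):
    """Return a copy of a parsed engine.json with every "appName" replaced."""
    if isinstance(node, dict):
        return {
            key: app_name if key == "appName" else set_app_name(value, app_name)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [set_app_name(value, app_name) for value in node]
    return node


# Hypothetical, trimmed template; not a complete UR configuration.
TEMPLATE = {
    "datasource": {"params": {"appName": "PLACEHOLDER"}},
    "algorithms": [{"name": "ur", "params": {"appName": "PLACEHOLDER"}}],
}


def engine_json_for(tenant):
    """Render the engine.json text for one tenant/app."""
    return json.dumps(set_app_name(TEMPLATE, tenant), indent=2)
```

Each tenant still needs its own train and deploy step, but the source tree and the template are shared.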

2) Use Channels

You create one app, but create a channel per tenant. When you create an event you specify the channel.

Issue: the Universal Recommender engine can be modified to look at data for a single channel name, but that name cannot be dynamically queried; it’ll be hardcoded into DataSource.scala. So now you’re in the same situation where you’ll need to create one engine per tenant, where each engine has the exact same source code except for a one-line change in the DataSource.scala file.

3) Use Product Properties

Provided your user ids are unique over all tenants, you could set a property on each product with a tenant id.

This way you can use one app, one engine, and simply query for recommendations and supply a significant bias to products that contain the tenant id property.

Example: give me the top recommendations for user xyz, who is on tenant_id 12.

{
  "user": "xyz",
  "fields": [
    {
      "name": "tenant_id",
      "values": ["12"],
      "bias": 10
    }
  ]
}
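
The query above can be built programmatically, which also makes it easy to compare the boost with the filter that Pat suggests in his reply. This is only a sketch: the helper name is made up, and the use of a negative bias to mean "filter" is my understanding of the UR convention, so treat it as an assumption to check against the UR documentation.

```python
import json


def tenant_query(user_id, tenant_id, bias):
    """Build the query shown above with a caller-chosen bias on tenant_id."""
    return {
        "user": user_id,
        "fields": [
            {"name": "tenant_id", "values": [str(tenant_id)], "bias": bias}
        ],
    }


# Boost: items tagged with tenant 12 are favoured, others can still appear.
boost = tenant_query("xyz", 12, bias=10)

# Assumption: a negative bias is treated as a filter, so only items tagged
# with tenant 12 would be returned.
filtered = tenant_query("xyz", 12, bias=-1)

print(json.dumps(boost, indent=2))
```

Note that neither variant changes training: the model is still built from every tenant's events.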

Issues: since all the data for all tenants is in one place, you’re going to have to train over all tenants’ data each time. There are also issues around the risk of deleting data from the wrong tenant should a tenant leave.

I was wondering if anyone has tried any of these options? Perhaps there are other options? Are there any better ones? I’m thinking option 3) might be the best for our needs.

Thanks,
David.


