From: Pat Ferrel
Subject: Re: Tuning of Recommendation Engine
Date: Wed, 7 Dec 2016 14:09:18 -0800
To: user@predictionio.incubator.apache.org

No, we find the value of quantile LLR thresholds and use those thresholds to calculate MAP. Then we look at MAP * number-of-people-that-get-recs to see if there is a max. This is basically an analysis of precision vs. recall. MAP will often increase monotonically with higher thresholds until you get no recommendations at all. Hyper-parameter search via trial and error.
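The search described above can be sketched as follows. Everything here is an illustrative assumption (the `recommend` callback, function names, the quantile sweep), not the actual ActionML tool:

```python
# Sketch: sweep quantile LLR thresholds, score each candidate with
# MAP * number-of-users-that-still-get-recs, keep the best threshold.

def average_precision(recommended, relevant):
    """AP for one user: mean of precision@i taken at each hit."""
    hits, total = 0, 0.0
    for i, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def search_thresholds(all_scores, relevant_by_user, recommend, quantiles):
    """Return (threshold, objective) maximizing MAP * covered-users.

    all_scores: every LLR score observed, used to turn quantiles into
    concrete thresholds. recommend(threshold, user) -> ranked item list.
    """
    ordered = sorted(all_scores)
    best = None
    for q in quantiles:
        threshold = ordered[int(q * (len(ordered) - 1))]
        aps, covered = [], 0
        for user, relevant in relevant_by_user.items():
            recs = recommend(threshold, user)
            if recs:  # users who still get recommendations at this threshold
                covered += 1
                aps.append(average_precision(recs, relevant))
        map_score = sum(aps) / len(aps) if aps else 0.0
        objective = map_score * covered  # MAP * people-that-get-recs
        if best is None or objective > best[1]:
            best = (threshold, objective)
    return best
```

This makes the precision/recall trade-off concrete: raising the threshold can push MAP up while the covered-user count falls, so the product has an interior maximum.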


On Dec 4, 2016, at 9:15 PM, Gustavo Frederico <gustavo.frederico@thinkwrap.com> wrote:

Pat, does this tool find the optimal LLR thresholds using MAP@K? Did you model it as a regression problem?

Thanks

Gustavo


On Thu, Dec 1, 2016 at 2:48 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
This is a very odd statement. How many tuning knobs do you have with MLlib's ALS, one or two? There are a large number of tuning knobs for the UR to fit different situations. What other recommender allows multiple events as input? The UR also has business rules in the form of filters and boosts on item properties. I think you may have missed a lot in the docs; check some of the most important tuning options here: http://actionml.com/docs/ur_advanced_tuning and the config params for business rules here: http://actionml.com/docs/ur_config

But changes must be based on either A/B tests or cross-validation. Guessing at tuning is dangerous; intuition about how a big-data algorithm works takes a long time to develop, and the trade-offs may do your business harm.

We have a tool that finds optimal LLR thresholds based on predictive strength and sets the threshold per event pair. While you can set these by hand, the pattern we follow is called hyper-parameter search, which finds optimal tuning for you.


On Dec 1, 2016, at 11:17 AM, Harsh Mathur <harshmathur.1990@gmail.com> wrote:

Hi Pat,
I really appreciate the product, but our team was discussing how little control we have here.
As in, say some recommendations got delivered to the user and we are tracking conversions, so we can tell whether it's working. Now, if we see that conversions are low, as a developer I have very little to experiment with. I don't mean any disrespect. I have gone through the code and put in effort to understand it too; the UR is still better than the explicit or implicit templates since it has filtering on properties. The only thing lacking, in my opinion, is the weightings.

I read your ppt
Recommendations = PtP + PtV + ...
We were wondering if it could be
Recommendations = a * PtP + b * PtV + ...

Where a and b are constants for tuning. In my understanding PtP is a matrix, so scalar multiplication should be possible. Please correct me if I am wrong.
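For what it's worth, the scalar weighting asked about here is well-defined linear algebra, even though (as Pat notes elsewhere in the thread) the UR doesn't expose it. A plain-Python sketch with hypothetical constants `a` and `b`:

```python
# Illustration only: the linear algebra behind "a * PtP + b * PtV".
# This is NOT what the UR does (the CCO algorithm weights cross-occurrence
# by LLR rather than hand-tuned scalars); it only shows the operation
# Harsh describes is mathematically well defined.

def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def combine(P, V, a, b):
    """a * (P^T P) + b * (P^T V) for user-by-item matrices P, V."""
    PtP = matmul(transpose(P), P)   # item-item purchase co-occurrence
    PtV = matmul(transpose(P), V)   # purchase-to-view cross-occurrence
    return [[a * x + b * y for x, y in zip(r1, r2)]
            for r1, r2 in zip(PtP, PtV)]

# Two users, three items: P = purchases, V = views (toy data)
P = [[1, 0, 1],
     [0, 1, 1]]
V = [[1, 1, 0],
     [1, 0, 1]]
R = combine(P, V, a=1.0, b=0.5)   # a, b are hypothetical tuning constants
```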

Also, I was reading about the log-likelihood method, but I couldn't find a proper explanation. I would be happy if anyone here could explain it in more detail. Thanks in advance.

Here is what I = understood.
For every item-item pair per expression (PtP, PtV), to calculate a score, it will find 4 things:
1. Number of users who posted both events for the pair,
2. Number of users who posted the event for one item but not the other (and vice versa, so two counts),
3. Number of users who posted for neither.

Then a formula is applied taking the 4 params as input, and a score is returned.
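The formula in question is Dunning's log-likelihood ratio over exactly that 2x2 contingency table. A sketch mirroring Mahout's `LogLikelihood.logLikelihoodRatio` (which, as far as I know, the CCO implementation builds on):

```python
import math

def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy term used by the LLR formula."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """k11: users with both events; k12/k21: one event but not the other;
    k22: users with neither. High LLR = overlap unlikely to be chance."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

# Why the 2nd and 3rd parameters matter: a popular item co-occurs with
# everything, so the raw count k11 alone is misleading. k12, k21, and k22
# measure whether the overlap is larger than popularity alone predicts;
# that is why the method works, not just how.
```

For intuition: `llr(10, 10, 10, 10)` is 0 because the events are statistically independent, while `llr(10, 0, 0, 10)` is large because the overlap is perfectly correlated.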

For each item and event pair you are storing the top 20 items by score in Elasticsearch. I didn't understand why the 2nd and 3rd parameters are needed. Also, can anyone explain the correctness of the method, that is, why it works rather than how it works?

Regards
Harsh Mathur

On Dec 1, 2016 11:01 PM, "Pat Ferrel" <pat@occamsmachete.com> wrote:
Exactly so. The weighting of events is done by the algorithm. Adding biases would very likely be wrong and produce worse results, so it is not supported in the current code. There may be a place for this type of bias, but it would have to be done in conjunction with the cross-validation tests we have in our MAP test suite, and it is not yet supported. Best to leave the default weighting in the CCO algorithm, which is based on the strength of correlation with the conversion event, which I guess is purchase in your case.


On Nov 28, 2016, at 2:19 PM, Magnus Kragelund <mak@ida.dk> wrote:

Hi,
It's my understanding that you cannot apply a bias to an event, such as "view" or "purchase", at query time. How the engine uses your different events to calculate scores is in part defined by you and in part determined during training.
In the engine.json config file you set an array of event names. The first event in the array is considered the primary event, and is the event the engine is trying to predict. The other events you specify are secondary events, which the engine is allowed to take into consideration when finding correlations to the primary event in your data set. If no correlation is found for a given event, that event's data is not taken into account when predicting results.

Your array might look like this when predicting purchases: ["purchase", "initiated_payment", "view", "preview"]
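In engine.json that array goes under the algorithm's `eventNames` parameter. A hedged, abbreviated fragment (the surrounding structure follows the UR docs; any omitted fields are assumptions):

```json
{
  "algorithms": [
    {
      "name": "ur",
      "params": {
        "eventNames": ["purchase", "initiated_payment", "view", "preview"]
      }
    }
  ]
}
```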

If you use the special $set event to add metadata to your items, you can apply a bias or filter on those metadata properties at query time.
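For reference, a $set event of that kind posted to the PredictionIO event server might look like this (entity names and the category property are illustrative):

```json
{
  "event": "$set",
  "entityType": "item",
  "entityId": "item-42",
  "properties": { "category": ["books", "fiction"] }
}
```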
/magnus


From: Harsh Mathur <harshmathur.1990@gmail.com>
Sent: Monday, November 28, 2016 3:46:46 PM
To: user@predictionio.incubator.apache.org
Subject: Tuning of Recommendation Engine
 
Hi,
I have successfully deployed the UR template.

Now I wanted to tune it a little. As of now I am sending 4 events: purchase, view, initiated_payment, and preview. Also, our products have categories; I am setting those as item properties.
Now, as I query say:
{
  "item": "{item_id}",
  "fields": [
    { "name": "view", "bias": 0.5 },
    { "name": "preview", "bias": 5 },
    { "name": "purchase", "bias": 20 }
  ]
}

and the query:
{
  "item": "{item_id}"
}


For both queries, I get the same number of recommendations; just the score varies. The boosting isn't changing which items are recommended, only their scores. Is there any way in the UR to give more preference to some events? It would give us more room to experiment and make the recommendations more relevant to us.
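The behavior described here matches the UR's documented bias semantics: a positive bias is a multiplicative boost (it rescales scores without changing the candidate set), while a negative bias acts as a hard filter. A hypothetical sketch of that logic (the function name and exact cutoff are illustrative, not the UR's actual code):

```python
# Sketch of boost-vs-filter semantics, as assumed from the UR docs:
# bias > 0 multiplies the score of matching items; bias < 0 filters.
def apply_bias(candidates, matching, bias):
    """candidates: {item: score}; matching: items the field matches."""
    if bias < 0:
        # negative bias: hard filter, only matching items survive
        return {i: s for i, s in candidates.items() if i in matching}
    # positive bias: boost (or de-boost if 0 < bias < 1), set unchanged
    return {i: s * (bias if i in matching else 1.0)
            for i, s in candidates.items()}
```

So with biases of 0.5, 5, and 20 the same items come back with rescaled scores; only a filter (negative bias) changes which items qualify.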
Regards
Harsh = Mathur
harshmathur.1990@gmail.com

"Perseverance is the hard work you do after you get tired of doing the hard work you already did."



