From: Pat Ferrel
Message-Id: <1D6E3FC2-483B-4258-B251-52F7E6CB3E9C@occamsmachete.com>
Subject: Re: Docs Universal Recommender
Date: Wed, 10 May 2017 13:12:35 -0700
To: Marius Rabenarivo
Cc: user@predictionio.incubator.apache.org, actionml-user

What are your items? How much text? What other content? Unless you are recommending long-form blogs or news, NLP won’t give you much except maybe word2vec, which, if it has a good model, will give better results than bag-of-words.


On May 10, 2017, at 1:05 PM, Marius Rabenarivo <mariusrabenarivo@gmail.com> wrote:

So in your opinion, should the NLP task be done in the Engine part using a library like Mallet, or should it be implemented in an algorithm-focused library such as Mahout?

2017-05-10 23:52 GMT+04:00 Pat Ferrel <pat@occamsmachete.com>:
That is how to make personalized content-based recommendations. You’d have to input content by attaching it to items and recording it separately as a usage event per content bit. The input, for instance, would be every term in the description of an item the user purchased. The input would be huge, and the current UR + PIO is not optimized for that kind of input. It is not a recommended mode to use the UR and is of dubious value without NLP techniques such as word2vec or NER instead of bag-of-words type content. It might be OK if you have rich metadata like categories or tags.
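
To make the size concern concrete, here is a minimal Python sketch of what "one usage event per content bit" could look like. The event name `purchase-term`, the `term` target entity type, and the naive tokenizer are assumptions made for illustration and are not part of the UR; only the general shape of a PredictionIO event (`event`, `entityType`, `entityId`, `targetEntityType`, `targetEntityId`) is taken from the event API.

```python
import re


def term_events(user_id, description, event_name="purchase-term"):
    """Emit one usage-style event per distinct term in an item description.

    `event_name` and the "term" target entity type are hypothetical names.
    """
    # Naive bag-of-words tokenizer: lowercase alphanumeric runs only.
    terms = sorted(set(re.findall(r"[a-z0-9]+", description.lower())))
    return [
        {
            "event": event_name,
            "entityType": "user",
            "entityId": user_id,
            "targetEntityType": "term",
            "targetEntityId": term,
        }
        for term in terms
    ]


events = term_events("u1", "Red cotton shirt, slim fit, red buttons")
print(len(events))  # one event per distinct term, for a single purchase
```

Even this short description yields several events for one purchase, so full descriptions across a whole catalog multiply the input quickly, which is the cost described above.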

In general, content-based recommendations are often little better than some filtering of popular or rotating promoted items (with no purchase history); both can be done fairly easily with the UR.

Content-based recommendations with NLP techniques for short-lived items like news can work well but require extra phases in front of the recommender to do the NLP.



On May 10, 2017, at 12:33 PM, Marius Rabenarivo <mariusrabenarivo@gmail.com> wrote:

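Hello,

So what do the matrix T and the vector h_t in this slide correspond to?: https://docs.google.com/presentation/d/1MzIGFsATNeAYnLfoR6797ofcLeFRKSX7KB8GAYNtNPY/edit#slide=id.gf4d43b9e8_1_24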

2017-05-10 21:10 GMT+04:00 Pat Ferrel <pat@occamsmachete.com>:
Content-based recommendations are based on, well, content. You can really only make recs if you have an example item, as with the recommendations you see at the bottom of a product page on Amazon.

For this, make sure to have lots of properties of items; even keywords from descriptions will work, but also categories, tags, brands, price ranges, etc. These all must be encoded as JSON arrays of strings, so prices might be one of [“$0-$1”, “$1-$5”, …]; other things like descriptions, categories, or tags can have several strings attached.
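
As a rough illustration of that encoding, the Python sketch below builds a PredictionIO `$set` event in which every item property is an array of strings. The property names (`categories`, `tags`, `brand`, `priceRange`), the item id, and the `$5-$20` bucket are assumptions for the example; only the `$0-$1` and `$1-$5` labels come from the thread.

```python
import json

# Hypothetical price buckets; only "$0-$1" and "$1-$5" appear in the thread.
PRICE_BUCKETS = [(0, 1, "$0-$1"), (1, 5, "$1-$5"), (5, 20, "$5-$20")]


def price_range(price):
    """Map a numeric price to a bucket label, or None if no bucket matches."""
    for low, high, label in PRICE_BUCKETS:
        if low <= price < high:
            return label
    return None


def item_set_event(item_id, categories, tags, brand, price):
    """Build a $set event whose properties are all arrays of strings."""
    properties = {
        "categories": list(categories),
        "tags": list(tags),
        "brand": [brand],  # a single value is still wrapped in an array
    }
    label = price_range(price)
    if label is not None:
        properties["priceRange"] = [label]
    return {
        "event": "$set",
        "entityType": "item",
        "entityId": item_id,
        "properties": properties,
    }


event = item_set_event("sku-42", ["kitchen"], ["steel", "knife"], "Acme", 3.5)
print(json.dumps(event, indent=2))
```

Bucketing the price into a label keeps it a categorical string like the other properties, rather than a raw number.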

Then issue an item-based query with itemBias set higher (>1) to make use of usage information first before content, since it performs better. Then add query fields for the various properties, but include the values of the item referenced in the “item” field.

You will get similar items based on usage data unless there is none; then content will take over to recommend things with similar content. Play with the itemBias: try values >1 by varying amounts, since you want usage-based similarity over content most of the time you have usage-based data in the model. There is no hard rule for the bias.
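
A query along these lines might be assembled as in the following Python sketch. The item id, the property names, the default `itemBias` of 2.0, and the per-field `bias` of 1.0 are assumptions to tune, not recommended values; the property values are copied from the item named in `item`, as described above.

```python
import json


def item_based_query(item_id, item_properties, item_bias=2.0, num=10):
    """Build an item-based UR query that also passes the item's own properties.

    item_properties maps a property name to its list of string values,
    copied from the item named in "item".
    """
    return {
        "item": item_id,
        "itemBias": item_bias,  # >1 favors usage-based similarity
        "num": num,
        "fields": [
            # The per-field bias of 1.0 is an assumption; tune it with itemBias.
            {"name": name, "values": list(values), "bias": 1.0}
            for name, values in sorted(item_properties.items())
        ],
    }


query = item_based_query(
    "sku-42",
    {"categories": ["kitchen"], "priceRange": ["$1-$5"]},
)
print(json.dumps(query, indent=2))
```

The resulting JSON would typically be POSTed to the deployed engine's `queries.json` endpoint. Since there is no hard rule for the bias, `item_bias` is left as a parameter so several values can be compared.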

  
On May 10, 2017, at 6:36 AM, Dennis Honders <dennishonders@gmail.com> wrote:

According to the docs, the UR is considered a hybrid of collaborative filtering and content-based filtering.
In my case I have a purchase history. Quite a lot of products are never bought, so traditional techniques won't be able to make recommendations. For those products (never bought/sold), will recommendations be made with content-based filtering techniques?
If so, what techniques are used in UR?

2017-05-08 19:02 GMT+02:00 Pat Ferrel <pat@occamsmachete.com>:
Yes to all for UR v0.5.0.

UR v0.6.0 is sitting in the `develop` branch waiting for one more minor fix to be released. It uses the latest release of Mahout, 0.13.0, so there is no need to build Mahout for the project. Several new features too. I expect it to be out this week.


On May 8, 2017, at 3:07 AM, Dennis Honders <dennishonders@gmail.com> wrote:

Hi, 

Are the following docs up-to-date?

PredictionIO: http://actionml.com/docs/pio_quickstart
Is version 0.11.0 suitable for UR?

The UR: http://actionml.com/docs/ur
Is 0.5.0 the latest version?
Is Mahout still necessary?

Thanks,

Dennis






