Mailing-List: contact user-help@predictionio.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@predictionio.incubator.apache.org
From: Pat Ferrel <pat@occamsmachete.com>
Message-Id: <1D6E3FC2-483B-4258-B251-52F7E6CB3E9C@occamsmachete.com>
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_9F55ADB7-9109-4206-ACDA-9BFB152F665A"
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: Re: Docs Universal Recommender
Date: Wed, 10 May 2017 13:12:35 -0700
In-Reply-To: <CAC-ATVGvbEM3nzmAPk4+D4GM6z1e1t9yJf4irR1kN1y5=Ak4Ag@mail.gmail.com>
Cc: user@predictionio.incubator.apache.org,
 actionml-user <actionml-user@googlegroups.com>
To: Marius Rabenarivo <mariusrabenarivo@gmail.com>
References: <CALfiJ8a=5RZ6HBOF_4z45u7CVFjsOcud6EhrmUbXY0YFKWO2LQ@mail.gmail.com>
 <1CF7DB58-1C00-4EC3-A93B-1669F6ECBB0D@occamsmachete.com>
 <CALfiJ8aSm=L=CF7ZoDuk=YWXD0onhqi1yf7fcgRcYggvLghtXg@mail.gmail.com>
 <ABE57669-3037-4828-9371-9176BD0D4D77@occamsmachete.com>
 <CAC-ATVHH6kJSpWmtNk1WXujeCXf2CRPXAeedFrUdBwEVZ=Z5ag@mail.gmail.com>
 <5A06A910-8D6B-4CCE-8B60-A2B118891B3B@occamsmachete.com>
 <CAC-ATVGvbEM3nzmAPk4+D4GM6z1e1t9yJf4irR1kN1y5=Ak4Ag@mail.gmail.com>
archived-at: Wed, 10 May 2017 20:12:43 -0000


--Apple-Mail=_9F55ADB7-9109-4206-ACDA-9BFB152F665A
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8

What are your items? How much text do they have? What other content is
there? Unless you are recommending long-form items such as blogs or news,
NLP won't give you much, except maybe word2vec, which, given a good model,
will do better than bag-of-words.


On May 10, 2017, at 1:05 PM, Marius Rabenarivo =
<mariusrabenarivo@gmail.com> wrote:

So in your opinion, should the NLP task be done in the Engine part using a
library like Mallet, or should it be implemented in an algorithm-focused
library such as Mahout?

2017-05-10 23:52 GMT+04:00 Pat Ferrel <pat@occamsmachete.com =
<mailto:pat@occamsmachete.com>>:
That is how to make personalized content-based recommendations. You'd have
to input content by attaching it to items and recording it separately as
one usage event per content bit. The input, for instance, would be every
term in the description of an item the user purchased. The input would be
huge, and the current UR + PIO is not optimized for that kind of input. It
is not a recommended way to use the UR, and it is of dubious value without
NLP techniques such as word2vec or NER in place of bag-of-words content.
It might be OK if you have rich metadata like categories or tags.
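
A minimal Python sketch of what that input could look like, assuming plain
bag-of-words tokenization and the PredictionIO event fields entityType,
entityId, targetEntityType, and targetEntityId. The event name
"purchased-term" and the helper term_events are made up for illustration;
they are not part of the UR.

```python
import re


def term_events(user_id, description, event_name):
    """Emit one usage-style event per distinct term in a description.

    This is plain bag-of-words tokenization, the weakest option the
    thread mentions; word2vec or NER output could replace the terms.
    """
    terms = sorted(set(re.findall(r"[a-z0-9]+", description.lower())))
    return [
        {
            "event": event_name,          # hypothetical event name
            "entityType": "user",
            "entityId": user_id,
            "targetEntityType": "item",   # assumption: terms act as items
            "targetEntityId": term,
        }
        for term in terms
    ]


events = term_events("u-1", "Steel chef knife, steel handle",
                     "purchased-term")
for e in events:
    print(e["targetEntityId"])
```

Even this short description yields four events for a single purchase,
which is why the input gets huge for a real catalog.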

In general, content-based recommendations are often little better than
some filtering of popular or rotating promoted items (with no purchase
history). Both can be done fairly easily with the UR.

Content-based recommendations with NLP techniques can work well for
short-lived items like news, but they require extra phases in front of the
recommender to do the NLP.


On May 10, 2017, at 12:33 PM, Marius Rabenarivo =
<mariusrabenarivo@gmail.com <mailto:mariusrabenarivo@gmail.com>> wrote:

Hello,

So what do the matrix T and the vector h_t in this slide correspond to?
https://docs.google.com/presentation/d/1MzIGFsATNeAYnLfoR6797ofcLeFRKSX7KB=
8GAYNtNPY/edit#slide=3Did.gf4d43b9e8_1_24 =
<https://docs.google.com/presentation/d/1MzIGFsATNeAYnLfoR6797ofcLeFRKSX7K=
B8GAYNtNPY/edit#slide=3Did.gf4d43b9e8_1_24>

2017-05-10 21:10 GMT+04:00 Pat Ferrel <pat@occamsmachete.com =
<mailto:pat@occamsmachete.com>>:
Content-based recommendations are based on, well, content. You can really
only make recs if you have an example item, as with the recommendations you
see at the bottom of a product page on Amazon.

For this, make sure to have lots of item properties. Even keywords from
descriptions will work, but also categories, tags, brands, price ranges,
etc. These must all be encoded as JSON arrays of strings, so a price might
be one of ["$0-$1", "$1-$5", ...], while other things like descriptions,
categories, or tags can have several strings attached.
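
As a rough sketch, the encoding could be done like this in Python. It
assumes the usual PredictionIO "$set" event shape for item properties; the
property names, price buckets, and the helper item_set_event are
hypothetical and only illustrate the arrays-of-strings rule.

```python
import json

# Hypothetical price buckets; the thread only names "$0-$1" and "$1-$5".
PRICE_BUCKETS = [(0, 1, "$0-$1"), (1, 5, "$1-$5"), (5, 20, "$5-$20")]


def price_range(price):
    """Map a numeric price to a bucket label (buckets are assumptions)."""
    for low, high, label in PRICE_BUCKETS:
        if low <= price < high:
            return label
    return "$20+"


def item_set_event(item_id, categories, tags, brand, price):
    """Build a PredictionIO-style $set event for one item.

    Every property value is a list of strings, even single-valued ones.
    """
    return {
        "event": "$set",
        "entityType": "item",
        "entityId": item_id,
        "properties": {
            "categories": list(categories),
            "tags": list(tags),
            "brand": [brand],
            "priceRange": [price_range(price)],
        },
    }


event = item_set_event("sku-123", ["kitchen"], ["steel", "knife"],
                       "Acme", 3.50)
print(json.dumps(event, indent=2))
```

Bucketing the price into a label keeps it a string that can be matched
like a category or tag.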

Then issue an item-based query with itemBias set higher (>1) so that usage
information is used first, before content, since it performs better. Then
add query fields for the various properties, filled in with the values of
the item referenced in the "item" field.

You will get similar items based on usage data unless there is none, in
which case content takes over and recommends things with similar content.
Play with the itemBias: try values >1 by varying amounts, since you want
usage-based similarity over content most of the time when you have
usage-based data in the model. There is no hard rule for the bias.
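
A sketch of such a query, built in Python. It assumes the UR query uses
"item", "itemBias", and a "fields" list whose entries have "name",
"values", and "bias"; check the UR docs for your version. The item id,
property names, and bias numbers are placeholders, not recommended values.

```python
import json


def item_query(item_id, item_properties, item_bias, field_bias):
    """Build an item-based query that also passes the item's properties.

    item_properties maps a property name to the list of strings stored
    on the example item; each one becomes an entry in "fields".
    """
    fields = [
        {"name": name, "values": list(values), "bias": field_bias}
        for name, values in sorted(item_properties.items())
    ]
    return {"item": item_id, "itemBias": item_bias, "fields": fields}


query = item_query(
    "sku-123",
    {"categories": ["kitchen"], "tags": ["steel", "knife"]},
    2.0,   # itemBias > 1 favors usage-based similarity
    1.5,   # hypothetical boost for matching content
)
print(json.dumps(query, indent=2))
```

The two bias numbers are the knobs to experiment with; nothing in the
thread fixes their values.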

On May 10, 2017, at 6:36 AM, Dennis Honders <dennishonders@gmail.com =
<mailto:dennishonders@gmail.com>> wrote:

According to the docs, the UR is considered a hybrid of collaborative
filtering and content-based filtering.
In my case I have a purchase history. Quite a lot of products are never
bought, so traditional techniques won't be able to make recommendations.
For those products (never bought/sold), will recommendations be made with
content-based filtering techniques?
If so, what techniques are used in the UR?

2017-05-08 19:02 GMT+02:00 Pat Ferrel <pat@occamsmachete.com =
<mailto:pat@occamsmachete.com>>:
Yes to all for UR v0.5.0.

UR v0.6.0 is sitting in the `develop` branch waiting for one more minor
fix before it is released. It uses the latest Mahout release, 0.13.0, so
there is no need to build Mahout for the project. There are several new
features too. I expect it to be out this week.


On May 8, 2017, at 3:07 AM, Dennis Honders <dennishonders@gmail.com =
<mailto:dennishonders@gmail.com>> wrote:

Hi,

Are the following docs up-to-date?

PredictionIO: http://actionml.com/docs/pio_quickstart
Is version 0.11.0 suitable for UR?

The UR: http://actionml.com/docs/ur
Is 0.5.0 the latest version?
Is Mahout still necessary?

Thanks,

Dennis



--Apple-Mail=_9F55ADB7-9109-4206-ACDA-9BFB152F665A--