Mailing-List: contact user-help@predictionio.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@predictionio.incubator.apache.org
From: Pat Ferrel <pat@occamsmachete.com>
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_3B47D25A-F639-47A4-8FCE-87DADB973DA6"
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: Re: PredictionIO Universal Recommender user rating
Date: Mon, 9 Oct 2017 10:42:52 -0700
References: <CAMzemMUnQoW6s6ZENEnJv1X+zoDYuUS3odCiqGYnG8rXty82hw@mail.gmail.com>
To: user@predictionio.incubator.apache.org,
 actionml-user <actionml-user@googlegroups.com>
In-Reply-To: <CAMzemMUnQoW6s6ZENEnJv1X+zoDYuUS3odCiqGYnG8rXty82hw@mail.gmail.com>
Message-Id: <FE00A5A8-340C-4B82-9FE2-B72B4EECAF4E@occamsmachete.com>
archived-at: Mon, 09 Oct 2017 17:43:04 -0000


--Apple-Mail=_3B47D25A-F639-47A4-8FCE-87DADB973DA6
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8

Yes, this is a very important point. We have found that the % of video
viewed is indeed a very important factor, but rather than sending some
fraction to indicate the length viewed, we first determine the % that
indicates the user liked the video.

We do this by triggering a "view-10", "view-25", "view-95", etc. event for
different viewing times. We found that for different content types,
different % of viewing best predicted what the user will like. For
"newsy" videos, "view-10" was the best indicator. This makes sense because
people often do not need all the details to understand a video's content.
But for movies, a "view-10" alone indicated a dislike: the user started a
movie, hated it, and stopped it. There we used "view-95" as the best
indicator.

1) You know your content. Do you think you have multiple types of
content, like "newsy" and "stories/movies"? You may need different
indicators of a user "like", corresponding to different % watched,
based on the type.
2) Gather the viewing experience as a % and create categories like
"view-10", "view-25", "view-95", etc. Ingest each event for any given
user. Run cross-validation tests to see which gives the best results for
each type of content you have. If you have only one type, you will find
the single best % to gather.
3) The problem with simply sending in the % is that for one type of
content 10% is a like (newsy) and for another type 10% alone is a
dislike (long-form movies). This leads us to use the categorical method
for defining indicators, which gives the best result, instead of using
the raw % of video, which may yield confusing or wrong results.
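
To make steps 2 and 3 concrete, here is a rough sketch of the bucketing,
not production code. The function names and the threshold list are made
up for illustration (the thresholds just mirror the indicator names
above), and the dict layout assumes the usual PredictionIO event fields.
Nothing is sent to the EventServer here; it only builds the events.

```python
from typing import Dict, List

# Assumed thresholds, mirroring the indicator names used in this thread.
THRESHOLDS = (10, 25, 95)


def view_indicators(seconds_watched: float,
                    video_length: float) -> List[str]:
    """Return every "view-N" indicator that the viewing session reached."""
    if video_length <= 0:
        return []
    # Cap at the full length so replays do not push the % above 100.
    watched = min(seconds_watched, video_length)
    percent = 100.0 * watched / video_length
    return [f"view-{t}" for t in THRESHOLDS if percent >= t]


def to_events(user_id: str, item_id: str, seconds_watched: float,
              video_length: float) -> List[Dict[str, str]]:
    """Build one event dict per indicator reached (hypothetical helper)."""
    return [
        {
            "event": name,
            "entityType": "user",
            "entityId": user_id,
            "targetEntityType": "item",
            "targetEntityId": item_id,
        }
        for name in view_indicators(seconds_watched, video_length)
    ]
```

A user who watches 20 of 60 seconds would produce "view-10" and
"view-25" events but no "view-95". Which of those names you then treat
as the "like" indicator for each content type is what the
cross-validation in #2 should tell you.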

The extra step of testing the indicators in #2 can make a significant
difference in performance.

BTW, if you are able to find an indicator of dislike, it may be useful
for predicting likes:
https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/


On Oct 9, 2017, at 10:23 AM, Daniel Tirdea <dan.tirdea@gmail.com> wrote:

Hi,


I know there were a lot of questions on this matter. I've looked
everywhere but didn't find a good answer.

I'm using the Universal Recommender to build a recommendation system for
a video sharing website.
I have a lot of detail about user behavior, but the most important
signal (at least that's what I think now) is the number of seconds
consumed by a visitor: the ratio between the seconds the visitor has
actually watched and the video length in seconds.

Let's say that a visitor reached a landing page with a video with a total
length of 60 seconds. If the user actually watches all 60 seconds (the
video player reports that the video played the entire 60 seconds), I
think I can assume that the visitor gave an implicit score of 10 out of
10 for this video.

Is there a way I can include this value in the prediction system? Or
order the returned items by this value?

Thanks for reading this; any thoughts will be greatly appreciated.


Thanks,
Dan


--Apple-Mail=_3B47D25A-F639-47A4-8FCE-87DADB973DA6--