From: Pat Ferrel
Subject: Re: PredictionIO Universal Recommender user rating
Date: Mon, 9 Oct 2017 10:42:52 -0700
To: user@predictionio.incubator.apache.org, actionml-user

Yes, this is a very important point. We have found that the % of video viewed is indeed a very important factor, but rather than sending some fraction to indicate the length viewed, we have in the past taken the approach of determining the % that indicates the user liked the video.

This we do by triggering a “view-10”, “view-25”, “view-95”, etc. event for different viewing times. We found that for different content types, different percentages of viewing best predict what the user will like. For “newsy” videos, “view-10” was the best indicator. This makes sense because people often do not need all the details to understand a video's content. But for movies a “view-10” indicated a dislike: the user started a movie, hated it, and stopped it. There we used “view-95” as the best indicator.

1) You know your content. Do you think you have multiple types of content, like “newsy” and “stories/movies”? You may need different indicators of a user “like”, corresponding to a different % watched, based on the type.
2) Gather the viewing experience as a % and create categories like “view-10”, “view-25”, “view-95”, etc. Ingest each event for any given user. Run cross-validation tests to see which gives the best results for each type of content you have. If you have only one type, you will find the single best % to gather.
3) The problem with simply sending in the % is that for one type of content 10% is a like (newsy) and for another type 10% alone is a dislike (long-form movies). This leads us to use the categorical method for defining indicators, which gives the best result, instead of the raw % of video viewed, which may yield confusing or wrong results.
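
The categorical approach in the steps above could be sketched roughly as follows. This is only an illustration, not code from our projects: the function name `view_events`, the 10/25/95 threshold tuple, and the choice to emit one event for every threshold the viewer reached are assumptions, and the dictionary keys follow the general shape of a PredictionIO event, so check them against your own setup.

```python
from datetime import datetime, timezone

# Thresholds named in the message; which ones are worth keeping is
# exactly what the cross-validation step is meant to decide.
THRESHOLDS = (10, 25, 95)


def view_events(user_id, item_id, seconds_watched, video_length_seconds,
                thresholds=THRESHOLDS):
    """Return one categorical "view-N" event per threshold the viewer reached.

    Hypothetical helper: turns a raw watch time into named events
    instead of sending the percentage itself.
    """
    if video_length_seconds <= 0:
        return []
    # Clamp so a looping or replaying player cannot report more than 100%.
    watched = min(seconds_watched, video_length_seconds)
    percent = 100.0 * watched / video_length_seconds
    event_time = datetime.now(timezone.utc).isoformat()
    return [
        {
            "event": f"view-{threshold}",
            "entityType": "user",
            "entityId": user_id,
            "targetEntityType": "item",
            "targetEntityId": item_id,
            "eventTime": event_time,
        }
        for threshold in thresholds
        if percent >= threshold
    ]
```

For example, 15 seconds of a 60 second video is 25%, so this sketch would produce a “view-10” and a “view-25” event but no “view-95”.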

The extra step of testing the indicators in #2 can make a significant difference in performance.
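
Once the cross-validation runs are done, choosing the indicator per content type is a simple comparison. The sketch below assumes you have already collected one quality score per (content type, indicator) pair where higher is better; the function name `best_indicator_per_type` and the numbers in the example are placeholders for illustration, not measurements.

```python
def best_indicator_per_type(scores):
    """Pick the highest-scoring indicator for each content type.

    scores: {content_type: {indicator_name: metric}}, higher is better.
    Content types with no scores are skipped.
    """
    return {
        content_type: max(by_indicator, key=by_indicator.get)
        for content_type, by_indicator in scores.items()
        if by_indicator
    }


# Placeholder numbers only, made up to show the shape of the input.
example_scores = {
    "newsy": {"view-10": 0.31, "view-25": 0.27, "view-95": 0.12},
    "movies": {"view-10": 0.05, "view-25": 0.14, "view-95": 0.29},
}
chosen = best_indicator_per_type(example_scores)
```

With those placeholder scores, `chosen` maps “newsy” to “view-10” and “movies” to “view-95”, which mirrors the pattern described above.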

BTW if you are able to find an indicator of dislike, this may be useful to predict likes: https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/


On Oct 9, 2017, at 10:23 AM, Daniel Tirdea <dan.tirdea@gmail.com> wrote:

Hi, 


I know there have been a lot of questions on this matter; I've looked everywhere but didn't find a good answer.

I'm using the Universal Recommender to make a recommendation system for a video sharing website.
I have a lot of details in terms of user behavior, but the most important one (at least that's what I think now) is the number of seconds consumed by a visitor: a ratio between the video length in seconds and the seconds the visitor has actually seen of it.

Let's say that a visitor reached a landing page with a video with a total length of 60 seconds. If the user actually sees 60 seconds (the video player reports that the video played the entire 60 seconds), I think I can assume that the visitor gave an implicit score of 10 out of 10 for this video.

Is there a way I can include this value in the prediction system? Or order the returned items by this value?

Thanks for reading this; any thoughts will be greatly appreciated.


Thanks,
Dan
