mahout-user mailing list archives

From Angus Macnab <angus.mac...@gmail.com>
Subject Re: Like/No Rating/Dislike Dataset Representation to Mahout
Date Sun, 11 Oct 2015 20:50:57 GMT
I just meant that you can encode the single categorical value that you have
as two separate binary values.  You can accomplish this by asking two
questions.  First: did the user like the product?
(like=1, dislike=0, no rating=0).  Second: did the user rate the
product? (like=1, dislike=1, no rating=0).
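As a rough sketch of those two questions in code (Python here just for illustration; the function name encode_rating and the exact label strings are my own assumptions, not anything from Mahout):

```python
# Hypothetical helper, not part of Mahout: maps one rating label to the
# two binary answers (liked, rated) described above.
ENCODING = {
    "Like": (1, 1),       # liked the product, and rated it
    "Dislike": (0, 1),    # did not like it, but did rate it
    "No Rating": (0, 0),  # neither liked nor rated
}

def encode_rating(rating):
    """Return (liked, rated) for "Like", "Dislike" or "No Rating"."""
    try:
        return ENCODING[rating]
    except KeyError:
        raise ValueError("unknown rating label: %r" % (rating,))

if __name__ == "__main__":
    for label in ("Like", "Dislike", "No Rating"):
        liked, rated = encode_rating(label)
        print(label, liked, rated)
```

Raising on an unknown label is just one possible choice; you may prefer to treat anything unexpected as "No Rating".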

If this is your original data:

Customer ID          Product ID           Rating
1                           1                        No Rating
2                           1                        Like
3                           2                        Dislike
4                           1                        Like
5                           2                        Like
6                           2                        Dislike
7                           1                        No Rating
8                           1                        No Rating
9                           2                        Dislike

You could encode it like this:

Customer ID     Product ID     Rating        Liked     Rated
1               1              No Rating     0         0
2               1              Like          1         1
3               2              Dislike       0         1
4               1              Like          1         1
5               2              Like          1         1
6               2              Dislike       0         1
7               1              No Rating     0         0
8               1              No Rating     0         0
9               2              Dislike       0         1

And train on this dataset:

Customer ID     Product ID     Liked     Rated
1               1              0         0
2               1              1         1
3               2              0         1
4               1              1         1
5               2              1         1
6               2              0         1
7               1              0         0
8               1              0         0
9               2              0         1

There is no need to ask a third question (did the user dislike the product?),
since the answer can be derived linearly from the other two fields:
disliked = rated - liked.  A third column would therefore be linearly
dependent on the first two and add no information.
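If it helps, here is a small check of that dependence (again only a sketch; the names ENCODING and disliked are made up for this example):

```python
# (liked, rated) pairs for the three possible ratings, as described above.
ENCODING = {
    "Like": (1, 1),
    "Dislike": (0, 1),
    "No Rating": (0, 0),
}

def disliked(liked, rated):
    """Recover the 'disliked' answer from the two stored fields."""
    return rated - liked

if __name__ == "__main__":
    for label, (liked, rated) in ENCODING.items():
        # Prints 1 only for "Dislike", 0 for the other two labels.
        print(label, disliked(liked, rated))
```

Since the third value is always recoverable this way, storing it would only duplicate what the first two fields already say.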

Hope this helps to clarify things.

Thanks,

Angus

On Sun, Oct 11, 2015 at 1:24 PM, Shady Hanna <shadimamdouh117@gmail.com>
wrote:

> Thank you so much Angus for your help.
>
> I did not quite get it, so if I have the following data:
>
> Customer ID          Product ID           Rating
> 1                           1                        No Rating (0,1)
> 2                           1                        Like (1,0)
> 3                           2                        Dislike (0,0)
>
> If what I understood is correct, how can I represent it to Mahout, and is
> it going to be a boolean pref data model ?
>
> Thank you so much again,
> Best Regards,
> Shady
>
> On Sat, Oct 10, 2015 at 3:18 AM, Angus Macnab <angus.macnab@gmail.com>
> wrote:
>
>> Rather than try to impose ordinality on your data, you can think of "like",
>> "dislike", "did not rate" as a categorical feature with a cardinality of
>> three, which can be encoded using two binary features.  All possibilities
>> are fine, but the most logical is probably: rated=(0,1) and liked=(0,1).
>>
>> So you just need to come up with the routine to encode these features.
>> Hope this helps!
>>
>> Best,
>>
>> Angus
>> --------------------------------------
>> Angus Macnab
>>
>> On Fri, Oct 9, 2015 at 3:54 PM, Shady Hanna <shadimamdouh117@gmail.com>
>> wrote:
>>
>>> Hi ,
>>>
>>> I have data in which each rating is one of: like, user did not rate it,
>>> or dislike. I am not sure how I can represent this data to a Mahout User
>>> Based/Item Based Recommender System, or which user similarity can be
>>> used for such a dataset.
>>>
>>> Would you please advise ?
>>>
>>> Thank you,
>>> Best Regards,
>>> Shady
>>>
>>
>>
>
