mahout-user mailing list archives

From Shady Hanna <shadimamdouh...@gmail.com>
Subject Re: Like/No Rating/Dislike Dataset Representation to Mahout
Date Sun, 11 Oct 2015 23:43:04 GMT
Now I got it, thank you so much again.

But is it possible to encode it like that in Mahout? As far as I
understand, I can only use one field for the ratings. In that case I
would have to run the user-based recommender twice, for example: once with
the data represented as (like 1, dislike 0, no rating 0) and once more
with this representation (like 1, dislike 1, no rating 0).

Thank you,
Regards,
Shady




On Sun, Oct 11, 2015 at 10:50 PM, Angus Macnab <angus.macnab@gmail.com>
wrote:

> I just meant that you can encode the single categorical value that you
> have as two separate binary values.  You can accomplish this by asking two
> questions.  First you can ask: did the user like the product?
> (like=1, dislike=0, no rating=0).  Next you can ask: did the user rate the
> product? (like=1, dislike=1, no rating=0).
>
> If this is your original data:
>
> Customer ID          Product ID           Rating
> 1                           1                        No Rating
> 2                           1                        Like
> 3                           2                        Dislike
> 4                           1                        Like
> 5                           2                        Like
> 6                           2                        Dislike
> 7                           1                        No Rating
> 8                           1                        No Rating
> 9                           2                        Dislike
>
> You could encode it like this:
>
> Customer ID   Product ID   Rating      Liked   Rated
> 1             1            No Rating   0       0
> 2             1            Like        1       1
> 3             2            Dislike     0       1
> 4             1            Like        1       1
> 5             2            Like        1       1
> 6             2            Dislike     0       1
> 7             1            No Rating   0       0
> 8             1            No Rating   0       0
> 9             2            Dislike     0       1
>
> And train on this dataset:
>
> Customer ID   Product ID   Liked   Rated
> 1             1            0       0
> 2             1            1       1
> 3             2            0       1
> 4             1            1       1
> 5             2            1       1
> 6             2            0       1
> 7             1            0       0
> 8             1            0       0
> 9             2            0       1
>
> There is no need to ask a third question (did the user dislike the product?),
> since its answer is a linear combination of the other two fields
> (disliked = rated - liked), i.e. it is linearly dependent on them.
>
> Hope this helps to clarify things.
>
> Thanks,
>
> Angus
>
> On Sun, Oct 11, 2015 at 1:24 PM, Shady Hanna <shadimamdouh117@gmail.com>
> wrote:
>
>> Thank you so much Angus for your help.
>>
>> I did not quite get it, so if I have the following data:
>>
>> Customer ID          Product ID           Rating
>> 1                           1                        No Rating (0,1)
>> 2                           1                        Like (1,0)
>> 3                           2                        Dislike (0,0)
>>
>> If what I understood is correct, how can I represent it to Mahout, and is
>> it going to be a boolean pref data model ?
>>
>> Thank you so much again,
>> Best Regards,
>> Shady
>>
>> On Sat, Oct 10, 2015 at 3:18 AM, Angus Macnab <angus.macnab@gmail.com>
>> wrote:
>>
>>> Rather than trying to impose ordinality on your data, you can think of "like",
>>> "dislike", "did not rate" as a categorical feature with a cardinality of
>>> three, which can be encoded using two binary features.  All possibilities
>>> are fine, but the most logical is probably: rated=(0,1) and liked=(0,1).
>>>
>>> So you just need to come up with the routine to encode these features.
>>> Hope this helps!
>>>
>>> Best,
>>>
>>> Angus
>>> --------------------------------------
>>> Angus Macnab
>>>
>>> On Fri, Oct 9, 2015 at 3:54 PM, Shady Hanna <shadimamdouh117@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have data in which each entry is a like, a dislike, or "user did not
>>>> rate it". I am not sure how to represent this data for Mahout's
>>>> user-based/item-based recommender, or which user similarity can be
>>>> used for such a dataset.
>>>>
>>>> Would you please advise ?
>>>>
>>>> Thank you,
>>>> Best Regards,
>>>> Shady
>>>>
>>>
>>>
>>
>
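
The two-question encoding described in the quoted messages can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class `RatingEncodingSketch`, the enum `Rating`, and the comma-separated output layout are hypothetical names and choices made for this example, and nothing here is part of Mahout's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Mahout.
final class RatingEncodingSketch {

    // The three raw categories discussed in the thread.
    enum Rating { LIKE, DISLIKE, NO_RATING }

    // First question: did the user like the product?
    static int liked(Rating r) {
        return r == Rating.LIKE ? 1 : 0;
    }

    // Second question: did the user rate the product at all?
    static int rated(Rating r) {
        return r == Rating.NO_RATING ? 0 : 1;
    }

    // The third question is redundant: disliked = rated - liked.
    static int disliked(Rating r) {
        return rated(r) - liked(r);
    }

    // Builds one "customerId,productId,liked,rated" line per input row.
    // The comma-separated layout is an assumption made for this sketch.
    static List<String> encode(long[] customers, long[] products, Rating[] ratings) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < ratings.length; i++) {
            lines.add(customers[i] + "," + products[i] + ","
                    + liked(ratings[i]) + "," + rated(ratings[i]));
        }
        return lines;
    }

    public static void main(String[] args) {
        // First three rows of the example data from the thread.
        long[] customers = {1, 2, 3};
        long[] products = {1, 1, 2};
        Rating[] ratings = {Rating.NO_RATING, Rating.LIKE, Rating.DISLIKE};
        for (String line : encode(customers, products, ratings)) {
            System.out.println(line);
        }
    }
}
```

The `disliked` method is included only to show why a third field adds no information: its value is always determined by the other two.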
