mahout-user mailing list archives

From Sebastian Schelter <ssc.o...@googlemail.com>
Subject Re: RecommenderJob Recommending an Item Already Preferred by a User
Date Wed, 07 Aug 2013 14:20:23 GMT
Hi Rafal,

This sounds really strange; the bug should not have anything to do with
the version of Hadoop that you are running. You might sometimes not see
it because of the random sampling of the preferences.

--sebastian

On 07.08.2013 13:53, Rafal Lukawiecki wrote:
> Sebastian,
> 
> I've been doing a little more digging into the issue of preferences being calculated
> for already preferred items. I re-ran the jobs using the same data and the same parameters
> on a different installation of Hadoop, and the problem seems to have gone away. For now, it
> looks like the issue arises when I run it under Mahout 0.7 and 0.8 using HDP (Hortonworks
> Data Platform) for Windows 1.1.0, with Hadoop 1.1.0. So far in my tests, the problem does not
> show up under Hadoop 1.2.1 compiled for OS X. I will work a little more to confirm my results,
> but if they hold up, should I still report it as a Mahout issue?
> 
> Rafal  
> --
> Rafal Lukawiecki
> Strategic Consultant and Director 
> Project Botticelli Ltd
> 
> On 1 Aug 2013, at 17:31, Sebastian Schelter <ssc@apache.org> wrote:
> 
> Setting it to the maximum number should be enough. It would be great if you
> could share your dataset and tests.
> 
> 2013/8/1 Rafal Lukawiecki <rafal@projectbotticelli.com>
> 
>> Should I have set that parameter to a value much, much larger than the
>> maximum number of preferences actually expressed by a user?
>>
>> I'm working on an anonymised data set. If it reproduces the error as a test
>> case, I'd be happy to share it so you can re-test. I am still hoping it is my
>> error, not Mahout's.
>>
>> Rafal
>> --
>> Rafal Lukawiecki
>> Pardon brevity, mobile device.
>>
>> On 1 Aug 2013, at 17:19, "Sebastian Schelter" <ssc@apache.org> wrote:
>>
>>> OK, please file a bug report detailing what you've tested and what
>>> results you got.
>>>
>>> Just to clarify, setting maxPrefsPerUser to a high number still does not
>>> help? That surprises me.
>>>
>>>
>>> 2013/8/1 Rafal Lukawiecki <rafal@projectbotticelli.com>
>>>
>>>> Hi Sebastian,
>>>>
>>>> I've rechecked the results, and I'm afraid the issue has not gone
>>>> away, contrary to my enthusiastic response yesterday. Using 0.8, I have
>>>> retested with and without the --maxPrefsPerUser 9000 parameter (no user has
>>>> more than 5000 prefs). I have also supplied the prefs file without the
>>>> preference value, that is, as user,item (one per line), as a --filterFile,
>>>> with and without --maxPrefsPerUser, and I am afraid we are also seeing
>>>> recommendations for items the user has expressed a prior preference for.
>>>>
>>>> I suppose I need to file a bug report.
>>>>
>>>> Rafal
>>>> --
>>>> Rafal Lukawiecki
>>>> Pardon my brevity, sent from a telephone.
>>>>
>>>> On 31 Jul 2013, at 22:35, "Rafal Lukawiecki" <rafal@projectbotticelli.com>
>>>> wrote:
>>>>
>>>>> Dear Sebastian,
>>>>>
>>>>> It looks like setting --maxPrefsPerUser 10000 has resolved the issue in
>>>>> our case. The most preferences any user had was about 5000, so I doubled
>>>>> it just in case, but when I operationalise this model, I will make sure to
>>>>> calculate the actual maximum number of preferences and set the parameter
>>>>> accordingly. I will double-check the result set to make sure the issue is
>>>>> really gone, as I have only checked the few cases where we had spotted a
>>>>> recommendation of a previously preferred item.
>>>>>
>>>>> Would you like me to file a bug, and would you like me to test it on 0.8
>>>>> or another version? I am using 0.7.
>>>>>
>>>>> Thanks for your kind support.
>>>>> Rafal
>>>>> --
>>>>> Rafal Lukawiecki
>>>>> Strategic Consultant and Director
>>>>> Project Botticelli Ltd
>>>>>
>>>>> On 31 Jul 2013, at 06:22, Sebastian Schelter <ssc.open@googlemail.com>
>>>>> wrote:
>>>>>
>>>>> Hi Rafal,
>>>>>
>>>>> can you try to set the option --maxPrefsPerUser to the maximum number of
>>>>> interactions per user and see if you still get the error?
>>>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>> On 30.07.2013 19:29, Rafal Lukawiecki wrote:
>>>>>> Thank you, Sebastian. The data set is not that large, as we are running
>>>>>> tests on a subset. It has about 24k users and 40k items, and the preference
>>>>>> file has 65k preferences as triples. This was using Similarity Cooccurrence.
>>>>>>
>>>>>> I could try to anonymise the data set and share it, if that would be
>>>>>> helpful.
>>>>>>
>>>>>> Thanks for your kind help.
>>>>>>
>>>>>> Rafal
>>>>>> --
>>>>>> Rafal Lukawiecki
>>>>>> Pardon my brevity, sent from a telephone.
>>>>>>
>>>>>> On 30 Jul 2013, at 18:18, "Sebastian Schelter" <ssc@apache.org> wrote:
>>>>>>
>>>>>>> Hi Rafal,
>>>>>>>
>>>>>>> can you open a ticket for this problem at
>>>>>>> https://issues.apache.org/jira/browse/MAHOUT ? We have unit tests that
>>>>>>> check whether this happens, and currently they pass. I can only imagine
>>>>>>> that the problem occurs in larger datasets, where we sample the data in
>>>>>>> some places. Can you describe a scenario/dataset where this happens?
>>>>>>>
>>>>>>> Best,
>>>>>>> Sebastian
>>>>>>>
>>>>>>> 2013/7/30 Rafal Lukawiecki <rafal@projectbotticelli.com>
>>>>>>>
>>>>>>>> I'm new here, just registered. Many thanks to everyone for working on
>>>>>>>> an amazing piece of software; thank you for building Mahout and for your
>>>>>>>> support. My apologies if this is not the right place to ask the
>>>>>>>> question. I have searched for the issue, and I can see this problem has
>>>>>>>> been reported here:
>>>>>>>> http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items
>>>>>>>>
>>>>>>>> Unfortunately, the trail leads to the newsgroups, and I have not yet
>>>>>>>> found a way to get an answer there without asking you.
>>>>>>>>
>>>>>>>> Essentially, I am running a Hadoop RecommenderJob from Mahout 0.7, and
>>>>>>>> I am finding that it recommends items that the user has already
>>>>>>>> expressed a preference for in their input file. I understand that this
>>>>>>>> should not happen, and I am not sure whether there is a known fix or
>>>>>>>> whether I should be looking for a workaround (such as using the entire
>>>>>>>> input as the filterFile).
>>>>>>>>
>>>>>>>> I will double-check that there is no error on my side, but so far it
>>>>>>>> does not seem that way.
>>>>>>>>
>>>>>>>> Many thanks and my regards from Ireland,
>>>>>>>> Rafal Lukawiecki
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Rafal Lukawiecki
>>>>>>>>
>>>>>>>> Strategic Consultant and Director
>>>>>>>>
>>>>>>>> Project Botticelli Ltd
>>>>
>>
> 
> 
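Two preparation steps Rafal describes in the thread, finding the largest number of preferences any single user has (to choose a value for --maxPrefsPerUser) and rewriting the preference file as user,item pairs (to pass as --filterFile), can be sketched in Python. This is a minimal sketch, assuming the input is a comma-separated userID,itemID,preference file with one triple per line; the function names and file paths are hypothetical helpers, not part of Mahout.

```python
import csv
from collections import Counter


def max_prefs_per_user(rows):
    """Return the largest number of distinct items any single user has a preference for.

    `rows` are (userID, itemID, preference) sequences; the preference value is ignored.
    Returns 0 for empty input.
    """
    pairs = {(row[0], row[1]) for row in rows}  # de-duplicate repeated user/item pairs
    counts = Counter(user for user, _ in pairs)
    return max(counts.values(), default=0)


def to_filter_rows(rows):
    """Drop the preference value, keeping one (userID, itemID) pair per input row."""
    return [(row[0], row[1]) for row in rows]


def prepare(prefs_path, filter_path):
    """Hypothetical helper: write the user,item filter file and return the max prefs per user."""
    with open(prefs_path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]  # skip blank lines
    with open(filter_path, "w", newline="") as f:
        # Use "\n" line endings instead of the csv module's default "\r\n".
        csv.writer(f, lineterminator="\n").writerows(to_filter_rows(rows))
    return max_prefs_per_user(rows)
```

Calling, for example, `prepare("prefs.csv", "filter.csv")` returns the observed maximum. Per Sebastian's reply, setting --maxPrefsPerUser to that value should be enough; Rafal doubled his observed maximum as a safety margin, which is a judgment call rather than a requirement.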

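Rafal's fallback of treating the entire input as a filter can also be approximated after the job has run, by stripping already-preferred items from its output. A minimal sketch, assuming the preferences are a userID,itemID[,preference] CSV and that each line of the job's text output looks like userID<TAB>[itemID:score,itemID:score,...]; verify that format against your own Mahout version's output before relying on it. The function names are hypothetical.

```python
import csv
from collections import defaultdict


def load_preferred(lines):
    """Map each user ID to the set of item IDs they already have a preference for.

    `lines` are CSV lines of the form userID,itemID[,preference].
    """
    preferred = defaultdict(set)
    for row in csv.reader(lines):
        if row:  # skip blank lines
            preferred[row[0].strip()].add(row[1].strip())
    return preferred


def filter_line(line, preferred):
    """Remove already-preferred items from one 'userID<TAB>[item:score,...]' line.

    Assumes the output format stated in the lead-in. Returns the rewritten line,
    or None when no recommendations remain for that user.
    """
    user, _, recs = line.rstrip("\n").partition("\t")
    seen = preferred.get(user, set())  # .get avoids adding keys to the defaultdict
    entries = recs.strip().strip("[]").split(",")
    kept = [e for e in entries if e and e.split(":", 1)[0] not in seen]
    if not kept:
        return None
    return f"{user}\t[{','.join(kept)}]"
```

Each output line can be passed through `filter_line` and the non-`None` results written out again. This only hides the symptom; Sebastian asked for a bug report so the root cause can be found. It also holds all preferences in memory, which suits a test subset like the 65k-preference one described in the thread but may not suit a full data set.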
