mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Mahout - Recommenditemvalue with magnitude of 1
Date Tue, 24 Nov 2015 20:56:15 GMT


> On Nov 24, 2015, at 12:21 PM, Niklas Ekvall <niklas.ekvall@gmail.com> wrote:
> 
> Okay!
> 
> No pre-filter, and the user/item ids should start from 0 and run up to the
> number of users and items there are. So all the data we have should go into
> Mahout and we filter inside Mahout... correct?

Yes, but I wouldn't filter at all. Even for users with only a small number of events, the
recs will very likely be better than random.
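
For illustration, here is a minimal sketch of remapping arbitrary ids to the contiguous 0-based ordinal ids described in this thread. It is not part of Mahout: the function name `to_ordinal_ids` and the in-memory list of (user, item) pairs are assumptions made for the example, and you would need to keep the returned dictionaries to translate the recommendations back to your original ids.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def to_ordinal_ids(
    pairs: Iterable[Tuple[Hashable, Hashable]],
) -> Tuple[List[Tuple[int, int]], Dict[Hashable, int], Dict[Hashable, int]]:
    """Map user and item ids to contiguous integers starting at 0.

    Hypothetical helper: ids are numbered in order of first appearance,
    users and items independently of each other.
    """
    user_index: Dict[Hashable, int] = {}
    item_index: Dict[Hashable, int] = {}
    remapped: List[Tuple[int, int]] = []
    for user, item in pairs:
        # setdefault only assigns a new ordinal the first time an id is seen
        u = user_index.setdefault(user, len(user_index))
        i = item_index.setdefault(item, len(item_index))
        remapped.append((u, i))
    return remapped, user_index, item_index


# A few rows taken from the sample input quoted further down in this message
sample = [(3, 7414), (3, 12682), (5, 3), (6, 3)]
remapped, users, items = to_ordinal_ids(sample)
```

With the sample rows above, users 3, 5 and 6 become 0, 1 and 2, so there are no gaps left in either id space.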

> 
> We do the same pre-filter for Spark item-similarity, is that wrong too?

No, spark-itemsimilarity uses string ids.

> 
> Best regards, Niklas
> 
> On Tuesday, November 24, 2015, Pat Ferrel <pat@occamsmachete.com> wrote:
> 
>> I wouldn’t pre-filter but in any case the ids input to hadoop-mahout need
>> to follow those rules.
>> 
>> The new recommender I mentioned has no such requirements, it uses string
>> IDs.
>> 
>> On Nov 24, 2015, at 11:44 AM, Niklas Ekvall <niklas.ekvall@gmail.com> wrote:
>> 
>> No, it does not start from 0 and does not cover all numbers between 0 and
>> the number of items/users. Before we run Mahout on the dataset we prefilter
>> it (a user must have bought at least 5 products and a product must have been
>> bought by at least 3 users). Therefore we start with user 3, then it jumps
>> to user 5, etc.
>> 
>> Is this wrong? Should we use all data as input to Mahout and do the
>> filtering inside Mahout?
>> 
>> We use the second latest version of Mahout!
>> 
>> Best regards, Niklas
>> 
>> On Tuesday, November 24, 2015, Pat Ferrel <pat@occamsmachete.com> wrote:
>> 
>>> Do your ids start with 0 and cover all numbers between 0 and the number of
>>> items - 1 (same for user ids)?
>>> The old hadoop-mahout code required ordinal ids starting at 0.
>>> 
>>> 
>>> On Nov 24, 2015, at 8:19 AM, Niklas Ekvall <niklas.ekvall@gmail.com> wrote:
>>> 
>>> Hi Pat,
>>> 
>>> Here is some input:
>>> 
>>> 3       7414
>>> 3       12682
>>> 3       18947
>>> 3       19980
>>> 3       26975
>>> 3       54635
>>> 3       67789
>>> 3       73212
>>> 3       118932
>>> 3       138846
>>> 3       141268
>>> 5       3
>>> 5       2123
>>> 5       37955
>>> 5       39975
>>> 5       113289
>>> 6       3
>>> 6       456
>>> 6       2188
>>> 6       2496
>>> 6       6194
>>> 6       6361
>>> 6       6768
>>> 6       6919
>>> 6       6920
>>> 6       7257
>>> 6       7705
>>> 6       7706
>>> 6       11788
>>> 
>>> And some output:
>>> 
>>> 3       [122086:1.0,1846:1.0,74638:1.0,63240:1.0,87540:1.0,2742:1.0,2981:1.0,8325:1.0,145598:1.0,49675:1.0,131388:1.0,72113:1.0,3493:1.0,56131:1.0,30422:1.0,87829:1.0,111190:1.0,13597:1.0,83436:1.0,61772:1.0]
>>> 5       [32349:1.0,29413:1.0,111896:1.0,61845:1.0,50016:1.0,1607:1.0,15237:1.0,133229:1.0,65805:1.0,34034:1.0,133071:1.0,28894:1.0,18658:1.0,32095:1.0,4402:1.0,47522:1.0,31022:1.0,23936:1.0,6243:1.0,53214:1.0]
>>> 6       [40756:1.0,34420:1.0,31153:1.0,114717:1.0,53945:1.0,71148:1.0,26095:1.0,112941:1.0,55284:1.0,111346:1.0,112201:1.0,65759:1.0,133127:1.0,61378:1.0,16413:1.0,113289:1.0,49675:1.0,14995:1.0,141028:1.0,27506:1.0]
>>> 
>>> Best regards, Niklas
>>> 
>>> 2015-11-24 16:48 GMT+01:00 Pat Ferrel <pat@occamsmachete.com>:
>>> 
>>>> Sounds like you may not have the input right. Recommendations should be
>>>> sorted by the strength and so shouldn’t all be 1 unless the data is very
>>>> odd.
>>>> 
>>>> Can you give us a small sample of the input?
>>>> 
>>>> 
>>>> BTW a newer recommender using Mahout’s Spark based code and a search
>>>> engine is here:
>>>> https://github.com/PredictionIO/template-scala-parallel-universal-recommendation
>>>> A single machine install script is here:
>>>> https://docs.prediction.io/start/
>>>> 
>>>> On Nov 24, 2015, at 2:16 AM, Niklas Ekvall <niklas.ekvall@gmail.com> wrote:
>>>> 
>>>> Hello Mahout Users!
>>>> 
>>>> Today I use Mahout - Recommenditembased with Log-similarity to produce
>>>> personal recommendations for trigger emails in an offline mode. But when I
>>>> produce e.g. 50 recommendations, the rank value of every recommendation is
>>>> always of magnitude 1. Why is this so? And is the first recommendation in
>>>> this list the best one, or is there some randomness in this list?
>>>> 
>>>> Best regards,
>>>> 
>>>> Niklas Ekvall
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
