mahout-user mailing list archives

From Peng Zhang <pzhang.x...@gmail.com>
Subject Re: recommenditembased returns 0 records from last map-reduce job
Date Mon, 21 Jul 2014 00:58:27 GMT
Serega,

I have two comments:
1. Don’t use negative user IDs. Mahout uses user IDs as well as item IDs as row/column
indexes, so it is safer to use non-negative IDs such as 0, 1, 2, etc.
2. If you want the item similarity information, you can pass --outputPathForSimilarityMatrix
(with an output path) in the command.
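
A minimal sketch of the ID remapping from point 1, in Python. It assumes the preferences have already been parsed into (user_id, item_id, pref) tuples and that the job reads a comma-separated userID,itemID,pref text layout. The helper name remap_user_ids is hypothetical and not part of Mahout; treat this as an illustration rather than a tested fix for the 0-record output.

```python
def remap_user_ids(rows):
    """Map arbitrary (possibly negative) user ids to 0, 1, 2, ... in first-seen order.

    Hypothetical helper, not a Mahout API. Returns the rewritten rows and the
    original-id -> new-id mapping so results can be translated back later.
    """
    user_index = {}
    remapped = []
    for user_id, item_id, pref in rows:
        if user_id not in user_index:
            user_index[user_id] = len(user_index)
        remapped.append((user_index[user_id], item_id, pref))
    return remapped, user_index


# Sample rows taken from the data posted in this thread.
rows = [
    (-1, 1, 1.0),
    (-2, 2, 1.0),
    (11, 1, 1.6),
    (11, 2, 1.6),
    (123, 3, 2.0),
]

remapped, user_index = remap_user_ids(rows)

# Emit lines in the assumed userID,itemID,pref layout.
for user, item, pref in remapped:
    print(f"{user},{item},{pref}")
```

The usersFile would need to be rewritten with the same mapping, and the mapping kept around so that the pseudo-users (originally -item_id) can be translated back to items when reading the output.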

Regards,
Peng Zhang
M: +86 186-1658-7856
pzhang.xjtu@gmail.com





On Jul 21, 2014, at 4:00 AM, Serega Sheypak <serega.sheypak@gmail.com> wrote:

> All bad things happen here:
> 
> 
> 
> Name: RecommenderJob-PartialMultiplyMapper-Reducer
> User: oozie
> Process User: oozie
> Group: oozie
> Mapper Class: PartialMultiplyMapper
> Reducer Class: AggregateAndRecommendReducer
> Job Input Directory: hdfs://nameservice1/itemrec/temp/partialMultiply
> Job Output Directory: hdfs://nameservice1/itemrec/output/
> 
> 14/07/20 23:57:47 INFO mapred.JobClient:     Map input records=3312879
> 
> 14/07/20 23:57:47 INFO mapred.JobClient:     Map output records=3313251
> 
> 
> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
> 
> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0
> 
> Why does Mahout return 0 rows? It works when booleanData=true (are preferences
> ignored...?)
> 
> 
> 
> 2014-07-20 23:19 GMT+04:00 Serega Sheypak <serega.sheypak@gmail.com>:
> 
>> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
>> users_file:
>> --inverted_item_id
>> -1
>> -2
>> -3
>> -4
>> 
>> users_items_prefs
>> --inverted item_id
>> -1 1 1.0
>> -2 2 1.0
>> -3 3 1.0
>> -4 4 1.0
>> --user_id item_id pref_value
>> 11   1 1.6
>> 11   2 1.6
>> 123 3 2.0
>> 123 4 2.0
>> 333 1 2.0
>> 333 2 1.6
>> --etc.
>> 
>> if I set --booleanData true
>> then mahout returns the result.
>> 
>> 
>> 
>> 
>> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <andrew.musselman@gmail.com>:
>> 
>>> I'm confused about how you're constructing the user file, and why there
>>> are negated item ids here.
>>> 
>>> Can you post some more details please, including Mahout version and some
>>> sample data sets?
>>> 
>>>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <serega.sheypak@gmail.com> wrote:
>>>> 
>>>> Hi, I'm trying to compute item similarity.
>>>> I gather the items that users visit while shopping and then create a file:
>>>> user_id, item_id, weight (where weight can be: [1.0, 1.6, 1.9], depending on
>>>> user action type and data source)
>>>> UNION
>>>> -item_id, item_id, 1.0 (from the items dictionary)
>>>> 
>>>> and I provide a usersFile, where user_id = -item_id
>>>> 
>>>> The idea is to get item similarity. If any user visits an item named "A", I want
>>>> to show them items "B", "c", "xxx" using the preferences of other users.
>>>> 
>>>> The problem is that the last (???) MapReduce job returns 0 rows:
>>>> 
>>>> Here are my settings:
>>>> 
>>>> 
>>>> sudo -u oozie mahout recommenditembased \
>>>>                   --input visited_items_with_inverted_items \
>>>>                   --output result \
>>>>                   --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>>>>                   --usersFile inverted_items \
>>>>                   --numRecommendations 500 \
>>>>                   --booleanData false \
>>>>                   --maxPrefsPerUser 100 \
>>>>                   --maxSimilaritiesPerItem 500 \
>>>>                   --minPrefsPerUser 0 \
>>>>                   --maxPrefsPerUserInItemSimilarity 30 \
>>>>                   --threshold 0.91 \
>>>>                   --tempDir temp
>>>> 
>>>> Some counters... I don't understand what they mean....
>>>> 
>>>> 14/07/20 22:43:08 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
>>>> 14/07/20 22:43:08 INFO mapred.JobClient:     USERS=7528530
>>>> 14/07/20 22:43:43 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_NEGLECTED=1,798,738
>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_USED=12,429,693
>>>> 
>>>> 14/07/20 22:44:24 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>>>> 14/07/20 22:44:24 INFO mapred.JobClient:     ROWS=3312879
>>>> 14/07/20 22:45:18 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     COOCCURRENCES=35882374
>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     PRUNED_COOCCURRENCES=0
>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map input records=3312879
>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map output records=17570268
>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce input records=5221907
>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce output records=3312879
>>>> 
>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map input records=7528530
>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map output records=3313251
>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce input records=3313251
>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce output records=3313251
>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map input records=6626130
>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map output records=6626130
>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce input records=6626130
>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce output records=3312879
>>>> 
>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879
>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251
>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251
>>>> 
>>>> --------
>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0
>>>> --------
>>>> 
>>>> why 0???
>>> 
>> 
>> 

