mahout-user mailing list archives

From Serega Sheypak <serega.shey...@gmail.com>
Subject Re: recommenditembased returns 0 records from last map-reduce job
Date Sun, 27 Jul 2014 18:31:30 GMT
Hi, nothing helps. I remapped the natural user_id values to sequential keys
1, 2, 3, 4, ... and did the same for item IDs.
The result is the same, as if I hadn't done any mapping.


Command line arguments: {
--booleanData=[false],
--endPhase=[2147483647],
--input=[projPrefs],
--maxPrefs=[500],
--maxSimilaritiesPerItem=[100000],
--minPrefsPerUser=[0],
--output=[output],
--similarityClassname=[SIMILARITY_PEARSON_CORRELATION],
--startPhase=[0],
--tempDir=[temp],
--threshold=[0.91]}



USERS=4056935
NEGLECTED_OBSERVATIONS=1211304
ROWS=779547
USED_OBSERVATIONS=9369782

COOCCURRENCES=12326601
PRUNED_COOCCURRENCES=90241722 (*why so many?*)



And on the last map-reduce job:

Map input records=689597
Map output records=3436

Reduce input records=3436
Reduce output records=*1718*




2014-07-27 15:29 GMT+04:00 Serega Sheypak <serega.sheypak@gmail.com>:

> Thank you! I could have spent my whole life trying to get a result without
> knowing the requirements for the input data.
>
> BTW:
> we used Mahout 0.7-cdh-4.4...cdh 4.7 with
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
> and got results close to reality. We just provided long user_id and
> item_id values and didn't do anything special.
> Why did it work?
>
>
> 2014-07-27 5:18 GMT+04:00 Pat Ferrel <pat.ferrel@gmail.com>:
>
>> Both those jobs require that you create Mahout IDs for users and items. For
>> most Hadoop-based Mahout jobs, taking either text input or sequence files,
>> the IDs must follow the rules mentioned below. There are a few exceptions,
>> but none that you are using. The wiki was rewritten for 0.9, so the ID
>> requirements may not be documented well. You can file a Jira so someone
>> documents this.
>>
>> BTW, spark-itemsimilarity will take any IDs and can read any
>> text-delimited file format; unfortunately, it’s not quite ready yet.
>>
>> On Jul 26, 2014, at 3:14 AM, Serega Sheypak <serega.sheypak@gmail.com>
>> wrote:
>>
>> Hm... this is rather confusing. Are you talking about the input for
>> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
>> or
>> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob?
>>
>> My goal is to get item-item similarity. Right now ItemSimilarityJob
>> returns only a few similarities.
>>
>> I'm reading this:
>> https://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
>> and that:
>> https://mahout.apache.org/users/recommender/userbased-5-minutes.html
>>
>> I don't see anything there about "Your IDs must be in the range from 0 to
>> the number of rows" for either items or users. Where does this requirement
>> come from?
>>
>>
>> 2014-07-25 23:57 GMT+04:00 Pat Ferrel <pat.ferrel@gmail.com>:
>>
>> > I think I did explain below. Your IDs must be in the range from 0 to the
>> > number of rows - 1, and the same goes for item IDs. This is done by taking
>> > your application-specific IDs and mapping them to sequential non-negative
>> > integers. You need to maintain a mapping to/from Mahout IDs somewhere in
>> > your own code.
>> >
>> > For example, imagine input of the form
>> > -92, abc, 1.0
>> > 75000x, jkl, 2.0
>> >
>> > Your first user ID is -92; give it Mahout ID = 0. For your next user ID,
>> > 75000x, give it Mahout ID = 1.
>> > Your first item ID is abc; give it Mahout ID = 0. For your next item ID,
>> > jkl, give it Mahout ID = 1.
>> > Keep doing this each time you see a new unique ID in your input. A Map
>> > will do this for you.
>> >
>> > And so on. Then the input to Mahout would be:
>> > 0,0,1.0
>> > 1,1,2.0
>> >
>> > The output will have Mahout IDs too, so you need to map recommendations
>> > for Mahout user ID 0 back to your user ID of -92, and the same for all
>> > item IDs.
>> >
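A minimal Java sketch of the mapping Pat describes, assuming comma-separated `user, item, preference` lines like the example above. `IdDictionary`, `mahoutId`, `appId` and `encode` are hypothetical names, not a Mahout API; plain JDK collections stand in for the Guava `HashBiMap` mentioned later in the thread. Each new application ID gets the next integer, starting at 0, and users and items get separate dictionaries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Mahout): a bi-directional dictionary that
// gives each new application ID the next sequential Mahout ID, starting at 0.
class IdDictionary {
    private final Map<String, Integer> toMahout = new HashMap<>();
    private final List<String> toApp = new ArrayList<>(); // index = Mahout ID

    // Returns the existing Mahout ID for appId, or assigns the next free one.
    int mahoutId(String appId) {
        Integer id = toMahout.get(appId);
        if (id == null) {
            id = toApp.size();
            toMahout.put(appId, id);
            toApp.add(appId);
        }
        return id;
    }

    // Reverse lookup, used when translating job output back.
    String appId(int mahoutId) {
        return toApp.get(mahoutId);
    }

    // Rewrites "user, item, pref" lines as "mahoutUser,mahoutItem,pref".
    // Users and items use separate dictionaries, so both start at 0.
    static List<String> encode(List<String> lines, IdDictionary users, IdDictionary items) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            int user = users.mahoutId(fields[0].trim());
            int item = items.mahoutId(fields[1].trim());
            out.add(user + "," + item + "," + fields[2].trim());
        }
        return out;
    }
}
```

Encoding the two example lines yields `0,0,1.0` and `1,1,2.0`, and `users.appId(0)` afterwards returns `-92`, so the same dictionaries can be kept around to decode the job's output.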
>> >
>> > On Jul 25, 2014, at 11:55 AM, Serega Sheypak <serega.sheypak@gmail.com>
>> > wrote:
>> >
>> > I'm preparing data using Apache Hive: user_id:long, item_id:long,
>> > preference [1.0, 2.0].
>> > I don't understand "For most Mahout jobs you have to prepare your data to
>> > have Mahout IDs". What are "Mahout IDs"? I'm trying to follow the Mahout
>> > site docs, and I didn't find anything there about Mahout IDs.
>> > Please explain.
>> >
>> >
>> > 2014-07-25 22:39 GMT+04:00 Pat Ferrel <pat.ferrel@gmail.com>:
>> >
>> >> Sorry, I haven’t read this thread carefully, but it looks like you may be
>> >> using the wrong IDs.
>> >>
>> >> For most Mahout jobs you have to prepare your data to have Mahout IDs. You
>> >> do this by looking at each datum, and as you see a new unique
>> >> application-specific user or item ID, you give it a Mahout ID starting
>> >> from 0. So Mahout IDs can be thought of as row and column numbers in a
>> >> matrix. The Mahout IDs for rows will be 0 through (# of rows - 1), and
>> >> the same for columns.
>> >>
>> >> This always requires that you translate into Mahout IDs and then, after
>> >> the job is run, translate back into your application IDs. You need a
>> >> bi-directional dictionary of some type. I use a HashBiMap from Guava.
>> >>
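The translate-back step Pat mentions can be sketched as follows. This assumes each recommendation output line holds a Mahout user ID, a tab, and a bracketed list of `itemID:score` pairs; check that assumption against your actual output files before relying on it. `OutputTranslator` is a hypothetical helper, and the reverse lookups are plain lists where position i holds the application ID for Mahout ID i.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps one recommendation line with Mahout IDs back to
// application IDs. Assumed line format: "<user>\t[<item>:<score>,...]".
class OutputTranslator {
    static String translate(String line, List<String> userIds, List<String> itemIds) {
        String[] parts = line.split("\t", 2);
        // userIds.get(i) / itemIds.get(i) hold the application ID for Mahout ID i.
        String user = userIds.get(Integer.parseInt(parts[0].trim()));
        String body = parts[1].trim();
        body = body.substring(1, body.length() - 1); // drop the surrounding [ ]
        List<String> translated = new ArrayList<>();
        if (!body.isEmpty()) {
            for (String pair : body.split(",")) {
                int colon = pair.indexOf(':');
                int item = Integer.parseInt(pair.substring(0, colon).trim());
                translated.add(itemIds.get(item) + ":" + pair.substring(colon + 1).trim());
            }
        }
        return user + "\t[" + String.join(",", translated) + "]";
    }
}
```

With users `["-92", "75000x"]` and items `["abc", "jkl"]`, the line `0\t[1:0.93,0:0.41]` becomes `-92\t[jkl:0.93,abc:0.41]`.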
>> >> Also, I’d avoid the threshold for now. If you get it wrong, it will mess
>> >> things up badly, and it is very hard to tune. It’s there for completeness,
>> >> but I never use it.
>> >>
>> >>
>> >> On Jul 25, 2014, at 12:55 AM, Serega Sheypak <serega.sheypak@gmail.com>
>> >> wrote:
>> >>
>> >> Hi, nothing helps...
>> >> I use Mahout 0.9 compiled for CDH 4.7.
>> >> I provide only positive values.
>> >> I use ItemSimilarityJob and get 2000 similarities for 1400 unique
>> >> items.
>> >> Input data is:
>> >> 16*10^6 preferences
>> >> 4*10^6 users
>> >> 0.6*10^ items
>> >> I use Pearson correlation, and the preference values are 1.0 and 2.0.
>> >>
>> >>
>> >> 2014-07-22 9:32 GMT+04:00 Serega Sheypak <serega.sheypak@gmail.com>:
>> >>
>> >>> OK, I have recompiled Mahout 0.9 for CDH 4.7. I'll try it this evening.
>> >>> Right now I don't see how it can help me. As far as I know, the stuff I'm
>> >>> trying to use is pretty old and stable.
>> >>> It looks like I'm applying it the wrong way.
>> >>>
>> >>> There is an option for recommenditembased named "--threshold". I provide
>> >>> data for recommenditembased with preference values in the range
>> >>> [1.1..2.0].
>> >>> I set --threshold to 1.2.
>> >>> Is --threshold absolute, so it can be in [1.1 .. 2+], or is it relative,
>> >>> so it can be in [0.0 .. 0.99999]?
>> >>>
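A sketch of how a similarity threshold behaves, assuming (as the option's description in the Mahout sources suggests) that `--threshold` is compared against item-item similarity values, not against preference values. `ThresholdDemo`, `Pair`, `applyThreshold` and `llrSimilarity` are hypothetical names. The log-likelihood helper uses the `1 - 1 / (1 + llr)` form that, as far as I know, Mahout uses to turn an LLR score into a similarity, which keeps it below 1; Pearson correlation is likewise bounded by 1. Under that assumption, a threshold of 1.1 or 1.2 would discard every pair.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Mahout code) of a similarity threshold.
class ThresholdDemo {
    // One item-item pair with its similarity score.
    static final class Pair {
        final int itemA;
        final int itemB;
        final double similarity;

        Pair(int itemA, int itemB, double similarity) {
            this.itemA = itemA;
            this.itemB = itemB;
            this.similarity = similarity;
        }
    }

    // Assumed semantics: keep a pair only if its similarity is >= threshold.
    // Preference values (1.0, 1.6, 2.0, ...) play no part in this comparison.
    static List<Pair> applyThreshold(List<Pair> pairs, double threshold) {
        List<Pair> kept = new ArrayList<>();
        for (Pair p : pairs) {
            if (p.similarity >= threshold) {
                kept.add(p);
            }
        }
        return kept;
    }

    // Non-negative LLR score turned into a similarity in [0, 1): 1 - 1 / (1 + llr).
    static double llrSimilarity(double logLikelihood) {
        return 1.0 - 1.0 / (1.0 + logLikelihood);
    }
}
```

Even a very strong association such as `llrSimilarity(1000.0)` stays just under 1, so it survives a 0.91 threshold but not a 1.1 one.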
>> >>>
>> >>> 2014-07-22 3:54 GMT+04:00 Ted Dunning <ted.dunning@gmail.com>:
>> >>>
>> >>> That version is no longer supported.  You should upgrade to 0.9
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Mon, Jul 21, 2014 at 11:41 AM, Serega Sheypak <serega.sheypak@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> 0.7-cdh4.7.0
>> >>>>> Anyway, recommenditembased does produce these directories:
>> >>>>>
>> >>>>> /recommenditembased/temp/maxValues.bin
>> >>>>> /recommenditembased/temp/norms.bin
>> >>>>> /recommenditembased/temp/numNonZeroEntries.bin
>> >>>>> /recommenditembased/temp/pairwiseSimilarity
>> >>>>> /recommenditembased/temp/partialMultiply
>> >>>>> /recommenditembased/temp/prePartialMultiply1
>> >>>>> /recommenditembased/temp/prePartialMultiply2
>> >>>>> /recommenditembased/temp/preparePreferenceMatrix
>> >>>>> /recommenditembased/temp/similarityMatrix
>> >>>>> /recommenditembased/temp/weights
>> >>>>>
>> >>>>> I suppose that "/recommenditembased/temp/similarityMatrix" is the thing I
>> >>>>> need. Right now I'm trying to read it using
>> >>>>>
>> >>>>> matrix = LOAD '/recommenditembased/temp/similarityMatrix' USING
>> >>>>> com.twitter.elephantbird.pig.load.SequenceFileLoader(
>> >>>>>  '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
>> >>>>>  '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter'
>> >>>>> )  as (intId: int, vector:tuple(cardinality:int,
>> >>>>> entries:bag{t:tuple(some_id:long, some_value:double)}));
>> >>>>>
>> >>>>>
>> >>>>> It looks like the vector is empty... Or I'm doing something wrong.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> 2014-07-21 22:09 GMT+04:00 Ted Dunning <ted.dunning@gmail.com>:
>> >>>>>
>> >>>>>> Which version of Mahout?
>> >>>>>>
>> >>>>>>
>> >>>>>> On Mon, Jul 21, 2014 at 11:05 AM, Serega Sheypak <serega.sheypak@gmail.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Hi, I've tried: Unexpected --outputPathForSimilarityMatrix while
>> >>>>>>> processing Job-Specific
>> >>>>>>>
>> >>>>>>> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/output
>> >>>>>>> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/temp
>> >>>>>>> sudo -u oozie mahout recommenditembased \
>> >>>>>>>     --input hdfs://nameservice1/user/hive/warehouse/staging_weighted_visits_and_rec_clicks \
>> >>>>>>>     --output hdfs://nameservice1/recommenditembased/output \
>> >>>>>>>     --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>> >>>>>>>     --numRecommendations 500 \
>> >>>>>>>     --booleanData false \
>> >>>>>>>     --maxPrefsPerUser 1000 \
>> >>>>>>>     --maxSimilaritiesPerItem 1000 \
>> >>>>>>>     --minPrefsPerUser 5 \
>> >>>>>>>     --maxPrefsPerUserInItemSimilarity 30 \
>> >>>>>>>     --threshold 1.1 \
>> >>>>>>>     --tempDir hdfs://nameservice1/recommenditembased/temp \
>> >>>>>>>     --outputPathForSimilarityMatrix hdfs://nameservice1/recommenditembased/sim_matrix
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> I'm on Cloudera CDH 4.7; it looks like this feature is not supported.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> 2014-07-21 11:18 GMT+04:00 Peng Zhang <pzhang.xjtu@gmail.com>:
>> >>>>>>>
>> >>>>>>>> Serega,
>> >>>>>>>>
>> >>>>>>>> See the last line for how to pass the outputPathForSimilarityMatrix
>> >>>>>>>> option to the recommenditembased command:
>> >>>>>>>>
>> >>>>>>>> sudo -u oozie mahout recommenditembased \
>> >>>>>>>>     --input visited_items_with_inverted_items \
>> >>>>>>>>     --output result \
>> >>>>>>>>     --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>> >>>>>>>>     --usersFile inverted_items \
>> >>>>>>>>     --numRecommendations 500 \
>> >>>>>>>>     --booleanData false \
>> >>>>>>>>     --maxPrefsPerUser 100 \
>> >>>>>>>>     --maxSimilaritiesPerItem 500 \
>> >>>>>>>>     --minPrefsPerUser 0 \
>> >>>>>>>>     --maxPrefsPerUserInItemSimilarity 30 \
>> >>>>>>>>     --threshold 0.91 \
>> >>>>>>>>     --tempDir temp \
>> >>>>>>>>     --outputPathForSimilarityMatrix similarityMatri
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Peng Zhang
>> >>>>>>>> pzhang.xjtu@gmail.com
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Jul 21, 2014, at 3:09 PM, Serega Sheypak <serega.sheypak@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> I've inspected the code; our approach wouldn't work with booleanData=false.
>> >>>>>>>>> We calculate item similarity the wrong way... (((
>> >>>>>>>>> Thank you.
>> >>>>>>>>> 1. We provide a "fake" user_id and pass --usersFile in order to get
>> >>>>>>>>> recommendations for the fake user_id, where user_id is a negated item_id.
>> >>>>>>>>> It worked when we provided user_id->item_id pairs without preferences.
>> >>>>>>>>> 2. Our goal is to get item similarities. We tried
>> >>>>>>>>> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob, but
>> >>>>>>>>> it returns poor results compared to RecommenderJob with our "fake" user_id
>> >>>>>>>>> (inverted item_id).
>> >>>>>>>>>
>> >>>>>>>>> 1. I'll try the option you provided.
>> >>>>>>>>> 2. I will remove the input with fake user_ids and the usersFile with these
>> >>>>>>>>> fake IDs.
>> >>>>>>>>> 3. https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
>> >>>>>>>>> I don't understand how to pass the --outputPathForSimilarityMatrix option
>> >>>>>>>>> to RecommenderJob.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> 2014-07-21 4:58 GMT+04:00 Peng Zhang <pzhang.xjtu@gmail.com>:
>> >>>>>>>>>
>> >>>>>>>>>> Serega,
>> >>>>>>>>>>
>> >>>>>>>>>> I have two comments:
>> >>>>>>>>>> 1. Don’t use negative user IDs. Since Mahout uses the user ID as well as
>> >>>>>>>>>> the item ID as the row/column index, you’d better use 0, 1, 2, etc. as IDs.
>> >>>>>>>>>> 2. If you want to get the item similarity information, you can use
>> >>>>>>>>>> --outputPathForSimilarityMatrix in the command.
>> >>>>>>>>>>
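Peng's first point can be checked before launching a job. This hypothetical helper (not part of Mahout) reports negative IDs, IDs too large for an `int` index (Mahout vectors are indexed by `int`), and ID sets that are not exactly 0 through n - 1, which is the range Pat describes for Mahout IDs. A remapping that starts at 1, like the 1, 2, 3, 4, ... keys mentioned at the top of this page, would be flagged by the last check.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical pre-flight check (not a Mahout API): IDs should be exactly
// 0, 1, ..., n - 1 so they can serve as int row/column indices.
class MahoutIdCheck {
    // Returns null when the IDs pass, otherwise a description of the first problem.
    static String check(Collection<Long> ids) {
        Set<Long> distinct = new HashSet<>();
        long max = -1;
        for (long id : ids) {
            if (id < 0) {
                return "negative ID: " + id;
            }
            if (id > Integer.MAX_VALUE) {
                return "ID too large for an int index: " + id;
            }
            distinct.add(id);
            max = Math.max(max, id);
        }
        // With n distinct non-negative IDs, the range 0..n-1 is covered
        // exactly when the largest ID is n - 1.
        if (max + 1 != distinct.size()) {
            return "IDs are not 0.." + (distinct.size() - 1) + ": largest ID is " + max;
        }
        return null;
    }
}
```

For example, `check(Arrays.asList(1L, 2L, 3L))` returns `IDs are not 0..2: largest ID is 3`, while repeated IDs such as `0, 1, 2, 1` pass.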
>> >>>>>>>>>> Regards,
>> >>>>>>>>>> Peng Zhang
>> >>>>>>>>>> M: +86 186-1658-7856
>> >>>>>>>>>> pzhang.xjtu@gmail.com
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Jul 21, 2014, at 4:00 AM, Serega Sheypak <serega.sheypak@gmail.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> This is where everything goes wrong:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Name: RecommenderJob-PartialMultiplyMapper-Reducer
>> >>>>>>>>>>> User: oozie
>> >>>>>>>>>>> Process User: oozie
>> >>>>>>>>>>> Group: oozie
>> >>>>>>>>>>> Mapper Class: PartialMultiplyMapper
>> >>>>>>>>>>> Reducer Class: AggregateAndRecommendReducer
>> >>>>>>>>>>> Job Input Directory: hdfs://nameservice1/itemrec/temp/partialMultiply
>> >>>>>>>>>>> Job Output Directory: hdfs://nameservice1/itemrec/output/
>> >>>>>>>>>>>
>> >>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map input records=3312879
>> >>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map output records=3313251
>> >>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
>> >>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0
>> >>>>>>>>>>>
>> >>>>>>>>>>> Why does Mahout return 0 rows? It works when booleanData=true
>> >>>>>>>>>>> (are the preferences ignored...?)
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> 2014-07-20 23:19 GMT+04:00 Serega Sheypak <serega.sheypak@gmail.com>:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
>> >>>>>>>>>>>> users_file:
>> >>>>>>>>>>>> --inverted_item_id
>> >>>>>>>>>>>> -1
>> >>>>>>>>>>>> -2
>> >>>>>>>>>>>> -3
>> >>>>>>>>>>>> -4
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> users_items_prefs
>> >>>>>>>>>>>> --inverted item_id
>> >>>>>>>>>>>> -1 1 1.0
>> >>>>>>>>>>>> -2 2 1.0
>> >>>>>>>>>>>> -3 3 1.0
>> >>>>>>>>>>>> -4 4 1.0
>> >>>>>>>>>>>> --user_id item_id pref_value
>> >>>>>>>>>>>> 11   1 1.6
>> >>>>>>>>>>>> 11   2 1.6
>> >>>>>>>>>>>> 123 3 2.0
>> >>>>>>>>>>>> 123 4 2.0
>> >>>>>>>>>>>> 333 1 2.0
>> >>>>>>>>>>>> 333 2 1.6
>> >>>>>>>>>>>> --e.t.c.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> If I set --booleanData true,
>> >>>>>>>>>>>> then Mahout returns the result.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <andrew.musselman@gmail.com>:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> I'm confused about how you're constructing the user file, and why there
>> >>>>>>>>>>>>> are negated item IDs here.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Can you post some more details, please, including the Mahout version and
>> >>>>>>>>>>>>> some sample data sets?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <serega.sheypak@gmail.com>
>> >>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Hi, I'm trying to compute item similarity.
>> >>>>>>>>>>>>>> I gather the items that users visit while shopping and then create a file:
>> >>>>>>>>>>>>>> user_id, item_id, weight (where weight can be [1.0, 1.6, 1.9], depending
>> >>>>>>>>>>>>>> on the user action type and data source)
>> >>>>>>>>>>>>>> UNION
>> >>>>>>>>>>>>>> -item_id, item_id, 1.0 (from the items dictionary)
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> and I also provide a usersFile, where user_id = -item_id.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> The idea is to get item similarity. If any user visits item "A", I want
>> >>>>>>>>>>>>>> to show them items "B", "c", "xxx" using the preferences of other users.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> The problem is that the last (???) MapReduce job returns 0 rows.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Here are my settings:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> sudo -u oozie mahout recommenditembased \
>> >>>>>>>>>>>>>>     --input visited_items_with_inverted_items \
>> >>>>>>>>>>>>>>     --output result \
>> >>>>>>>>>>>>>>     --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>> >>>>>>>>>>>>>>     --usersFile inverted_items \
>> >>>>>>>>>>>>>>     --numRecommendations 500 \
>> >>>>>>>>>>>>>>     --booleanData false \
>> >>>>>>>>>>>>>>     --maxPrefsPerUser 100 \
>> >>>>>>>>>>>>>>     --maxSimilaritiesPerItem 500 \
>> >>>>>>>>>>>>>>     --minPrefsPerUser 0 \
>> >>>>>>>>>>>>>>     --maxPrefsPerUserInItemSimilarity 30 \
>> >>>>>>>>>>>>>>     --threshold 0.91 \
>> >>>>>>>>>>>>>>     --tempDir temp
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Here are some counters... I don't understand what they mean....
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
>> >>>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:     USERS=7528530
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
>> >>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_NEGLECTED=1,798,738
>> >>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_USED=12,429,693
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>> >>>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:     ROWS=3312879
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>> >>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     COOCCURRENCES=35882374
>> >>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     PRUNED_COOCCURRENCES=0
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map input records=3312879
>> >>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map output records=17570268
>> >>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce input records=5221907
>> >>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce output records=3312879
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
>> >>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
>> >>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
>> >>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map input records=7528530
>> >>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map output records=3313251
>> >>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce input records=3313251
>> >>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce output records=3313251
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map input records=6626130
>> >>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map output records=6626130
>> >>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce input records=6626130
>> >>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce output records=3312879
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879
>> >>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251
>> >>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251
>> >>>>>>>>>>>>>> --------
>> >>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0
>> >>>>>>>>>>>>>> --------
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Why 0???
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >>
>> >
>> >
>>
>>
>
