mahout-user mailing list archives

From Pat Ferrel <>
Subject Re: Item recommendation w/o users or preferences
Date Wed, 15 Jan 2014 18:26:25 GMT
Haven’t read the whole thread but it sounds like you just need some simple start-here info…

To do collaborative filtering you must have user-id, item-id, action/weight.

For a minimal commerce CF cooccurrence recommender this is typically something like: 
user-id, item-id, 1=purchase

To use Mahout you will have to translate the ids into positive integers. Treat these integers
as keys into a lookup that maps back to your original item-ids and user-ids. So input into Mahout will be:
user-id-key, item-id-key, 1
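
As a rough illustration of that id translation (not Mahout code, just a preprocessing sketch), the snippet below assigns each distinct id a positive integer key and keeps a reverse lookup so keys can be turned back into the original ids later. The user ids "u-alice" and "u-bob" and the helper name build_index are made up for the example.

```python
def build_index(values):
    """Assign each distinct value a positive integer key, in first-seen order."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index) + 1  # keys start at 1, so they are positive
    return index


# Hypothetical (user-id, item-id) purchase records.
rows = [
    ("u-alice", "Sun Glasses"),
    ("u-bob", "Sun Glasses"),
    ("u-bob", "Sun Glass Case"),
]

user_keys = build_index(user for user, _ in rows)
item_keys = build_index(item for _, item in rows)

# user-id-key, item-id-key, 1 (one line per purchase)
lines = [f"{user_keys[user]},{item_keys[item]},1" for user, item in rows]

# Reverse lookup, needed to turn recommended item-id-keys back into item-ids.
item_by_key = {key: item for item, key in item_keys.items()}
```

Writing `lines` to a file gives the comma-separated triples described above; `item_by_key` is what you would consult when presenting recommendations.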

You can do CF with anonymous user-ids, meaning an individual took the action but you don’t
know who. However, to use this data you will need some way of tying an id to a real
person. Using transaction ids as a proxy for user-ids will work in the training data, but
once you want to make a recommendation you will have to know some real user history so that
the recommender can compare it with the transactions.

Then you calculate recommendations using some Mahout recommender. If you are using the Hadoop
version, the output will be a row per user-id-key that contains some number of recommended
item-id-keys and their recommendation weights for sorting purposes. You then write your own
retrieval code to get the recs for a given user-id-key, since they are all pre-calculated
and in a Sequence File. If you are using the in-memory recommender, you can ask for recs for
a given user-id-key and get the list returned.

You can also use transaction data alone to make anonymous recommendations, but that is market
basket analysis. In that case you have:
transaction-id-key, item-id-key, 1

Then at recommendation time you have a list of items in a single basket. There are several
ways to get this to work so I’ll stop here unless it’s what you need, in which case let
us know.
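
For what it is worth, here is a minimal sketch of one of those ways, in plain Python rather than Mahout: count how often pairs of items share a transaction, then score candidate items by their cooccurrence with whatever is in the current basket. It uses the sample transactions quoted further down in this message; the function name recommend_for_basket and the alphabetical tie-breaking are my own assumptions, not anything Mahout prescribes.

```python
from collections import defaultdict
from itertools import permutations

# transaction-id -> set of items, taken from the sample data quoted in this thread
transactions = {
    123: {"Sun Glasses"},
    124: {"Sun Glasses", "Sun Glass Case"},
    125: {"Sun Glass Case"},
    126: {"Sun Glasses", "Glass Repair Kit"},
    127: {"Glass Repair Kit"},
}

# cooccur[a][b] = number of transactions that contain both a and b
cooccur = defaultdict(lambda: defaultdict(int))
for items in transactions.values():
    for a, b in permutations(items, 2):
        cooccur[a][b] += 1


def recommend_for_basket(basket, top_n=3):
    """Score items not in the basket by summed cooccurrence with basket items."""
    scores = defaultdict(int)
    for item in basket:
        for other, count in cooccur[item].items():
            if other not in basket:
                scores[other] += count
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

With data this small every score is 1, so the ranking only becomes meaningful with many more transactions. Raw counts also favor popular items, which is one reason real systems tend to use a similarity measure instead of the count itself.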

On Jan 11, 2014, at 1:38 PM, Tim Smith <> wrote:

> Is it about how to arrange your data to use this computation?  The
> references below might help with that.

Yes, I read and tried the recommendation examples from MIA and there is a mention of item
to item similarity, but I am not sure what form the file should take.  The examples are along
the lines of  userid,itemid,value

In section 6.2 of MIA we multiply the co-occurrence matrix by the user
preferences to get the recommendations (top of page 97), so if I do not have preferences should
I just default them all to the same value?  Taken together with your previous comments, is
this how I should be preparing my data?
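
To make the arithmetic concrete, here is a small sketch of the matrix-times-vector step with every preference defaulted to 1. It is plain Python, not Mahout, and the matrix is hand-built from the sample data just below by treating each transaction as a user. Setting the diagonal to 0 and filtering out items already held are simplifying assumptions of this sketch.

```python
items = ["Sun Glasses", "Sun Glass Case", "Glass Repair Kit"]

# cooccurrence[i][j] = number of transactions containing both items[i] and items[j]
# (diagonal left at 0 for simplicity)
cooccurrence = [
    [0, 1, 1],  # Sun Glasses: with Sun Glass Case in 124, with Glass Repair Kit in 126
    [1, 0, 0],  # Sun Glass Case
    [1, 0, 0],  # Glass Repair Kit
]


def score(matrix, prefs):
    """Multiply the cooccurrence matrix by a preference vector."""
    return [sum(c * p for c, p in zip(row, prefs)) for row in matrix]


# A "user" who holds Sun Glasses and a Sun Glass Case; every known preference is 1.
prefs = [1, 1, 0]
scores = score(cooccurrence, prefs)

# Keep only items the user does not already have, highest score first.
recs = sorted(
    ((items[i], s) for i, s in enumerate(scores) if prefs[i] == 0 and s > 0),
    key=lambda kv: -kv[1],
)
```

When all preferences are 1, the product simply adds up the cooccurrence counts for the items the user already has, so a constant value loses nothing beyond the weighting itself.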

Raw Sample Data (format: Transaction|Item)
123|Sun Glasses
124|Sun Glasses
124|Sun Glass Case
125|Sun Glass Case
126|Sun Glasses
126|Glass Repair Kit
127|Glass Repair Kit

Are you suggesting that I simply use the following (format: userid|item|value)?
123|Sun Glasses|1
124|Sun Glasses|1
124|Sun Glass Case|1
125|Sun Glass Case|1
126|Sun Glasses|1
126|Glass Repair Kit|1
127|Glass Repair Kit|1

> Is it regarding the specifics of how you do the computation?  I can help
> with that, but would need a pointer to the difficulty.

Not quite yet.  I am working through the intuition first; I'll fight through the math once,
if ever, the fog clears.
