mahout-user mailing list archives

From: Matthew Bryan
Subject: Re: Turning Preference Files Into Vectors
Date: Sat, 06 Feb 2010 20:05:44 GMT
Worked perfectly!

Thanks, and thanks for the active user community...hopefully I'll soon
get to the point that I'm comfortable contributing code as you


On Sat, Feb 6, 2010 at 12:21 PM, Sean wrote:
> I can point you to 90% of what you need in the existing code. Look at
> package first.
> RecommenderJob runs several MRs to make recommendations, and along the
> way does what you want -- almost. It outputs user vectors -- for each
> user, a vector with item IDs as indices and pref values as
> coordinates. You want the transpose of that -- for each item, a vector
> with user IDs as indices, etc.
> We can't use IDs in the recommender as indices directly, since IDs are
> longs, and vector dimensions are ints of course. So there's the first
> stage where we create a mapping from the real IDs to hashed indices.
> This is what ItemIDIndexMapper/Reducer do. You would just copy and
> tweak them to deal with user IDs.
> Then ToItemPrefsMapper/ToUserVectorReducer team up to write out the
> vectors. Same thing -- just an exercise in swapping user IDs and item
> IDs.
> The rest of the MRs don't matter to you. You could even copy
> RecommenderJob and cut out the other bits it runs, and have a
> ready-made driver.
> It's easier than it maybe sounds -- these are all quite small classes.
> If it works and you care to think through and contribute a clean
> refactoring that allows for generating item vectors as well as user
> vectors I'd commit that. But feel free to just hack for your own
> purpose too.
> Sean
> On Sat, Feb 6, 2010 at 4:07 PM, Matthew Bryan wrote:
>> Is there a straightforward way to take a preference file that's used
>> for a recommender (user_id, item_id, preference) and turn it into a
>> vector that can be used for clustering? As part of my evaluation of
>> Mahout I'd also like to cluster items and see how those simple
>> clusters perform.
>> Thanks!
>> Matthew Bryan
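
The two steps described in the quoted reply (hashing long IDs down to int
indices, then writing one vector per item with user IDs as indices) can be
sketched on a single machine as follows. This is only an illustration, not
the Mahout MapReduce code: the class ItemVectorSketch and the methods
idToIndex and toItemVectors are hypothetical names, the hash is one
plausible way to fold a long into a non-negative int, and the input is
assumed to be comma-separated "user_id,item_id,preference" lines.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ItemVectorSketch {

    // Fold a 64-bit ID into a non-negative 32-bit index.
    // Different IDs can collide on the same index; this sketch ignores that.
    static int idToIndex(long id) {
        return 0x7FFFFFFF & (int) (id ^ (id >>> 32));
    }

    // Build one sparse vector per item:
    // item ID -> (hashed user index -> preference value).
    static Map<Long, Map<Integer, Float>> toItemVectors(List<String> lines) {
        Map<Long, Map<Integer, Float>> vectors = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            long userId = Long.parseLong(parts[0].trim());
            long itemId = Long.parseLong(parts[1].trim());
            float pref = Float.parseFloat(parts[2].trim());
            vectors.computeIfAbsent(itemId, k -> new TreeMap<>())
                   .put(idToIndex(userId), pref);
        }
        return vectors;
    }

    public static void main(String[] args) {
        // Three made-up preference lines: user, item, preference.
        List<String> prefs = Arrays.asList("1,10,3.0", "2,10,4.5", "1,20,1.0");
        toItemVectors(prefs)
            .forEach((item, vec) -> System.out.println(item + " -> " + vec));
    }
}
```

Swapping the roles of userId and itemId in toItemVectors gives the user
vectors instead, which is the same swap the reply suggests making in the
mapper and reducer classes.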
