mahout-user mailing list archives

From: Sean Owen
Subject: Re: Approaches for combining multiple types of item data for user-user similarity
Date: Wed, 04 Jul 2012 07:42:16 GMT
The best default answer is to put them all in one model. The math
doesn't care what the things are. Unless you have a strong reason to
weight one data set differently, I wouldn't. If you do, then two
models is best. It is hard to weight a subset of the data within most
similarity functions. I don't think it would work in Pearson, for
instance, but it could work in Tanimoto.
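
To illustrate the Tanimoto remark, here is a minimal sketch of one
way a subset of the data could be down-weighted: each item contributes
its weight, rather than 1, to the intersection and union totals. This
is plain Python, not Mahout's API; the function names, the "book:" and
"video:" ID prefixes, and the 0.5 weight are all assumptions made up
for the example.

```python
def weighted_tanimoto(items_a, items_b, weight):
    """Weighted Tanimoto (Jaccard) coefficient between two item sets.

    items_a, items_b: sets of item IDs for two users.
    weight: hypothetical function mapping an item ID to a non-negative weight.
    """
    union = items_a | items_b
    total = sum(weight(i) for i in union)
    if total == 0:
        # No items, or every item has zero weight: treat as no similarity.
        return 0.0
    shared = sum(weight(i) for i in items_a & items_b)
    return shared / total


def type_weight(item_id):
    # Assumed convention: video items count half as much as book items.
    return 0.5 if item_id.startswith("video:") else 1.0


alice = {"book:1", "book:2", "video:9"}
bob = {"book:2", "video:7", "video:8", "video:9"}

weighted = weighted_tanimoto(alice, bob, type_weight)
unweighted = weighted_tanimoto(alice, bob, lambda item_id: 1.0)
```

With every weight set to 1.0 this reduces to the ordinary Tanimoto
coefficient, so the unweighted case is just a special case of the
same function.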

On Wed, Jul 4, 2012 at 1:20 AM, Ken Krugler wrote:
> Hi all,
> I'm curious what approaches are recommended for generating user-user similarity, when
> I've got two (or more) distinct types of item data, both of which are fairly large.
> E.g. let's say I had a set of users where I knew both (a) what books they had bought
> on Amazon, and (b) what YouTube videos they had watched.
> For each user, I want to find the 10 most similar other users.
>  - I could create two separate models, find the nearest 30 users for each user, and combine
>    (maybe with weighting) the results.
>  - I could toss all of the data into one model - and I could use a value of < 1.0
>    for whichever type of preference is less important.
> Any other suggestions? Input on the above two approaches?
> Thanks!
> -- Ken
> --------------------------
> Ken Krugler
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr
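
The two-model option from the quoted message (nearest users from each
model, combined with weighting) could look roughly like the sketch
below. It assumes each model has already produced a mapping from
neighbor user ID to similarity score for one target user; the function
name, the 0.7/0.3 weights, and the sample scores are hypothetical and
only there to make the example runnable.

```python
def combine_neighbors(book_neighbors, video_neighbors,
                      book_weight=0.7, video_weight=0.3, top_n=10):
    """Merge two per-model neighbor lists into one weighted ranking.

    book_neighbors, video_neighbors: dicts of user ID -> similarity,
    e.g. the nearest 30 users from each model for one target user.
    A user missing from one list contributes 0 for that model.
    """
    scores = {}
    for neighbors, w in ((book_neighbors, book_weight),
                         (video_neighbors, video_weight)):
        for user, sim in neighbors.items():
            scores[user] = scores.get(user, 0.0) + w * sim
    # Highest combined score first; ties broken by user ID for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Made-up similarity scores for one target user.
book_neighbors = {"u2": 0.9, "u3": 0.4}
video_neighbors = {"u3": 0.8, "u4": 0.5}

top_two = combine_neighbors(book_neighbors, video_neighbors, top_n=2)
top_users = [user for user, _ in top_two]
```

Fetching more neighbors per model than are finally needed (30 versus
10 in the quoted message) gives users who rank moderately in both
models a chance to surface in the combined list.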
