mahout-user mailing list archives

From Edith Au <>
Subject Re: recommendation based on user preference
Date Thu, 10 Jul 2014 17:45:50 GMT
Thank you so much for the suggestions.  It took me some time to figure
things out, but I believe I have a pretty good grip on what needs to be
done now. My dataset is small enough to fit on a single machine, so I am
going to use an in-memory implementation rather than Hadoop.  As suggested
by both Pat and Manuel, I have a table (on the file system) with neighborhoods
as rows and amenities as columns.  At runtime, I will load only the columns
(amenities) that correlate with a selected user and run a UserSimilarity
computation between each neighborhood and the one the user resides in.  After
that, I can use NearestNUserNeighborhood to pick the results.
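
To make the 1 x N comparison described above concrete, here is a minimal sketch in plain Python. It does not use Mahout's UserSimilarity or NearestNUserNeighborhood; cosine similarity is an assumed choice of measure, the function names are hypothetical, and the "Uptown" row is made-up data added only so there is more than one candidate to rank.

```python
import math

# Counts per neighborhood. Downtown and Midtown come from the example in this
# thread; "Uptown" and its counts are hypothetical, added for illustration.
TABLE = {
    "Downtown": {"Gym": 15, "Cafe": 50, "Bookstore": 0},
    "Midtown": {"Gym": 30, "Cafe": 100, "Bookstore": 10},
    "Uptown": {"Gym": 2, "Cafe": 5, "Bookstore": 20},
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(table, home, amenities, n):
    """Compare `home` against every other row (1 x N) and return the top n."""
    # Only the selected amenity columns take part in the comparison.
    home_vec = [table[home].get(a, 0) for a in amenities]
    scored = []
    for name, row in table.items():
        if name == home:
            continue
        vec = [row.get(a, 0) for a in amenities]
        scored.append((name, cosine(home_vec, vec)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


result = most_similar(TABLE, "Downtown", ["Gym", "Cafe", "Bookstore"], n=2)
```

Only the home row is compared against the others, so the work grows linearly with the number of neighborhoods instead of quadratically. Note that cosine similarity ignores the overall scale of a row, so Downtown and Midtown look nearly identical here even though Midtown has twice the counts.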

I gather UserSimilarity is the in-memory equivalent of RowSimilarity
(Hadoop)?  It would be great if someone could confirm it!

Thanks again Pat and Manuel!

On Wed, Jul 2, 2014 at 4:06 PM, Pat Ferrel <> wrote:

> If you are looking to recommend a similar neighborhood based on the
> characteristics of some other neighborhood (the user’s current one), then you
> wouldn’t use collaborative filtering. This is a metadata recommender based
> on similarity of neighborhoods, not on a collection of user preferences.
> The easiest and fastest would be to use a search engine, but I’ll leave
> that for now since it doesn’t account for feature weights as well.
> Create a table like this:
> Neighborhood    Gym     Cafe    Bookstore
> Downtown        15      50      0
> Midtown         30      100     10
> …
> You will need to convert the row IDs into sequential ints, which Mahout
> uses for IDs. Then read them into a SequenceFile, creating a Distributed Row
> Matrix (DRM), which holds key-value pairs: the key is the integer neighborhood
> ID and the value is a Vector (a sort of list) of column integer IDs with the
> counts.
> Then run rowsimilarity on the DRM. That is the CLI job, but there is also a
> Driver you can call from your code.
> There are some data prep issues you will have since larger neighborhoods
> will have higher counts. An easy thing to do would be to normalize the
> counts by something like population or physical size so you get cafes per
> resident or per sq mile or some other ratio.
> The result of the rowsimilarity job will be another DRM with key =
> neighborhood ID and value = Vector of similar neighborhoods (by integer ID),
> each with a strength of similarity. Sort the vector by strength and you’ll have
> an ordered list of similar neighborhoods for each neighborhood.
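
As a rough illustration of the normalize-then-compare steps described above, here is a library-independent Python sketch. It is not the Mahout rowsimilarity job. The population figures and the "Uptown" row are hypothetical, and the similarity 1 / (1 + Euclidean distance) is an assumed measure, picked because it is sensitive to magnitude and so shows the effect of normalizing.

```python
import math

# Raw counts in column order Gym, Cafe, Bookstore. Downtown and Midtown are
# from the table above; "Uptown" and all populations are hypothetical.
COUNTS = {
    "Downtown": [15, 50, 0],
    "Midtown": [30, 100, 10],
    "Uptown": [2, 5, 20],
}
POPULATION = {"Downtown": 10_000, "Midtown": 40_000, "Uptown": 5_000}


def per_thousand(counts, population):
    """Rescale raw counts to amenities per 1,000 residents."""
    return {
        name: [1000 * c / population[name] for c in row]
        for name, row in counts.items()
    }


def euclidean_similarity(a, b):
    """Map Euclidean distance d to a similarity in (0, 1] via 1 / (1 + d)."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + d)


def row_similarity(rows):
    """Return (ids, {row_id: [(other_id, similarity), ...]}), strongest first."""
    # Sequential integer IDs, assigned in sorted name order.
    ids = {name: i for i, name in enumerate(sorted(rows))}
    result = {}
    for name, vec in rows.items():
        scored = [
            (ids[other], euclidean_similarity(vec, other_vec))
            for other, other_vec in rows.items()
            if other != name
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[ids[name]] = scored
    return ids, result


ids, similar = row_similarity(per_thousand(COUNTS, POPULATION))
```

Here `similar[0]` is the ordered list of neighbors for the row with ID 0, mirroring the "key = neighborhood ID, value = sorted vector" shape of the output. A scale-invariant measure such as cosine would give the same ranking before and after dividing each row by its population, which is why a distance-based measure is used in this sketch.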
> On Jun 30, 2014, at 12:48 PM, Edith Au <> wrote:
> Hi,
> I am a newbie and am looking for some guidance to implement my
> recommender.  Any help would be greatly appreciated.  I have a small
> data set of location information with the following fields:
> neighborhood, amenities, and counts.  For example:
> Downtown          Gym 15
> Downtown          Cafe 50
> …
> Midtown             Gym 30
> Midtown             Cafe 100
> Midtown             Bookstore 10
> ...
> Financial Dist
> …
> and so on.  I want to recommend a neighborhood for a user to
> reside in, based on the amenities (and some other metrics) in his/her
> current neighborhood.  My understanding is that model-based
> recommendation would be a good fit for the job.  If I am on the right
> track, is there an experimental/beta recommender I can try?
> If there is no such recommender yet, can I still use Mahout for my
> project?  For example, can I implement my own Similarity which only
> computes the similarity between one user's preferences and a set of
> neighborhoods?  If I understand Mahout correctly, User/Item Similarity
> would do N x (N-1) pairwise comparisons as opposed to 1 x N comparisons.
> In my example, User/Item Similarity would compare Downtown,
> Midtown, and Fin Dist with one another, which would be a waste of computation
> resources since those comparisons are not needed.
> Thanks in advance for your help.
> Edith
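
For reference, the (neighborhood, amenity, count) records in the original question can be pivoted into the rows-by-columns table with sequential integer IDs that the reply asks for. The following is a small plain-Python sketch, not Mahout code; the function name `pivot` is hypothetical, and only the five records actually listed in the thread are used.

```python
# (neighborhood, amenity, count) triples taken from the example in the message.
TRIPLES = [
    ("Downtown", "Gym", 15),
    ("Downtown", "Cafe", 50),
    ("Midtown", "Gym", 30),
    ("Midtown", "Cafe", 100),
    ("Midtown", "Bookstore", 10),
]


def pivot(triples):
    """Assign sequential integer IDs and build one dense row per neighborhood."""
    row_ids, col_ids = {}, {}
    for neighborhood, amenity, _ in triples:
        # setdefault only assigns an ID the first time a name is seen.
        row_ids.setdefault(neighborhood, len(row_ids))
        col_ids.setdefault(amenity, len(col_ids))
    # Missing (neighborhood, amenity) pairs stay at 0.
    table = [[0] * len(col_ids) for _ in row_ids]
    for neighborhood, amenity, count in triples:
        table[row_ids[neighborhood]][col_ids[amenity]] = count
    return row_ids, col_ids, table


row_ids, col_ids, table = pivot(TRIPLES)
```

IDs are handed out in order of first appearance, so the name-to-ID maps need to be kept around to translate results back into neighborhood names.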
