mahout-user mailing list archives

From ke xie <oed...@gmail.com>
Subject Re: How about a LSH recommender ?
Date Wed, 13 Apr 2011 07:43:58 GMT
OK, I will try to implement a non-distributed one. Actually, I already have
a Python version.

But I have a question. Min-hash assumes a binary matrix, where every entry is
either 1 or 0, before the hash functions are applied. How should rating data
be handled? If the matrix is filled with ratings from 1 to 5, should we pick
a threshold and convert each rating to 1 if it exceeds the threshold (and to
0 otherwise)?
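
To make the question concrete, here is a minimal sketch of the thresholding
idea followed by min-hash. It is only an illustration under stated
assumptions, not Mahout code: the function names, the threshold of 3, the
number of hash functions, and the (a*x + b) mod p hash family are all choices
made for this example, and items are assumed to be non-negative integer IDs
smaller than p.

```python
import random

PRIME = 2_147_483_647  # assumed modulus; must exceed every item ID


def binarize(ratings, threshold=3):
    """Map {user: {item: rating}} to {user: set of items rated above threshold}.

    Users with no rating above the threshold are dropped, because an empty
    set has no min-hash value.
    """
    liked = {}
    for user, items in ratings.items():
        kept = {item for item, r in items.items() if r > threshold}
        if kept:
            liked[user] = kept
    return liked


def make_hash_funcs(n, seed=42):
    """Draw n random (a, b) pairs for hashes of the form (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(item_ids, hash_funcs):
    """Signature = for each hash function, the minimum hash over the set."""
    return [min((a * x + b) % PRIME for x in item_ids) for a, b in hash_funcs]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Toy data (made up for illustration): ratings on a 1-5 scale.
ratings = {
    "u1": {1: 5, 2: 4, 3: 1},
    "u2": {1: 4, 2: 5, 3: 2},
    "u3": {3: 5},
}
liked = binarize(ratings, threshold=3)
funcs = make_hash_funcs(50)
sigs = {user: minhash_signature(items, funcs) for user, items in liked.items()}
sim_u1_u2 = estimate_jaccard(sigs["u1"], sigs["u2"])
sim_u1_u3 = estimate_jaccard(sigs["u1"], sigs["u3"])
```

With this toy data, u1 and u2 like the same items after thresholding, so
their signatures agree everywhere, while u3 shares no liked item with u1.
Note that thresholding throws away the difference between, say, a 4 and a 5,
which is exactly the trade-off the question is about.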

This is the reference I read about LSH; see chapter 3:
http://infolab.stanford.edu/~ullman/mmds.html

On Wed, Apr 13, 2011 at 3:25 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Sure.
>
> LSH is a fine candidate for parallelism and scaling.
>
> I would recommend starting small and testing as you go rather than leaping
> into a parallelized full-fledged implementation.  Look for other open-source
> implementations of LSH algorithms.
>
> Be warned that the parameter selection for LSH can be pretty tricky (so I
> hear, anyway).  You should pick a reasonable and realistic test problem so
> that you can experiment with that.
>
>
> On Wed, Apr 13, 2011 at 12:19 AM, ke xie <oeddyo@gmail.com> wrote:
>
>> Can we implement one and contribute into the mahout project? Any
>> suggestions?
>>
>
>


-- 
Name: Ke Xie   Eddy
Research Group of Information Retrieval
State Key Laboratory of Intelligent Technology and Systems
Tsinghua University
