mahout-user mailing list archives

From Jake Mannix <>
Subject Re: Cosine distances to Random Vector basis
Date Mon, 25 Apr 2011 04:29:12 GMT
This is the starting point for the way I've always seen people do
Locality Sensitive Hashing (LSH) with floating-point vectors.  Once
you have these bit vectors, you can apply minhash-style techniques
to them to complete the LSH.
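
For concreteness, here is a minimal sketch of the sign-bit step from
the quoted message below, in plain Python. The helper names
(random_basis, signature, hamming), the Gaussian choice for the random
vectors, and the 64-bit width are all assumptions made for
illustration; none of this is Mahout code. It relies on the fact that
the cosine has the same sign as the dot product, so the vectors never
need to be normalized.

```python
import random


def random_basis(num_bits, dim, seed=0):
    """Draw num_bits random vectors of length dim.

    Gaussian components are an assumption of this sketch; the
    original message only says "random vectors".
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_bits)]


def signature(vec, basis):
    """One bit per random vector: 1 if the dot product is >= 0.

    The sign of the dot product equals the sign of the cosine,
    so no normalization is required.
    """
    return [1 if sum(v * r for v, r in zip(vec, row)) >= 0.0 else 0
            for row in basis]


def hamming(a, b):
    """Count the positions where two bit signatures differ."""
    return sum(x != y for x, y in zip(a, b))


basis = random_basis(num_bits=64, dim=3)

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]     # same direction as a: every sign matches
c = [-1.0, -2.0, -3.0]  # opposite direction: signs flip (barring
                        # a dot product of exactly zero)

sig_a, sig_b, sig_c = (signature(v, basis) for v in (a, b, c))

print(hamming(sig_a, sig_b))  # vectors pointing the same way
print(hamming(sig_a, sig_c))  # vectors pointing opposite ways
```

With random hyperplanes like these, the chance that a given bit
differs between two vectors grows with the angle between them, so the
Hamming distance between signatures acts as a rough estimate of
angular distance, and more bits give a tighter estimate.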

On Sun, Apr 24, 2011 at 8:56 PM, Lance Norskog <> wrote:

> I just found this vector distance idea in a technical paper:
> Create a space defined by X random vectors. For your data vectors,
> take the cosine distance to each random vector and save the sign of
> the value as a bit. This gives a bit set of X bits.
> There could be another distance and algorithm for picking the bit value.
> The effect is to cease using numerical vectors as a "carrier signal"
> for the concept of "positions and distances". This is a different,
> more focused representation. And, Hamming distance is somewhat faster
> than Euclidean :) Of course, picking enough bits is a problem.
> However, I lost the paper. Does this ring a bell with anyone?
> --
> Lance Norskog
