lucene-general mailing list archives

From Vishal <>
Subject MinHash
Date Mon, 17 Oct 2011 18:13:38 GMT
There is a MinHash implementation in Mahout. I have been looking at
implementing my own, and have done so. The suggestion that similarity between
users can be determined by the minimum hash value over a user's click history
(and thus an implicit 0/1 preference for each article) seems too narrow, even
if we were to use multiple hash functions and estimate a probability from the
number of times the min-hashes match.
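For reference, here is a minimal Python sketch of the scheme described above. It assumes article ids are non-negative integers and that every user has clicked at least one article. The hash family ((a*x + b) mod p) and all function names are illustrative choices, not Mahout's API. With k hash functions, the fraction of positions where two users' min-hashes agree is an estimate of the Jaccard similarity |A ∩ B| / |A ∪ B| of their click sets:

```python
import random

# Mersenne prime used as the modulus of the (illustrative) hash family.
PRIME = (1 << 61) - 1


def make_hash_funcs(k, seed=0):
    """Return k random hash functions h(x) = (a*x + b) mod PRIME.

    Assumption: this simple universal hash family stands in for whatever
    hash functions a real implementation (e.g. Mahout's) would use.
    """
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]
    # Default arguments bind a and b per function (avoids late binding).
    return [lambda x, a=a, b=b: (a * x + b) % PRIME for a, b in params]


def minhash_signature(clicked_ids, hash_funcs):
    """For each hash function, keep the minimum hash over the clicked article ids.

    Assumes clicked_ids is non-empty (min() of an empty sequence raises).
    """
    return [min(h(x) for x in clicked_ids) for h in hash_funcs]


def estimated_similarity(sig_a, sig_b):
    """Fraction of hash functions whose min-hashes match (Jaccard estimate)."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two click sets, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    hashes = make_hash_funcs(200, seed=42)
    user1 = {1, 2, 3, 4, 5, 6, 7, 8}
    user2 = {5, 6, 7, 8, 9, 10, 11, 12}
    print("exact Jaccard:", jaccard(user1, user2))  # 4 shared / 12 total
    print("MinHash estimate:",
          estimated_similarity(minhash_signature(user1, hashes),
                               minhash_signature(user2, hashes)))
```

The estimate only sees set overlap of binary clicks, which is the narrowness at issue: it ignores how often or how recently an article was read, and its variance shrinks only as roughly 1/sqrt(k) with the number of hash functions.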

Any takes on whether this approach is good for generating recommendations?
And are there any good papers that provide empirical evidence for it?

Sent from the Lucene - General mailing list archive at
