mahout-commits mailing list archives

Subject [CONF] Apache Mahout > Minhash Clustering
Date Mon, 26 Sep 2011 00:59:00 GMT
Space: Apache Mahout (
Page: Minhash Clustering (

Edited by Lance Norskog:
Minhash clustering performs probabilistic dimension reduction of high-dimensional data. The
essence of the technique is to hash each item with multiple independent hash functions, chosen
so that similar items are more likely to collide than dissimilar ones. Multiple such hash
tables can then be constructed to answer near-neighbor queries efficiently.
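
The idea above can be sketched in a few lines of Python. This is only an illustration of
the general technique, not the code in Mahout's MinHashDriver: the function names, the
linear hash family (a*x + b) mod p, and the choice of grouping signature values into bucket
keys are all assumptions made for the example. Items are modeled as non-empty sets of
non-negative integer feature ids smaller than PRIME.

```python
import random

# Hypothetical modulus for the illustrative hash family: the Mersenne prime 2^31 - 1.
PRIME = 2_147_483_647


def make_hash_functions(num_hashes, seed=42):
    """Draw (a, b) pairs defining hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(features, hash_functions):
    """For each hash function, keep the minimum hash value over the item's features."""
    return tuple(min((a * x + b) % PRIME for x in features)
                 for a, b in hash_functions)


def cluster_keys(signature, group_size):
    """Group consecutive signature values into bucket keys (one hash table per group).

    Items that share any key land in the same bucket of that table.
    """
    return [(i // group_size, signature[i:i + group_size])
            for i in range(0, len(signature) - group_size + 1, group_size)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


hashes = make_hash_functions(8)
doc_a = {1, 2, 3, 4, 5}
doc_b = {1, 2, 3, 4, 6}      # overlaps heavily with doc_a
doc_c = {100, 200, 300}      # shares nothing with doc_a

sig_a = minhash_signature(doc_a, hashes)
sig_b = minhash_signature(doc_b, hashes)
sig_c = minhash_signature(doc_c, hashes)

print(estimated_similarity(sig_a, sig_b))
print(estimated_similarity(sig_a, sig_c))
```

The more features two items share, the more signature positions tend to agree, so they
are more likely to share at least one bucket key. A near-neighbor query then only needs
to compare an item against the other items in its buckets rather than against the whole
data set.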

There is a MinHashDriver class, which is exercised by the TestMinHashClustering unit test.
It is not listed in the standard driver.props file and is therefore not available as a
'bin/mahout' command-line job.
