mahout-dev mailing list archives

From "Divya" <di...@k2associates.com.sg>
Subject generate similar documents
Date Thu, 28 Oct 2010 08:11:44 GMT
Hi,

I have a directory of documents from which I generated a SequenceFile using
SequenceFilesFromDirectory, and then converted it into vectors using
SparseVectorsFromSequenceFiles.

Now, following the message at the link below, I want to generate a list of the most similar documents:

http://mail-archives.apache.org/mod_mbox/mahout-user/201007.mbox/%3C4C2E3EED.6070703@googlemail.com%3E

 

How can I use RowSimilarityJob to generate a list of similar documents?
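What RowSimilarityJob computes (a similarity score between pairs of rows of the document matrix, keeping at most a fixed number of similar rows per row) can be sketched in memory to make the goal concrete. This is a toy illustration under stated assumptions, not Mahout's MapReduce implementation: each document is assumed to be a sparse map from column (term) index to a non-negative weight such as TF-IDF, and cosine similarity is used as the example measure. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Toy in-memory illustration; NOT Mahout's RowSimilarityJob implementation.
public class RowSimilaritySketch {

    // Cosine similarity between two sparse rows (column index -> weight).
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    static double norm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // For each row, the indices of at most maxSimilaritiesPerRow other rows, most similar first.
    static List<List<Integer>> mostSimilar(List<Map<Integer, Double>> rows, int maxSimilaritiesPerRow) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            double[] scores = new double[rows.size()];
            List<Integer> candidates = new ArrayList<>();
            for (int j = 0; j < rows.size(); j++) {
                scores[j] = cosine(rows.get(i), rows.get(j));
                // Skip the row itself and rows with no positive similarity
                // (with non-negative weights, rows that share no term).
                if (j != i && scores[j] > 0.0) {
                    candidates.add(j);
                }
            }
            // Highest similarity first, then apply the per-row cap.
            candidates.sort(Comparator.comparingDouble((Integer j) -> scores[j]).reversed());
            result.add(new ArrayList<>(candidates.subList(0, Math.min(maxSimilaritiesPerRow, candidates.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<Integer, Double>> rows = new ArrayList<>();
        rows.add(Map.of(0, 1.0, 1, 1.0)); // doc 0: terms 0 and 1
        rows.add(Map.of(0, 1.0, 1, 0.9)); // doc 1: almost the same terms as doc 0
        rows.add(Map.of(2, 1.0));         // doc 2: an unrelated term
        System.out.println(mostSimilar(rows, 1)); // prints [[1], [0], []]
    }
}
```

The per-row cap plays the role that --maxSimilaritiesPerRow plays for the real job; the brute-force pairwise loop is only suitable for a handful of documents.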

 

 * <ol>
 * <li>-Dmapred.input.dir=(path): Directory containing a {@link DistributedRowMatrix} as a SequenceFile<IntWritable,VectorWritable></li>
 * <li>-Dmapred.output.dir=(path): output path where the computations output should go (a {@link DistributedRowMatrix} stored as a SequenceFile<IntWritable,VectorWritable>)</li>
 * <li>--numberOfColumns: the number of columns in the input matrix</li>
 * <li>--similarityClassname (classname): an implementation of {@link DistributedVectorSimilarity} used to compute the similarity</li>
 * <li>--maxSimilaritiesPerRow (integer): cap the number of similar rows per row to this number (100)</li>
 * </ol>
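The options in the javadoc excerpt could be assembled into an argument array roughly as follows. This is a hypothetical helper (RowSimilarityArgs and buildArgs are illustrative names, not Mahout API); it only builds strings and does not run the job. The paths, the column count and the similarity class name passed in main are placeholders, and the class name in particular is an assumption to verify against the DistributedVectorSimilarity implementations shipped with your Mahout version. Per the javadoc, the input directory must hold a SequenceFile<IntWritable,VectorWritable>, so vectors stored under other key types would need re-keying first (not shown).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the option list quoted from the RowSimilarityJob javadoc.
public class RowSimilarityArgs {

    static String[] buildArgs(String inputDir, String outputDir, int numberOfColumns,
                              String similarityClassname, int maxSimilaritiesPerRow) {
        List<String> args = new ArrayList<>();
        // Input/output paths as -D properties, in the form the javadoc lists them.
        args.add("-Dmapred.input.dir=" + inputDir);
        args.add("-Dmapred.output.dir=" + outputDir);
        // Number of columns of the input matrix (the cardinality of each row vector).
        args.add("--numberOfColumns");
        args.add(String.valueOf(numberOfColumns));
        // Fully qualified name of a DistributedVectorSimilarity implementation.
        args.add("--similarityClassname");
        args.add(similarityClassname);
        // Cap on similar rows kept per row (the javadoc gives 100 as the default).
        args.add("--maxSimilaritiesPerRow");
        args.add(String.valueOf(maxSimilaritiesPerRow));
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        String[] args = buildArgs(
                "docs-matrix",   // placeholder input path
                "similar-docs",  // placeholder output path
                5000,            // placeholder: use your matrix's real column count
                // ASSUMED class name; check what your Mahout version provides.
                "org.apache.mahout.math.hadoop.similarity.vector.DistributedUncenteredZeroAssumingCosineVectorSimilarity",
                10);
        System.out.println(String.join(" ", args));
    }
}
```

The -D properties are placed first because Hadoop's generic options are conventionally given before tool-specific ones.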

 

Which values should I pass for --numberOfColumns and --similarityClassname?
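On --numberOfColumns, the javadoc defines it as the number of columns in the input matrix. Assuming each distinct dictionary term is one column of the document matrix (the usual term-document layout), that value is the dictionary size, equivalently the cardinality of the document vectors. The toy sketch below builds a dictionary from tokenized documents to make that concrete; ColumnCountSketch and buildDictionary are hypothetical names, and this is not how SparseVectorsFromSequenceFiles builds its dictionary.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration: in a term-document matrix each distinct term is one column,
// so the column count equals the dictionary size.
public class ColumnCountSketch {

    // Assigns consecutive column indices to distinct terms in first-seen order.
    static Map<String, Integer> buildDictionary(List<List<String>> tokenizedDocs) {
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        for (List<String> doc : tokenizedDocs) {
            for (String term : doc) {
                // New terms get the next free column index; repeats are ignored.
                dictionary.putIfAbsent(term, dictionary.size());
            }
        }
        return dictionary;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("mahout", "vector", "similarity"),
                List.of("vector", "sequence", "file"));
        Map<String, Integer> dictionary = buildDictionary(docs);
        // In this toy case, the value that would go to --numberOfColumns.
        System.out.println(dictionary.size()); // prints 5
    }
}
```

For real data, the size of the dictionary written by the vectorizer, or the cardinality of one of its output vectors, would be the safer source for this number.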

 

 

Regards,

Divya 

