mahout-user mailing list archives

From Sebastian Schelter
Subject Re: Question on RowSimilarityJob
Date Fri, 20 Jan 2012 17:58:33 GMT

'maxSimilaritiesPerRow' denotes the maximum number of similar rows
(documents in your use case) to keep per document.
'excludeSelfSimilarity' means that rows (documents) should not be
compared to themselves.
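
To make the two parameters concrete, here is a minimal sketch in plain Python. It is not Mahout's code and does not call any Mahout API. The function names, the choice of cosine similarity, and the sample vectors are assumptions made only for illustration, and the sketch assumes that a self-match, when not excluded, counts toward the per-row limit.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def row_similarities(rows, max_similarities_per_row, exclude_self_similarity):
    """For each row, keep at most max_similarities_per_row most similar rows.

    Hypothetical helper that mirrors the meaning of the two parameters;
    it is a brute-force O(n^2) illustration, not the distributed job.
    """
    result = {}
    for i, row_i in enumerate(rows):
        scored = []
        for j, row_j in enumerate(rows):
            # excludeSelfSimilarity: do not compare a row with itself
            if exclude_self_similarity and i == j:
                continue
            scored.append((j, cosine(row_i, row_j)))
        # maxSimilaritiesPerRow: keep only the best-scoring rows
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[i] = scored[:max_similarities_per_row]
    return result


# Four made-up document vectors (one row per document)
docs = [
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.9],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 1.0],
]

top = row_similarities(docs, max_similarities_per_row=2,
                       exclude_self_similarity=True)
with_self = row_similarities(docs, max_similarities_per_row=2,
                             exclude_self_similarity=False)
```

With self-similarity excluded, each document keeps its two nearest other documents. With it included, each document's own row takes the top slot, which is rarely useful for document similarity.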

Sorry for the lack of documentation. RowSimilarityJob was originally only
an internal job for the recommendation code. I'll try to add something
to the wiki in the next few days.


On 20.01.2012 17:38, Suneel Marthi wrote:
> I am working on determining document similarity of a corpus I am working with using RowSimilarity.
> Questions:-
> a) What do the parameters - 'maxSimilaritiesPerRow' and 'excludeSelfSimilarity' mean?
> b) Are there any docs available on RowSimilarityJob? This is the best I could find on Sebastian's blog - .
> c) Also do we have any docs on RowIdJob ?
> Thanks and Regards,
> Suneel
