mahout-user mailing list archives

From Sebastian Schelter
Subject Re: Load output of rowsimilarity to memory
Date Mon, 24 Feb 2014 19:41:31 GMT
The output of RowSimilarityJob can be loaded by the FileItemSimilarity.
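
To illustrate the in-memory structure asked about below (the N most similar
items per item, undefined for everything else), here is a minimal sketch in
plain Java. It is not Mahout code: the class TopNSimilarity and its methods
are hypothetical names, and it assumes the similarities are available as text
lines of the form itemID1,itemID2,similarity, which as far as I know is the
format FileItemSimilarity expects. With Mahout on the classpath, passing the
file to the FileItemSimilarity constructor should make a helper like this
unnecessary.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Mahout: keeps the top-N neighbours per item in memory. */
class TopNSimilarity {
    // itemID -> (neighbourID -> similarity), at most maxNeighbours entries per item
    private final Map<Long, Map<Long, Double>> rows = new HashMap<>();

    /** Parses lines of the assumed form "itemID1,itemID2,similarity" (comma or tab separated). */
    static TopNSimilarity parse(List<String> lines, int maxNeighbours) {
        Map<Long, List<Map.Entry<Long, Double>>> candidates = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.trim().split("[,\t]");
            long a = Long.parseLong(parts[0].trim());
            long b = Long.parseLong(parts[1].trim());
            double value = Double.parseDouble(parts[2].trim());
            // Record the pair in both directions, since each line lists it only once.
            candidates.computeIfAbsent(a, k -> new ArrayList<>())
                      .add(new SimpleEntry<Long, Double>(b, value));
            candidates.computeIfAbsent(b, k -> new ArrayList<>())
                      .add(new SimpleEntry<Long, Double>(a, value));
        }

        TopNSimilarity result = new TopNSimilarity();
        for (Map.Entry<Long, List<Map.Entry<Long, Double>>> e : candidates.entrySet()) {
            List<Map.Entry<Long, Double>> neighbours = e.getValue();
            // Sort by similarity, highest first, then keep only the first maxNeighbours.
            neighbours.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
            int keep = Math.min(maxNeighbours, neighbours.size());
            Map<Long, Double> row = new HashMap<>();
            for (Map.Entry<Long, Double> n : neighbours.subList(0, keep)) {
                row.put(n.getKey(), n.getValue());
            }
            result.rows.put(e.getKey(), row);
        }
        return result;
    }

    /**
     * Returns the stored similarity if b is among a's top-N neighbours,
     * otherwise NaN ("undefined"). Note that this lookup is not symmetric.
     */
    double similarity(long a, long b) {
        Map<Long, Double> row = rows.get(a);
        Double value = (row == null) ? null : row.get(b);
        return (value == null) ? Double.NaN : value;
    }
}
```

Calling TopNSimilarity.parse(lines, 100) would give the threshold of 100
mentioned in the question. Returning NaN for unknown pairs is a design choice
of this sketch; whether it fits depends on how the recommender treats missing
similarities.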


On 02/24/2014 08:31 PM, Juan José Ramos wrote:
> Is there a way to reproduce this process:
> inside Java code and not using the command line tool? I am not interested
> in the clustering part but in 'Calculate several similar docs to each doc
> in the data'. In particular, I am interested in loading the output of the
> rowsimilarity tool into memory to be used as my custom ItemSimilarity
> implementation for an ItemBasedRecommender.
> What exactly I want is to have a matrix in memory where, for every doc in my
> catalogue, I have the similarity with the 100 (that is the threshold I am
> using) most similar items and an undefined similarity for the rest.
> Is it possible to do with the Java API? I know it can be done calling the
> commands from inside the Java code and I guess that also using
> corresponding SparseVectorsFromSequenceFiles, DistributedRowMatrix and
> RowItemSimilarityJob. But I still cannot see an easy way of parsing the
> output of RowItemSimilarityJob to the memory representation I intend to
> use.
> Thanks a lot.
