mahout-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: how to understand the parameter "maxSimilaritiesPerItem"
Date Thu, 08 Sep 2011 11:40:54 GMT
This parameter denotes the maximum number of similar items to store per
single item.
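
To illustrate the cap, here is a minimal, self-contained sketch in plain Java (not the Mahout code itself). The class `MaxSimilaritiesSketch`, the type `SimilarItem`, and the methods `keepFirst` and `topN` are hypothetical names made up for this example. `keepFirst` mirrors the loop in the quoted reducer: it takes entries in the order they arrive and stops once the cap is reached, so which similarities survive depends entirely on the iteration order. Whether the real job delivers the entries sorted by similarity depends on how `SimilarityMatrixEntryKey` is ordered, which this thread does not show; `topN` only demonstrates what happens under the assumption that entries are sorted by descending similarity first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MaxSimilaritiesSketch {

  /** Hypothetical stand-in for one (similar item, similarity value) entry. */
  public static final class SimilarItem {
    public final int itemId;
    public final double similarity;

    public SimilarItem(int itemId, double similarity) {
      this.itemId = itemId;
      this.similarity = similarity;
    }
  }

  /**
   * Keeps at most maxSimilaritiesPerItem entries, in arrival order.
   * Nothing is sorted here: the first entries seen are the ones kept.
   * In this sketch a non-positive cap keeps nothing.
   */
  public static List<SimilarItem> keepFirst(Iterable<SimilarItem> entries,
                                            int maxSimilaritiesPerItem) {
    List<SimilarItem> kept = new ArrayList<>();
    if (maxSimilaritiesPerItem <= 0) {
      return kept;
    }
    for (SimilarItem entry : entries) {
      kept.add(entry);
      if (kept.size() == maxSimilaritiesPerItem) {
        break;
      }
    }
    return kept;
  }

  /**
   * Assumption for illustration: sort by descending similarity first,
   * so that the capped result holds the most similar items.
   */
  public static List<SimilarItem> topN(List<SimilarItem> entries, int n) {
    List<SimilarItem> sorted = new ArrayList<>(entries);
    sorted.sort(Comparator.comparingDouble((SimilarItem e) -> e.similarity).reversed());
    return keepFirst(sorted, n);
  }

  public static void main(String[] args) {
    List<SimilarItem> entries = Arrays.asList(
        new SimilarItem(1, 0.2), new SimilarItem(2, 0.9), new SimilarItem(3, 0.5));
    for (SimilarItem e : topN(entries, 2)) {
      System.out.println(e.itemId + " " + e.similarity);
    }
  }
}
```

With the three sample entries and a cap of 2, `keepFirst` on the unsorted list keeps items 1 and 2, while `topN` keeps items 2 and 3. The cap itself never picks entries at random; the selection is decided by the order in which the entries are iterated.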

On 08.09.2011 13:38, 张玉东 wrote:
> Hello,
> In the ItemSimilarityJob, the parameter "maxSimilaritiesPerItem" is first used in the
> 7th map/reduce job “asMatrix”, as follows:
> 
>     protected void reduce(SimilarityMatrixEntryKey key,
>                           Iterable<DistributedRowMatrix.MatrixEntryWritable> entries,
>                           Context ctx) throws IOException, InterruptedException {
>       RandomAccessSparseVector temporaryVector =
>           new RandomAccessSparseVector(Integer.MAX_VALUE, maxSimilaritiesPerRow);
>       int similaritiesSet = 0;
>       for (DistributedRowMatrix.MatrixEntryWritable entry : entries) {
>         temporaryVector.setQuick(entry.getCol(), entry.getVal());
>         if (++similaritiesSet == maxSimilaritiesPerRow) {
>           break;
>         }
>       }
>       SequentialAccessSparseVector vector = new SequentialAccessSparseVector(temporaryVector);
>       ctx.write(new IntWritable(key.getRow()), new VectorWritable(vector));
>     }
> 
> I am unsure whether all the other items with a similarity value are written into the matrix
> for each item. If only some of them (no more than maxSimilaritiesPerItem) are written,
> how are they selected? At random?
> Thanks.
> 
> yudong
> 
> 

