mahout-user mailing list archives

From 张玉东 <zhangyud...@vancl.cn>
Subject how to understand the parameter "maxSimilaritiesPerItem"
Date Thu, 08 Sep 2011 11:38:13 GMT
Hello,
In ItemSimilarityJob, the parameter "maxSimilaritiesPerItem" is first used in the seventh
map/reduce job, "asMatrix", in the following reducer:

    protected void reduce(SimilarityMatrixEntryKey key,
                          Iterable<DistributedRowMatrix.MatrixEntryWritable> entries,
                          Context ctx) throws IOException, InterruptedException {
      RandomAccessSparseVector temporaryVector =
          new RandomAccessSparseVector(Integer.MAX_VALUE, maxSimilaritiesPerRow);
      int similaritiesSet = 0;
      for (DistributedRowMatrix.MatrixEntryWritable entry : entries) {
        temporaryVector.setQuick(entry.getCol(), entry.getVal());
        if (++similaritiesSet == maxSimilaritiesPerRow) {
          break;
        }
      }
      SequentialAccessSparseVector vector = new SequentialAccessSparseVector(temporaryVector);
      ctx.write(new IntWritable(key.getRow()), new VectorWritable(vector));
    }

I am unsure whether, for each item, all the other items that have a similarity value are
written into the matrix. If only some of them (at most maxSimilaritiesPerItem) are written,
how are they selected? At random?
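
To make the question concrete, here is a minimal sketch in plain Java, with no Mahout or Hadoop classes (Entry and keepFirst are hypothetical names used only for illustration). It mirrors the loop above: the first maxSimilaritiesPerRow entries are kept in whatever order the iterable delivers them. So the selection would seem to depend on the order in which entries reach the reducer, not on anything in the loop itself. Whether ItemSimilarityJob sorts the entries by similarity before this point is an assumption that would need to be checked against the job's key and comparator classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TruncationSketch {

  /** Hypothetical stand-in for a (column, similarity) matrix entry. */
  static final class Entry {
    final int col;
    final double val;

    Entry(int col, double val) {
      this.col = col;
      this.val = val;
    }
  }

  /** Mirrors the reducer loop: keep entries in arrival order, stop once max are kept. */
  static List<Integer> keepFirst(Iterable<Entry> entries, int max) {
    List<Integer> keptCols = new ArrayList<>();
    for (Entry e : entries) {
      keptCols.add(e.col);
      if (keptCols.size() == max) {
        break;
      }
    }
    return keptCols;
  }

  public static void main(String[] args) {
    // Entries in an arbitrary arrival order.
    List<Entry> arrival = Arrays.asList(
        new Entry(7, 0.2), new Entry(3, 0.9), new Entry(5, 0.5));
    System.out.println(keepFirst(arrival, 2)); // [7, 3]

    // The same entries, sorted by descending similarity before truncation
    // (assumed ordering, for comparison only).
    List<Entry> sorted = new ArrayList<>(arrival);
    sorted.sort(Comparator.comparingDouble((Entry e) -> e.val).reversed());
    System.out.println(keepFirst(sorted, 2)); // [3, 5]
  }
}
```

With the unsorted input the kept columns are simply the first two to arrive; with the sorted input they are the two most similar ones. The loop is identical in both cases.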
Thanks.

yudong

