mahout-user mailing list archives

From wine lover <>
Subject parameter setting for using Seqdirectory and SequenceFile
Date Mon, 27 Jun 2011 20:36:15 GMT
Hello Everyone,

When using seqdirectory to convert a directory of documents to SequenceFile
format, it asks for a chunk size parameter:
<-chunk <MAX SIZE OF EACH CHUNK in Megabytes> 64>

In the example of, the chunk size is set to 5, but I do not know why. Is this
parameter input-dependent or system-dependent? Is there any rule for setting
it?

When using seq2sparse to create vectors from a SequenceFile, I notice that it is used as follows:
$MAHOUT seq2sparse \
    -i mahout-work/reuters-out-seqdir/ \
    -o mahout-work/reuters-out-seqdir-sparse-lda \
    -wt tf -seq -nr 3

What does "-nr 3" stand for?


