Hi,
I am using lucene 1.3 final version.
Let us say there are 10,000 files with size of 20 MB. So, total file
system size = 10,000 * 20 MB = 200 GB. I want to index these files.
Let us say, the merge factor = 10
Min heap size required by JVM = 10 * 20 = 200 MB
>From the http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html
article,


For instance, if we set mergeFactor to 10, a new segment will be created
on the disk for every 10 documents added to the index. When the 10th
segment of size 10 is added, all 10 will be merged into a single segment
of size 100. When 10 such segments of size 100 have been added, they
will be merged into a single segment containing 1000 documents, and so
on. Therefore, at any time, there will be no more than 9 segments in
each power of 10 index size.


If the lucene is indexing the 1000th document, then the current time
segment size would be 100. At that time, how many documents would the
lucene hold in memory (10 documents or 100 documents)? If the lucene
holds 100 documents, then min heap memory required will be 100 * 20 = 2
GB, which is unlikely.
Is the optimize process memory intensive? How much memory lucene would
take while doing the optimize?
Is it safe to assume that the maximum heap size required by lucene is
mergefactor * maximum_file_size?
I am planning to use the default maxMergeDocs as default, as stated in
the article.
Any help on the above questions is highly appreciated.
Thanks,
Sreedhar

To unsubscribe, email: luceneuserunsubscribe@jakarta.apache.org
For additional commands, email: luceneuserhelp@jakarta.apache.org
