From: markharw...@yahoo.co.uk
Subject: Re: Most efficient way to index 14M documents (out of memory/file handles)
Date: Wed, 07 Jul 2004 07:23:19 GMT
A colleague of mine found the fastest way to index was to use a RAMDirectory, letting it grow
to a pre-defined maximum size, then merging it into a new temporary file-based index to
flush it. Repeat this, creating a new directory for each file-based batch, then perform a
single merge into one index once all docs are indexed.
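In outline it looks something like this (a rough, untested sketch against the 1.3 API; the
batch size, paths and the Iterator of Documents are placeholders you'd substitute):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class BatchIndexer {

    // Placeholder threshold - tune to whatever fits your heap.
    private static final int DOCS_PER_BATCH = 500000;

    // Index docs into RAM-sized batches, flushing each full batch
    // to its own temporary file-based index.
    public static Directory[] buildBatches(Iterator docs) throws IOException {
        List batchDirs = new ArrayList();
        while (docs.hasNext()) {
            RAMDirectory ramDir = new RAMDirectory();
            IndexWriter ramWriter =
                new IndexWriter(ramDir, new StandardAnalyzer(), true);
            for (int i = 0; i < DOCS_PER_BATCH && docs.hasNext(); i++) {
                ramWriter.addDocument((Document) docs.next());
            }
            ramWriter.close();

            // Merge the full RAMDirectory into a fresh on-disk index.
            Directory fsDir = FSDirectory.getDirectory(
                "/tmp/batch" + batchDirs.size(), true);
            IndexWriter fsWriter =
                new IndexWriter(fsDir, new StandardAnalyzer(), true);
            fsWriter.addIndexes(new Directory[] { ramDir });
            fsWriter.close();
            batchDirs.add(fsDir);
        }
        return (Directory[]) batchDirs.toArray(new Directory[0]);
    }

    // Once everything is indexed, merge all the batch indexes into one.
    public static void mergeAll(Directory[] batchDirs) throws IOException {
        IndexWriter finalWriter =
            new IndexWriter("/path/to/final/index", new StandardAnalyzer(), true);
        finalWriter.addIndexes(batchDirs); // optimizes as part of the merge
        finalWriter.close();
    }
}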

I haven't managed to test this for myself, but my colleague says he noticed a
considerable speed-up by merging just once at the end with this approach, so you may want
to give it a try. (This was with Lucene 1.3.)

I know Lucene uses a RAMDirectory internally, but the mergeFactor that controls the size
of that in-memory cache also determines the number of segment files created on disk when
it is flushed, and that file count can get out of control.
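
For illustration (mergeFactor was a public field on IndexWriter in that era; the numbers
here are just examples):

IndexWriter writer =
    new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
writer.mergeFactor = 100; // default is 10: buffers more docs in RAM per flush...
// ...but also allows up to 100 segments per merge level on disk,
// each segment holding several open files - hence the handle problems.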

