lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chandan Tamrakar" <chan...@ccnep.com.np>
Subject RE: batch indexing
Date Mon, 30 Apr 2007 00:59:41 GMT
Thanks Erik , so FSDirectory seems better option than RAMDirectory ? Also I
think O.S can cache files  in which case FSDirectory may not be bad , your
thoughts ?

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Sunday, April 29, 2007 7:07 PM
To: java-user@lucene.apache.org
Subject: Re: batch indexing

As I understand it, FSDirectory *is* RAMdirectory, at least until
it flushes. There have been several discussions of this,
search the mail archive for things like MergeFactor, MaxBufferedDocs
and the like. You'll find quite a bit of information about how these
parameters interact.

Particularly, see the thread titled

"MergeFactor and MaxBufferedDocs value should ...?"

I suspect, that if you're explicitly creating RAMDirectories and
merging them that your code is more complex than it needs
to be to no good purpose (of course, I've been wrong before,
once or twice at least <G>).

If you really require parallel indexing, you might try running
FSDirectory based processes on several machines, then merging
the resulting indexes as a final step. How you tell each of these
processes which documents to index I leave as an exercise for
the reader.....

Best
Erick


On 4/29/07, Chandan Tamrakar <chandan@ccnep.com.np> wrote:
>
> I am trying to index a huge documents on batches   . Batch size is
> parameterized to the application  say X docs , that means it will hold X
> no.
> of
>
> Docs in the RAM before I flush to file system using
> IndexWriter.addIndexes(Directory[]) method
>
>
>
> My question is :
>
>
>
> Do I need to set mergefactor ? , will it hold default mergefactor docs in
> memory before it is written to disk as segment .
>
> (But my application will call indexwriter.addindexes function only after X
> no of documents are in memory)
>
>
>
> If the index sizes are big , at some point of time there might be a out of
> memory exceptions , ( yes I could check a memory before another
> ramdirectory
> is being created) But what would be the best solution  ? Is FSDirectory is
> better option than Ramdirectory for huge text indexing ? I have roughly 50
> GB of fulltext to index?
>
>
>
>
>
> Thks in advance.
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message