Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (herse.apache.org: local policy)
Message-Id: <200704291232.l3TCWepJ012432@idlewild.ccnep.com.np>
From: "Chandan Tamrakar" <chandan@ccnep.com.np>
To: <java-user@lucene.apache.org>
Subject: batch indexing
Date: Sun, 29 Apr 2007 17:37:35 +0545
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_00DC_01C78A85.16161D60"
Thread-Index: AceKVOCYs2kP9D3QTISK37fPuQHT1A==

------=_NextPart_000_00DC_01C78A85.16161D60
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit

I am trying to index a large number of documents in batches. The batch
size is a parameter of the application, say X docs, which means it will
hold X docs in RAM before I flush them to the file system using the
IndexWriter.addIndexes(Directory[]) method.
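
To make the batching concrete, here is a minimal sketch of the pattern in
plain Java. BatchBuffer and its flush callback are hypothetical names, not
Lucene API; in my application the callback would be the place where the
RAMDirectory holding the batch is merged into the on-disk index via
IndexWriter.addIndexes(Directory[]).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of Lucene): buffers items in memory and
// hands each full batch to a caller-supplied flush callback.
final class BatchBuffer<T> {
    private final int batchSize;              // the "X" from the description
    private final Consumer<List<T>> flusher;  // stands in for the flush-to-disk step
    private List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> flusher) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    // Add one item; flush automatically once X items are buffered.
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Hand the buffered items to the callback and start a fresh buffer.
    // Does nothing when the buffer is empty.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = pending;
        pending = new ArrayList<>();
        flusher.accept(batch);
    }
}
```

A final flush() is needed after the last document so that a partial batch
is not lost.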

 
My question is:

 
Do I need to set the mergeFactor? Will the writer hold the default
mergeFactor number of docs in memory before they are written to disk as a
segment?

(My application will call IndexWriter.addIndexes only after X documents
are in memory.)

 
If the indexes are big, at some point there might be an out-of-memory
exception (yes, I could check the available memory before another
RAMDirectory is created). But what would be the best solution? Is
FSDirectory a better option than RAMDirectory for indexing a large amount
of text? I have roughly 50 GB of full text to index.
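
The memory check I mention could look roughly like this. MemoryCheck and
its methods are hypothetical names; only the java.lang.Runtime calls are
standard. The result is an estimate, since memory that the garbage
collector has not yet reclaimed still counts as used.

```java
// Hypothetical helper: estimates how much heap is still available before
// another in-memory batch is started.
final class MemoryCheck {
    // Heap bytes the JVM could still hand out: the configured maximum
    // minus what is currently in use (allocated total minus free).
    static long availableHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    // True when the estimated available heap covers the requested size.
    static boolean hasRoomFor(long requiredBytes) {
        return availableHeapBytes() >= requiredBytes;
    }
}
```

The idea would be to call hasRoomFor with a guess at the size of the next
batch and flush early when it returns false.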

 
Thanks in advance.


------=_NextPart_000_00DC_01C78A85.16161D60--