From: "Chandan Tamrakar"
Reply-To: java-user@lucene.apache.org
Subject: batch indexing
Date: Sun, 29 Apr 2007 17:37:35 +0545
I am trying to index a huge set of documents in batches. The batch size is a parameter of the application, say X documents: the application holds X documents in RAM before I flush them to the file system using the IndexWriter.addIndexes(Directory[]) method.

My question is: do I need to set mergeFactor? Will the writer hold the default mergeFactor number of documents in memory before they are written to disk as a segment? (My application calls IndexWriter.addIndexes only after X documents are in memory.)

If the index is large, at some point there might be out-of-memory exceptions. (Yes, I could check the available memory before another RAMDirectory is created.) But what would be the best solution? Is FSDirectory a better option than RAMDirectory for indexing a huge amount of text? I have roughly 50 GB of full text to index.

Thanks in advance.
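The batching scheme described above (buffer X documents in memory, then flush them as one unit, with a memory check before starting another batch) can be sketched in plain Java. This is a minimal, hypothetical illustration, not Lucene code: the class BatchBuffer, its minFreeHeapBytes safety margin, and the flush callback are assumptions introduced here. In the real application the callback would presumably be the place where a RAMDirectory is filled and handed to IndexWriter.addIndexes(Directory[]). The heap check uses only the standard java.lang.Runtime methods maxMemory(), totalMemory() and freeMemory().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of the batching scheme from the question: documents are
 * buffered in memory and passed to a flush callback once the batch reaches
 * batchSize (X) documents, or earlier if available heap drops below a margin.
 * No Lucene classes are used; the callback stands in for the index write.
 */
class BatchBuffer {
    private final int batchSize;          // X in the question
    private final long minFreeHeapBytes;  // hypothetical safety margin
    private final Consumer<List<String>> flusher;
    private final List<String> pending = new ArrayList<>();
    private int flushCount = 0;

    BatchBuffer(int batchSize, long minFreeHeapBytes, Consumer<List<String>> flusher) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.minFreeHeapBytes = minFreeHeapBytes;
        this.flusher = flusher;
    }

    /** Heap that can still be allocated: maximum heap minus what is currently in use. */
    static long availableHeap() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    /** Buffers one document; flushes when the batch is full or heap is running low. */
    void add(String doc) {
        pending.add(doc);
        if (pending.size() >= batchSize || availableHeap() < minFreeHeapBytes) {
            flush();
        }
    }

    /** Hands a copy of the current batch to the callback and starts a new batch. */
    void flush() {
        if (pending.isEmpty()) return;
        flusher.accept(new ArrayList<>(pending));
        pending.clear();
        flushCount++;
    }

    int flushCount() { return flushCount; }
    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        // Margin of 0 bytes disables the heap trigger, so only the batch size matters here.
        BatchBuffer buf = new BatchBuffer(3, 0L, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 7; i++) {
            buf.add("doc-" + i);
        }
        buf.flush(); // flush the final, partial batch
        System.out.println(batchSizes); // [3, 3, 1]
    }
}
```

Keeping the flush decision in one place means the batch size X and the memory margin can be tuned independently of whatever the callback does with each batch.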