From: Paul Taylor
Reply-To: paul_t100@fastmail.fm
Date: Thu, 22 Oct 2009 13:45:43 +0100
To: java-user@lucene.apache.org
Subject: Performance tips when creating a large index from database.

I'm building a Lucene index from a database, creating about 1 million documents. Unsurprisingly, this takes quite a long time. I do it like this:

1. Send a query to the DB over a range of ids (10,000 records).
2. Add these results to Lucene.
3. Get the next 10,000 records, and so on.

When indexing has completed, I call optimize().

I also set indexWriter.setMaxBufferedDocs(1000) and indexWriter.setMergeFactor(3000), but I don't fully understand these values. Each document contains about 10 small fields.

I'm looking for ways to improve performance. The index writing is single-threaded; is there a way I can multi-thread writing to the index? I only call optimize() once, at the end; is that the best way to do it? I'm going to run a profiler over the code, but are there any rules of thumb for the best values to set for MaxBufferedDocs and MergeFactor?

thanks
Paul
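A minimal sketch of one way the batching loop described above could be split across threads. It assumes ids are contiguous numeric keys and that the IndexWriter javadoc's statement that an IndexWriter is thread-safe holds, so all workers would share one writer. To keep it compilable with the JDK alone, the Lucene and database work sits behind a hypothetical `BatchIndexer` callback; `BatchedIndexer`, `BatchIndexer`, `idRanges` and `indexInParallel` are illustrative names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BatchedIndexer {

    /** Hypothetical hook: fetch rows with fromId <= id < toIdExclusive and index them. */
    public interface BatchIndexer {
        /** Returns how many documents were indexed for this id range. */
        long indexRange(long fromId, long toIdExclusive) throws Exception;
    }

    /** Splits [minId, maxIdExclusive) into contiguous half-open ranges of at most batchSize ids. */
    public static List<long[]> idRanges(long minId, long maxIdExclusive, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long from = minId; from < maxIdExclusive; from += batchSize) {
            ranges.add(new long[] {from, Math.min(from + batchSize, maxIdExclusive)});
        }
        return ranges;
    }

    /** Runs every id range on a fixed-size thread pool and returns the total documents indexed. */
    public static long indexInParallel(List<long[]> ranges, int threads, BatchIndexer indexer)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (long[] r : ranges) {
                // Each task owns a disjoint id range, so no two threads fetch the same rows.
                results.add(pool.submit(() -> indexer.indexRange(r[0], r[1])));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // waits for the batch; rethrows (wrapped) any failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<long[]> ranges = idRanges(0, 1_000_000, 10_000);
        long total = indexInParallel(ranges, 4, (from, to) -> {
            // Hypothetical: run the id-range query against the DB here and call
            // writer.addDocument(doc) on a shared IndexWriter for each row.
            return to - from;
        });
        System.out.println(ranges.size() + " batches, " + total + " docs"); // prints "100 batches, 1000000 docs"
    }
}
```

In this shape, optimize() would still be called once, after indexInParallel returns, as in the original flow. Whether extra threads help at all depends on where the time actually goes (database fetch versus document analysis), which the planned profiler run should show.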