From: Chantal Ackermann
Organization: b.telligent GmbH
To: solr-user@lucene.apache.org
Date: Fri, 31 Jul 2009 14:04:03 +0200
Subject: mergeFactor / indexing speed

Dear all,

I want to find out which settings give the best full-index performance for my setup.
Therefore, I have been running a small index (fewer than 20k documents) with a mergeFactor of 10 and of 100. In both cases, indexing took about 11.5 min:

mergeFactor: 10
    0:11:46.792
mergeFactor: 100
    /admin/cores?action=RELOAD    0:11:44.441
    Tomcat restart                0:11:34.143

This is Tomcat 5.5.20, started with a maximum heap size of 1 GB, though it always used much less. There was no swapping (RedHat Linux 32-bit, 3 GB RAM, old ATA disk).

Now, I have three questions:

1. How can I check which mergeFactor is really being used? The solrconfig.xml that is displayed in the admin application is the up-to-date view of the file on the file system. I tested that. But it is not necessarily what the current Solr core is using, is it? Is there a way to check the mergeFactor actually in use (while indexing is running)?

2. I changed the mergeFactor in both available settings (default and main index) in the solrconfig.xml file of the core I am reindexing. Is that the correct place? Should a change in performance be noticeable when increasing from 10 to 100? Or is the change not perceivable if the requests for data take far longer than the indexing itself?

3. Do I have to increase ramBufferSizeMB if I increase mergeFactor? (Or some other setting?)

(I am still trying to get profiling information on how much application time is eaten up by DB connections, requests, and processing. The root entity query takes about 20 ms on average. The child entity query takes less than 10 ms. I have my custom entity processor running on the child entity; it populates the map using a multi-row result set. I have also attached one regex transformer and one script transformer.)

Thank you for any tips!
Chantal

--
Chantal Ackermann
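
For reference, the three timings reported above can be compared with a short script. This is only a minimal sketch for the arithmetic, not part of the setup described in the message: the helper name `parse_duration` and the run labels are made up for illustration, and the durations are assumed to be in `H:MM:SS.mmm` form as they appear above.

```python
from datetime import timedelta


def parse_duration(text):
    """Parse an 'H:MM:SS.mmm' string into a timedelta (hypothetical helper)."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


# Timings copied from the message; the labels are illustrative only.
runs = {
    "mergeFactor 10": "0:11:46.792",
    "mergeFactor 100 (core RELOAD)": "0:11:44.441",
    "mergeFactor 100 (Tomcat restart)": "0:11:34.143",
}

# Convert every run to seconds so the runs can be compared directly.
durations = {label: parse_duration(value).total_seconds() for label, value in runs.items()}
baseline = durations["mergeFactor 10"]

for label, secs in durations.items():
    change = (secs - baseline) / baseline * 100
    print(f"{label}: {secs:.3f} s ({change:+.2f}% vs mergeFactor 10)")

# Spread between the slowest and fastest run, relative to the baseline run.
spread_percent = (max(durations.values()) - min(durations.values())) / baseline * 100
print(f"spread: {spread_percent:.2f}%")
```

The spread between the slowest and fastest run stays below 2% of the total, which on its own does not show whether the changed mergeFactor was picked up or whether time spent outside the index writer dominates.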