Message-ID: <4DE7423A.5010504@gmail.com>
Date: Thu, 02 Jun 2011 09:56:42 +0200
From: Alexander Rosemann
To: java-user@lucene.apache.org
Subject: Re: multiple small indexes or one big index?
References: <4DE6A278.4030002@gmail.com> <4DE6B10F.7090207@gmail.com>

Many, many thanks for the input. I applied the small change of not
closing the searchers after each search, and search times have already
dropped by half! I'll try to merge all indexes into a single one next
and let you know how that goes.

On 02.06.2011 05:28, Shai Erera wrote:
>>
>> All indexes together are rather small, ~200MB and 50.000 documents.
>
>
> Then I would definitely consider merging them under one index. Even if you
> don't close the searcher, it will still require 90 x N ms to search them,
> where N is the time in ms to search one index.
>
> Also, multi-threading will improve things, but only up to a point - because
> you cannot parallelize 90 searches (unless you have some sort of
> super-computer there).
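
To make the 90 x N ms point concrete, here is a minimal sketch of fanning the per-index searches out over a bounded thread pool. It is not Lucene code: `searchOneIndex` is a hypothetical stand-in for searching one already-open index (it only sleeps), and the class name, pool size and timings are assumptions chosen for illustration. With T threads the wall-clock time drops to roughly (90 / T) x N ms at best, which is why threading helps only up to a point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSearchSketch {

    // Hypothetical stand-in for searching one already-open index:
    // sleeps for the simulated search time and returns a fake hit.
    static String searchOneIndex(int indexId, long millis) throws InterruptedException {
        Thread.sleep(millis);
        return "hit-from-index-" + indexId;
    }

    // Runs one search per index on a fixed-size thread pool and
    // collects the results in index order.
    static List<String> searchAll(int indexCount, int threads, long millisPerIndex)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int i = 0; i < indexCount; i++) {
                final int id = i;
                tasks.add(() -> searchOneIndex(id, millisPerIndex));
            }
            List<String> hits = new ArrayList<>();
            // invokeAll blocks until every task has finished and returns
            // the futures in the same order as the submitted tasks.
            for (Future<String> future : pool.invokeAll(tasks)) {
                hits.add(future.get());
            }
            return hits;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        List<String> hits = searchAll(90, 8, 10); // made-up numbers
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(hits.size() + " results in ~" + elapsedMs + " ms");
    }
}
```

In a real application the pool would be created once and reused across searches, just like the searchers themselves, rather than built per call as in this sketch.
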
>
> On the other hand, if you merge them into one index then you'll be talking
> about an index that's < 20GB and < 5M docs, which is definitely reasonable for
> Lucene and performance (depends of course on the search application, but
> generally) is very good.
>
> Starting with Lucene 3.1 you can perform your searches in parallel (over one
> index) using IndexSearcher, which comes in handy if your index has multiple
> segments. Look at
> http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/search/IndexSearcher.html#IndexSearcher(org.apache.lucene.index.IndexReader, java.util.concurrent.ExecutorService).
>
> Having said that, keeping the indexes separate may have advantages that your
> application needs. For example, if those indexes are completely rebuilt very
> frequently, then it's much better to delete an index and rebuild it than to
> delete 50K docs from the merged large index. But that really depends on your
> application needs.
>
> I'd say, if you don't see a strong case for keeping them apart, merge them
> into one. Besides performance, there's also index management overhead, maybe
> synchronizing commits, making sure all are closed/opened together etc., that
> may just be an unnecessary overhead.
>
> BTW, in Lucene in Action 2nd Edition, there's an example class called
> SearcherManager which manages IndexSearcher instances and ensures an
> IndexSearcher instance is closed only after the last thread released it, and
> it can manage the reopen() logic for you as well as warming up the index. You
> might want to give it a try too!
> LUCENE-2955 makes
> use of it, so you can consult it for examples (it's still not committed).
>
> Hope this helps,
> Shai
>
> On Thu, Jun 2, 2011 at 12:37 AM, Alexander Rosemann<
> alexander.rosemann@gmail.com> wrote:
>
>> Many thanks for the tips, Erick! I do close each searcher after a search...
>> I will change that first thing tomorrow and let you know how that went.
>> Multi-threaded searching will be next and if that hasn't helped, I will
>> switch to one big index.
>> All indexes together are rather small, ~200MB and 50.000 documents.
>>
>> -Alex
>>
>>
>> On 01.06.2011 23:26, Erick Erickson wrote:
>>
>>> I'd start by putting them all in one index. There's no penalty
>>> in Lucene for having empty fields in a document, unlike an
>>> RDBMS.
>>>
>>> Alternately, if you're opening then closing searchers each
>>> time, that's very expensive. Could you open the searchers
>>> once and keep them open (all 90 of them)? That alone might
>>> do the trick and be less of a change to your program. You
>>> could also fire multiple threads at the searches, but check if
>>> you're CPU bound first (if you are, multiple threads won't
>>> help much/at all).
>>>
>>> You haven't said how big these indexes are nor how many
>>> documents you're talking about here, so this advice is suspect.
>>>
>>> Do look at putting it all in one index though, let us know if you
>>> have some data indicating how big stuff is/would be.
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Jun 1, 2011 at 4:35 PM, Alexander Rosemann
>>> wrote:
>>>
>>>> Hi all, I was wondering whether you could give me some advice on how to
>>>> improve my search performance.
>>>>
>>>> I have 90 lucene indexes, each having different fields (~5 per Document).
>>>> When I search, I always have to go through all indexes to build my result
>>>> set. Searching one index takes approx. 100ms, thus searching all indexes
>>>> takes 9s in total.
>>>>
>>>> How can I reduce the time it needs to search?
>>>>
>>>> I decided to create this many indexes because putting all data in one
>>>> index would mean that a document would have ~400 fields, with most of
>>>> them left empty. Is that ok? Would a single index be faster compared to
>>>> multiple small ones?
>>>>
>>>> Any pointers are much appreciated.
>>>>
>>>> Regards,
>>>> Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
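
For reference, the idea behind the SearcherManager class mentioned in the thread (close a searcher only after the last thread has released it) can be sketched with plain reference counting. This is not the class from Lucene in Action or from LUCENE-2955: `FakeSearcher` is a hypothetical stand-in for an open IndexSearcher, and all names here are assumptions for illustration. Reopen and warming logic are left out.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RefCountedSearcherSketch {

    // Hypothetical stand-in for an open IndexSearcher; it only
    // records whether close() has been called.
    static class FakeSearcher {
        volatile boolean closed = false;

        void close() {
            closed = true;
        }
    }

    private final FakeSearcher searcher = new FakeSearcher();

    // Starts at 1: the owner of this holder keeps one reference
    // until it decides to retire the searcher.
    private final AtomicInteger refCount = new AtomicInteger(1);

    // Called by a search thread before it uses the searcher.
    // Fails if the searcher has already been closed.
    FakeSearcher acquire() {
        while (true) {
            int current = refCount.get();
            if (current == 0) {
                throw new IllegalStateException("searcher already closed");
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return searcher;
            }
        }
    }

    // Called by a search thread when it is done, and once by the
    // owner to retire the searcher. The last release closes it.
    void release() {
        if (refCount.decrementAndGet() == 0) {
            searcher.close();
        }
    }

    public static void main(String[] args) {
        RefCountedSearcherSketch holder = new RefCountedSearcherSketch();
        FakeSearcher s = holder.acquire(); // a search thread starts
        holder.release();                  // owner retires the searcher
        System.out.println("closed while in use: " + s.closed);
        holder.release();                  // the search thread finishes
        System.out.println("closed after last release: " + s.closed);
    }
}
```

Each search thread pairs `acquire()` with `release()` in a try/finally block, so a searcher that is being replaced stays usable until in-flight searches complete.
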