Date: Fri, 3 Jun 2011 16:59:44 -0400
Subject: Re: multiple small indexes or one big index?
From: Erick Erickson
To: java-user@lucene.apache.org

OK, if they're all in a single index, you might also try using Lucene
sorting. Be aware that the first sort on a field takes extra time to
warm the caches... But note that sorting is for single-valued,
un-tokenized fields.

Best
Erick

On Fri, Jun 3, 2011 at 2:39 AM, Alexander Rosemann wrote:
> Alright. With all the changes you suggested I am down from 9s to <1s. Again,
> many thanks to both of you Erick and Shai!
>
> Regards,
> Alex
>
> On 02.06.2011 15:48, Alexander Rosemann wrote:
>>
>> No worries, I'll keep that in mind now.
>> In addition I am going to switch to another collector as well. ATM I
>> collect the results and then sort them using the std. Collections.sort
>> approach... I have to look at what Lucene offers and switch to something
>> else.
>>
>> Thanks,
>> Alex
>>
>> On 02.06.2011 15:36, Erick Erickson wrote:
>>>
>>> Sounds good, just be sure to keep your (now single) searcher open! Also,
>>> be sure to measure queries after a while. The first few queries will
>>> fill up caches etc, so the time should improve after the first few.
>>>
>>> Best
>>> Erick
>>>
>>> On Thu, Jun 2, 2011 at 9:28 AM, Alexander Rosemann wrote:
>>>>
>>>> Hi Erick, caching the IndexSearchers didn't take too much effort and
>>>> has already decreased search time by 30%!
>>>>
>>>> I am busy changing the code to use a single index as you suggested atm.
>>>> Still a few things left to be done but once I have it working I'll let
>>>> you know how much faster it is for me.
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>> On 02.06.2011 13:04, Erick Erickson wrote:
>>>>>
>>>>> At this size, really consider going to a single index. The lack of
>>>>> administrative headaches alone is probably well worth the effort....
>>>>>
>>>>> I almost guarantee that the time you spend re-writing things to keep
>>>>> the searchers open (and finding the bugs!) will be far more than just
>>>>> putting all the data in a single index.
>>>>>
>>>>> But that might just be my preferences showing....
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>> On Wed, Jun 1, 2011 at 5:37 PM, Alexander Rosemann wrote:
>>>>>>
>>>>>> Many thanks for the tips, Erick! I do close each searcher after a
>>>>>> search... I will change that first thing tmrw. and let you know how
>>>>>> that went. Multi-threaded searching will be next and if that hasn't
>>>>>> helped, I will switch to one big index.
>>>>>> All indexes together are rather small, ~200MB and 50.000 documents.
>>>>>>
>>>>>> -Alex
>>>>>>
>>>>>> On 01.06.2011 23:26, Erick Erickson wrote:
>>>>>>>
>>>>>>> I'd start by putting them all in one index. There's no penalty
>>>>>>> in Lucene for having empty fields in a document, unlike an
>>>>>>> RDBMS.
>>>>>>>
>>>>>>> Alternately, if you're opening then closing searchers each
>>>>>>> time, that's very expensive. Could you open the searchers
>>>>>>> once and keep them open (all 90 of them)? That alone might
>>>>>>> do the trick and be less of a change to your program.
>>>>>>> You could also fire multiple threads at the searches, but check if
>>>>>>> you're CPU bound first (if you are, multiple threads won't
>>>>>>> help much/at all).
>>>>>>>
>>>>>>> You haven't said how big these indexes are nor how many
>>>>>>> documents you're talking about here, so this advice is suspect.
>>>>>>>
>>>>>>> Do look at putting it all in one index though, let us know if you
>>>>>>> have some data indicating how big stuff is/would be.
>>>>>>>
>>>>>>> Best
>>>>>>> Erick
>>>>>>>
>>>>>>> On Wed, Jun 1, 2011 at 4:35 PM, Alexander Rosemann wrote:
>>>>>>>>
>>>>>>>> Hi all, I was wondering whether you could give me some advice on how to
>>>>>>>> improve my search performance.
>>>>>>>>
>>>>>>>> I have 90 Lucene indexes, each having different fields (~5 per Document).
>>>>>>>> When I search, I always have to go through all indexes to build my result
>>>>>>>> set. Searching one index takes approx. 100ms, thus searching all indexes
>>>>>>>> takes 9s in total.
>>>>>>>>
>>>>>>>> How can I reduce the time it needs to search?
>>>>>>>>
>>>>>>>> I decided to create this many indexes because putting all data in one index
>>>>>>>> would mean that a document would have ~400 fields, with most of them left
>>>>>>>> empty. Is that ok? Would a single index be faster compared to multiple small
>>>>>>>> ones?
>>>>>>>>
>>>>>>>> Any pointers are much appreciated.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
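
The advice earlier in the thread, to open the searchers once and keep them open instead of opening and closing one per search, can be sketched in plain Java with no Lucene dependency. This is only an illustration under assumptions: SearcherCache and its opener function are hypothetical names, not part of Lucene, and the String values stand in for real IndexSearcher instances. It needs Java 8 or later.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical helper (not a Lucene class): keeps one expensive-to-open
 * resource per index name so it is opened once and then reused.
 */
public class SearcherCache<S> {
    private final Map<String, S> open = new ConcurrentHashMap<>();
    private final Function<String, S> opener;

    /** opener is the expensive "open a searcher for this index" step. */
    public SearcherCache(Function<String, S> opener) {
        this.opener = opener;
    }

    /** Returns the cached resource, opening it only on first use. */
    public S get(String indexName) {
        return open.computeIfAbsent(indexName, opener);
    }

    /** Number of resources currently held open. */
    public int size() {
        return open.size();
    }

    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();
        // The String is a stand-in for a real searcher object.
        SearcherCache<String> cache = new SearcherCache<>(name -> {
            opens.incrementAndGet();
            return "searcher:" + name;
        });
        // Three rounds of queries over the 90 indexes from the thread.
        for (int round = 0; round < 3; round++) {
            for (int i = 0; i < 90; i++) {
                cache.get("index-" + i);
            }
        }
        System.out.println("opens = " + opens.get()); // prints "opens = 90"
    }
}
```

With a single combined index the map would hold just one entry, which is the "(now single) searcher" case mentioned above. A real version would also need to close the searchers on shutdown and reopen them when the index changes; that is left out here.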