Date: Thu, 02 Jun 2011 15:48:37 +0200
From: Alexander Rosemann
To: java-user@lucene.apache.org
Subject: Re: multiple small indexes or one big index?

No worries, I'll keep that in mind. I am also going to switch to a different
collector. At the moment I collect the results and then sort them with the
standard Collections.sort approach... I have to look at what Lucene offers
and switch to something else.

Thanks,
Alex

On 02.06.2011 15:36, Erick Erickson wrote:
> Sounds good, just be sure to keep your (now single) searcher open! Also,
> be sure to measure queries after a while. The first few queries will fill up
> caches etc., so the time should improve after the first few.
>
> Best
> Erick
>
> On Thu, Jun 2, 2011 at 9:28 AM, Alexander Rosemann wrote:
>> Hi Erick, caching the IndexSearchers didn't take much effort and
>> already cut search time by 30%!
>>
>> I am busy changing the code to use a single index, as you suggested.
>> There are still a few things left to do, but once I have it working I'll let
>> you know how much faster it is for me.
>>
>> Thanks,
>> Alex
>>
>> On 02.06.2011 13:04, Erick Erickson wrote:
>>>
>>> At this size, really consider going to a single index. The lack of
>>> administrative headaches alone is probably well worth the effort...
>>>
>>> I almost guarantee that the time you spend rewriting things to keep
>>> the searchers open (and finding the bugs!) will be far more than just
>>> putting all the data in a single index.
>>>
>>> But that might just be my preferences showing...
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Jun 1, 2011 at 5:37 PM, Alexander Rosemann wrote:
>>>>
>>>> Many thanks for the tips, Erick! I do close each searcher after a
>>>> search... I will change that first thing tomorrow and let you know
>>>> how it goes. Multi-threaded searching will be next, and if that doesn't
>>>> help, I will switch to one big index.
>>>> All indexes together are rather small: ~200MB and 50,000 documents.
>>>>
>>>> -Alex
>>>>
>>>> On 01.06.2011 23:26, Erick Erickson wrote:
>>>>>
>>>>> I'd start by putting them all in one index. There's no penalty
>>>>> in Lucene for having empty fields in a document, unlike an
>>>>> RDBMS.
>>>>>
>>>>> Alternatively, if you're opening and then closing searchers each
>>>>> time, that's very expensive. Could you open the searchers
>>>>> once and keep them open (all 90 of them)? That alone might
>>>>> do the trick and be less of a change to your program. You
>>>>> could also fire multiple threads at the searches, but check whether
>>>>> you're CPU bound first (if you are, multiple threads won't
>>>>> help much, if at all).
>>>>>
>>>>> You haven't said how big these indexes are or how many
>>>>> documents you're talking about here, so this advice is suspect.
>>>>>
>>>>> Do look at putting it all in one index, though, and let us know if you
>>>>> have some data indicating how big things are or would be.
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>> On Wed, Jun 1, 2011 at 4:35 PM, Alexander Rosemann wrote:
>>>>>>
>>>>>> Hi all, I was wondering whether you could give me some advice on how to
>>>>>> improve my search performance.
>>>>>>
>>>>>> I have 90 Lucene indexes, each with different fields (~5 per
>>>>>> Document). When I search, I always have to go through all indexes to
>>>>>> build my result set. Searching one index takes approx. 100ms, so
>>>>>> searching all indexes takes 9s in total.
>>>>>>
>>>>>> How can I reduce the search time?
>>>>>>
>>>>>> I decided to create this many indexes because putting all data in one
>>>>>> index would mean that a document would have ~400 fields, most of them
>>>>>> left empty. Is that OK? Would a single index be faster than multiple
>>>>>> small ones?
>>>>>>
>>>>>> Any pointers are much appreciated.
>>>>>>
>>>>>> Regards,
>>>>>> Alex
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
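Erick's first suggestion in the thread, opening each searcher once and reusing it instead of opening and closing it around every query, boils down to a small cache keyed by index. The sketch below uses only the JDK and is not Lucene code: `FakeSearcher` and `SearcherCache` are hypothetical names, with `FakeSearcher` standing in for an `IndexSearcher` whose construction is the expensive step.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for an expensive-to-open searcher (an IndexSearcher in
// the real program). Every construction counts as one "open".
final class FakeSearcher {
    static final AtomicInteger OPENS = new AtomicInteger();
    private final String indexName;

    FakeSearcher(String indexName) {
        this.indexName = indexName;
        OPENS.incrementAndGet();
    }

    String search(String query) {
        return indexName + ":" + query;
    }
}

// Hypothetical cache: opens each searcher lazily on first use and returns the
// same instance for every later request for that index.
final class SearcherCache<S> {
    private final ConcurrentHashMap<String, S> searchers = new ConcurrentHashMap<>();
    private final Function<String, S> opener;

    SearcherCache(Function<String, S> opener) {
        this.opener = opener;
    }

    S get(String indexName) {
        // computeIfAbsent applies the opener at most once per key, even when
        // several threads request the same index at the same time.
        return searchers.computeIfAbsent(indexName, opener);
    }
}
```

With `new SearcherCache<>(FakeSearcher::new)`, any number of queries against the same three index names triggers exactly three opens. A real application would additionally have to close the cached searchers on shutdown and refresh them when an index changes; that part is deliberately left out of this sketch.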
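The original question's numbers show why searching 90 indexes one after another hurts: 90 × 100 ms = 9 s. Erick's second suggestion was to run those per-index searches on several threads. A minimal sketch with a plain `ExecutorService`, assuming a hypothetical per-index search function `searchOne` (the class name `FanOut` is also hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class FanOut {
    // Submits one task per index to a shared pool, then gathers the per-index
    // result lists in index order. `searchOne` is a hypothetical per-index search.
    static <R> List<R> searchAll(List<String> indexes,
                                 Function<String, List<R>> searchOne,
                                 ExecutorService pool) {
        List<Future<List<R>>> futures = new ArrayList<>();
        for (String index : indexes) {
            Callable<List<R>> task = () -> searchOne.apply(index);
            futures.add(pool.submit(task));
        }
        List<R> merged = new ArrayList<>();
        for (Future<List<R>> future : futures) {
            try {
                merged.addAll(future.get()); // blocks until this index's search is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for results", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("search failed", e.getCause());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Create the pool once and reuse it for every query.
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<String> hits = searchAll(List.of("idx-a", "idx-b"),
                    name -> List.of(name + "#hit"), pool);
            System.out.println(hits); // prints [idx-a#hit, idx-b#hit]
        } finally {
            pool.shutdown();
        }
    }
}
```

Results are concatenated in index order, not ranked by relevance, so they would still need to be ordered afterwards. As Erick points out, this only pays off if the searches are not already CPU bound; on a saturated machine, extra threads mostly add contention.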
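Alex's latest message mentions collecting every hit and then calling Collections.sort. Lucene's search methods already return only the top n hits in ranked order as `TopDocs`, and there are `IndexSearcher.search` overloads that take a `Sort` for field-based ordering, so a separate full sort can usually be dropped. To illustrate why keeping only the best n hits is cheaper, here is the bounded min-heap technique in plain Java. This is a sketch of the general idea, not Lucene's collector code; `Hit` and `TopN` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical hit type: a document id plus its score.
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

final class TopN {
    // Keeps only the n best hits in a min-heap of size n: memory stays O(n) and
    // each hit costs O(log n), instead of storing every hit and sorting at the end.
    static List<Hit> topN(Iterable<Hit> hits, int n) {
        Comparator<Hit> byScore = Comparator.comparingDouble(h -> h.score);
        // The weakest retained hit sits at the head of the heap.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Math.max(1, n), byScore);
        for (Hit h : hits) {
            if (heap.size() < n) {
                heap.add(h);
            } else if (n > 0 && h.score > heap.peek().score) {
                heap.poll(); // evict the current weakest hit
                heap.add(h);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(byScore.reversed()); // best hit first
        return result;
    }
}
```

Only the final n survivors are sorted, so with many matches and a small n the work is dominated by cheap heap comparisons rather than a sort over every matching document.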
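Erick also warns that the first few queries fill caches, so timings should only be taken after a warm-up; in a JVM, just-in-time compilation adds to that effect. A small timing helper along those lines, assuming the query under test is wrapped in a `Runnable` (`QueryTimer` is a hypothetical name):

```java
final class QueryTimer {
    // Runs `query` `warmups` times untimed so caches (and the JIT) can settle,
    // then returns the mean wall-clock time in nanoseconds over `runs` timed calls.
    static double meanNanos(Runnable query, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            query.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            query.run();
        }
        return (System.nanoTime() - start) / (double) runs;
    }
}
```

Comparing the single-index and multi-index setups with the same warm-up and run counts keeps the comparison fair; measuring only the first query would mostly reflect cold caches.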