From: Yang
Date: Sat, 26 May 2012 07:32:45 -0700
Subject: Re: lucene (search) performance tuning
To: java-user@lucene.apache.org, simon.willnauer@gmail.com

I'm using disjunction (OR) query. unfortunately all of the clauses are optional

On Sat, May 26, 2012 at 4:38 AM, Simon Willnauer <simon.willnauer@googlemail.com> wrote:

> On Sat, May 26, 2012 at 2:59 AM, Yang wrote:
> > I tested with more threads / processes. indeed this is completely
> > cpu-bound, since running 1 thread gives the same latency as 4 threads
> > (my box has 4 cores)
> >
> > given this, is there any way to simplify the scoring computation (i'm
> > only using lucene as a first level "rough" search, so the search quality
> > is not a huge issue here), so that, for example, fewer fields are
> > evaluated or a simpler scoring function is used?
>
> are you using disjunction or conjunction queries? Can you make some
> parts of the query mandatory?
>
> simon
> >
> > thanks
> > Yang
> >
> > On Fri, May 25, 2012 at 5:47 PM, Yang wrote:
> >
> >> thanks a lot guys
> >>
> >> On Tue, May 22, 2012 at 1:34 AM, Ian Lea wrote:
> >>
> >>> Lots of good tips in
> >>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, linked from
> >>> the FAQ.
> >>>
> >>> --
> >>> Ian.
> >>>
> >>> On Tue, May 22, 2012 at 2:08 AM, Li Li wrote:
> >>> > something wrong when writing in my android client.
> >>> > if RAMDirectory does not help, i think the bottleneck is cpu. you may
> >>> > try to tune the jvm but i do not expect much improvement.
> >>> > the best option is splitting your index into 2 or more smaller ones.
> >>> > you can then use Solr's distributed searching.
> >>> > if the cpu is not fully used, you can do this on one physical machine
> >>> >
> >>> > On 2012-5-22 at 8:50 AM, "Li Li" wrote:
> >>> >>
> >>> >> On 2012-5-22 at 4:59 AM, "Yang" wrote:
> >>> >> >
> >>> >> > I'm trying to make my search faster. right now a query like
> >>> >> >
> >>> >> > name:Joe Moe Pizza address:77 main street city:San Francisco
> >>> >> >is this a conjunction query or a disjunction query?
> >>> >>
> >>> >> > in an index with 20mil such short business descriptions (total size
> >>> >> > about 3GB) takes about 100--200ms.
> >>> >> >20m is not a small size, how many results for a query on average?
> >>> >>
> >>> >> > I profiled the query, most time is spent in TermScorer.score(), as
> >>> >> > is shown by the attached yourkit screenshot.
> >>> >> >that's true, for a query, matching and scoring is very time
> >>> >> >consuming and cpu intensive. another one is io for reading postings.
> >>> >>
> >>> >> >
> >>> >> > I tried loading the index onto tmpfs (in-memory block device), and
> >>> >> > also tried RAMDirectory, neither helps much.
> >>> >> >if that is true. it seems that io is not the
> >>> >> > I am reading
> >>> >> > http://www.cnlp.org/presentations/slides/AdvancedLuceneEU.pdf
> >>> >> > it mentions
> >>> >> > Size
> >>> >> >   – Stopword removal
> >>> >> >   – Stemming
> >>> >> >       • Lucene has a number of stemmers available
> >>> >> >       • Light versus Aggressive
> >>> >> >       • May prevent fine-grained matches in some cases
> >>> >> >   – Not a linear factor (usually) due to index compression
> >>> >> >
> >>> >> > so for "stopword removal", I'm already using the standard analyzer,
> >>> >> > so stop word removal is already included, right?
> >>> >> >
> >>> >> > also generally any other tricks to try for reducing the search
> >>> >> > latency?
> >>> >> >
> >>> >> > Thanks!
> >>> >> > Yang
> >>> >> >
> >>> >> > ---------------------------------------------------------------------
> >>> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> >> > For additional commands, e-mail: java-user-help@lucene.apache.org
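
To make the disjunction vs. conjunction question in this thread concrete, here is a rough sketch in plain Java. It does not use any Lucene classes: the class name ClauseCost, the two methods, and the tiny posting lists are all made up for illustration, and real scorers are considerably more involved. The only point it tries to show is that an OR query has to visit (and score) every document that appears in any clause's posting list, while an AND query only scores documents present in all of them. In Lucene terms this roughly corresponds to marking a clause as BooleanClause.Occur.MUST instead of SHOULD.

```java
import java.util.Arrays;

final class ClauseCost {

    // OR (disjunction): every doc id found in either sorted list is a hit
    // and therefore has to be scored.
    public static int[] union(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length || j < b.length) {
            if (j == b.length || (i < a.length && a[i] < b[j])) {
                out[n++] = a[i++];
            } else if (i == a.length || b[j] < a[i]) {
                out[n++] = b[j++];
            } else {
                // same doc id in both lists: emit it once
                out[n++] = a[i++];
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    // AND (conjunction): only doc ids present in both sorted lists are hits,
    // so far fewer documents reach the scoring step.
    public static int[] intersection(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[n++] = a[i];
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        // Hypothetical sorted posting lists (doc ids) for two query terms.
        int[] pizza = {1, 4, 7, 9, 12, 15};
        int[] francisco = {2, 4, 9, 20};

        System.out.println("OR scores  " + union(pizza, francisco).length + " docs");
        System.out.println("AND scores " + intersection(pizza, francisco).length + " docs");
    }
}
```

With these toy lists the OR case touches 8 documents and the AND case only 2. On a 20-million-document index with common terms such as a city name, the gap would presumably be much larger, which is why making even one clause mandatory may cut the time spent in scoring.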