Date: Sun, 20 Aug 2006 14:48:11 +0100
From: "M A"
To: java-user@lucene.apache.org
Subject: Re: Search Performance Problem 16 sec for 250K docs

The index is already built in date order, i.e. the older documents appear
first in the index. What I am trying to achieve, however, is for the latest
documents to appear first in the search results. Without the sort, I think
they appear by relevance; at least, that's what it looked like. I am looking
at the scoring as we speak.

On 8/20/06, Erick Erickson wrote:
>
> About Luke... I don't know about command-line interfaces, but you can copy
> your index to a different machine and use Luke there. I do this between
> Linux and Windows boxes all the time. Or, if you can mount the remote drive
> so you can see it, you can just use Luke to browse to it and open it up. You
> may have some latency, though.
>
> See below...
>
> On 8/20/06, M A wrote:
> >
> > OK, I get your point, but this still means the first search on the new
> > searcher will take a huge amount of time, given that this is happening
> > now.
>
>
> You can fire one or several canned queries at the searcher whenever you open
> a new one.
> That way the first time a *user* hits the box, the warm-up will
> already have happened. Note that the same searcher can be used by multiple
> threads...
>
> > i.e. new searcher -> new query -> get hits -> 20+ secs. This happens every
> > 5 mins or so.
> >
> > Subsequent searches may be quicker, though.
> >
> > Am I to assume this amount of time is OK for a first search? It seems like
> > a long time to me.
> >
> > The other thing is that the sorting is fixed: it never changes, and it is
> > always sorted by the same field.
>
>
> Assuming that you still have performance issues, you could think about
> building your index in pre-sorted order and just avoiding the sorting
> altogether. The internal Lucene document IDs are then your sort order (a
> newly added doc has an ID that is always greater than any existing doc ID).
> I don't know the details of your problem space, but this might be relatively
> easy... You won't want to return things in relevance order in that case. In
> fact, you probably don't want relevance in place at all, since your sorting
> doesn't change... I think a ConstantScoreQuery might work for you here.
>
> But I wouldn't go there unless you have evidence that your sort is slowing
> you down, which is easy enough to verify by just taking it out. Don't bother
> with any of this until you re-use your reader, though...
>
> > I just built the entire index and it still takes ages...
>
>
> The search took ages? Or building the index? If the former, then rebuilding
> the index is irrelevant; it's the first time you use a searcher that counts.
>
> > On 8/20/06, Chris Hostetter wrote:
> > >
> > >
> > > : This is because the index is updated every 5 mins or so, due to the
> > > : incoming feed of stories.
> > > :
> > > : When you say iteration, I take it you mean search request. Well, for
> > > : each search that is conducted I create a new one... a new search
> > > : reader, that is.
> > >
> > > Yeah ...
> > > I meant iteration of your test. Don't do that.
> > >
> > > If the index is updated every 5 minutes, then open a new searcher every 5
> > > minutes -- and reuse it for the entire 5 minutes. If it's updated
> > > "sporadically throughout the day", then open a searcher and keep using it
> > > until the index is updated, then open a new one.
> > >
> > > Reusing an IndexSearcher for as long as possible is one of the biggest
> > > performance factors in Lucene applications.
> > >
> > > :
> > > : On 8/19/06, Chris Hostetter wrote:
> > > : >
> > > : > : hits = searcher.search(query, new Sort("sid", true));
> > > : >
> > > : > You don't show where searcher is initialized, and you don't clarify how
> > > : > you are timing your multiple iterations -- I'm going to guess that you
> > > : > are opening a new searcher every iteration, right?
> > > : >
> > > : > Sorting on a field requires pre-computing an array of information for
> > > : > that field -- this is both time and space expensive, and is cached per
> > > : > IndexReader/IndexSearcher -- so if you reuse the same searcher and time
> > > : > multiple iterations you'll find that the first iteration might be
> > > : > somewhat slow, but the rest should be very fast.
> > > : >
> > > : > -Hoss
> > > : >
> > > : > ---------------------------------------------------------------------
> > > : > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > : > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > > -Hoss
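Hoss's explanation of why only the first sorted search is slow can be modelled in plain Java: the per-document values of the sort field are loaded once per reader and then reused. This is a simplified sketch, not Lucene's actual FieldCache code. The type parameter R stands in for an IndexReader, and the loader function is a hypothetical stand-in for the full-field scan Lucene performs internally.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/**
 * Sketch of a per-reader sort-value cache. R stands in for an IndexReader;
 * the loader is a hypothetical hook that reads one value per document.
 */
final class PerReaderSortCache<R> {
    private final Function<R, long[]> loader;                  // expensive: visits every document
    private final Map<R, long[]> cache = new WeakHashMap<>();  // entry can go away once its reader is unreachable
    private int loads;                                         // how many times the expensive load ran

    PerReaderSortCache(Function<R, long[]> loader) {
        this.loader = loader;
    }

    /** Loads the per-document sort values once per reader, then reuses them. */
    synchronized long[] valuesFor(R reader) {
        return cache.computeIfAbsent(reader, r -> {
            loads++;
            return loader.apply(r);
        });
    }

    /** Orders matching doc IDs by cached value, largest first (like new Sort("sid", true)). */
    int[] sortDescending(R reader, int[] docIds) {
        long[] values = valuesFor(reader);
        return Arrays.stream(docIds)
                .boxed()
                .sorted((a, b) -> Long.compare(values[b], values[a]))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    synchronized int loads() {
        return loads;
    }
}
```

Because the cache is keyed by reader, opening a new searcher for every request (as in the original test) forces the expensive load every time, which is consistent with the slow first searches described in this thread.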
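A minimal sketch of the reuse-and-warm-up advice quoted above (Hoss: open one searcher per index update and share it; Erick: fire canned queries at it before users arrive). It uses only the JDK. The type parameter S stands in for Lucene's IndexSearcher, and the opener and warmer functions are hypothetical hooks you would supply, for example opening a searcher on your index directory and running your usual sorted query once.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Sketch of "open one searcher per index update, warm it, share it".
 * S is a stand-in for an IndexSearcher; opener and warmer are hypothetical hooks.
 */
final class WarmedSearcherHolder<S> {
    private final Supplier<S> opener;  // opens a searcher on the current index
    private final Consumer<S> warmer;  // runs canned queries, including the usual sort
    private final AtomicReference<S> current = new AtomicReference<>();

    WarmedSearcherHolder(Supplier<S> opener, Consumer<S> warmer) {
        this.opener = opener;
        this.warmer = warmer;
        refresh(); // warm the very first searcher before any user query arrives
    }

    /** Called by every request; all threads share the same searcher. */
    S acquire() {
        return current.get();
    }

    /**
     * Called after each index update (e.g. every 5 minutes), never per request.
     * The new searcher is warmed before it is published, so users do not pay the
     * first-query cost. Returns the previous searcher (null on the first call) so
     * the caller can close it once in-flight searches on it have finished.
     */
    synchronized S refresh() {
        S fresh = opener.get();
        warmer.accept(fresh);
        return current.getAndSet(fresh);
    }
}
```

In a real application, refresh() would be driven by whatever notices the index update, for instance `Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(holder::refresh, 5, 5, TimeUnit.MINUTES)` to match the 5-minute feed. Request threads only ever call acquire().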
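Erick's pre-sorted-index idea relies on doc IDs growing with insertion order, so for an index built oldest-first (as M A describes), "newest first" is simply descending doc-ID order. The sketch below assumes a java.util.BitSet of matching doc IDs as a stand-in for whatever the query or filter yields; it needs no sort field, no per-document value array, and no scoring.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class NewestFirst {
    /**
     * Returns up to n matching doc IDs, highest (newest) first.
     * Assumes docs were added oldest-first, so a higher ID means a newer doc.
     */
    static List<Integer> topN(BitSet matches, int n) {
        List<Integer> out = new ArrayList<>();
        // previousSetBit scans downwards from the highest set bit; -1 means none left
        for (int id = matches.previousSetBit(matches.length() - 1);
             id >= 0 && out.size() < n;
             id = matches.previousSetBit(id - 1)) {
            out.add(id);
        }
        return out;
    }
}
```

In Lucene itself, the closest equivalent I know of is sorting on reversed index order, e.g. `new Sort(new SortField(null, SortField.DOC, true))`, which should not need a per-field value array; check this against your Lucene version. One caveat: an updated document (delete plus re-add) gets a new, higher ID and will therefore rank as the newest.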