Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (asf.osuosl.org: domain of geneticflyer@googlemail.com
 designates 64.233.166.181 as permitted sender)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
        s=beta; d=googlemail.com;
        h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:references;
        b=XOEGvRUxUZ+qcYFZJP3Q03Q6aGr0OAB7R/pks374yeDTnHdZ6w0YbsneeSwCGu6EbEf0F7c/TPjJqVeEHPPnwk8WdW8gFR6E4eSov15qFNyrqVSh/Sp75eaij1p97m6jHtdD2Tw8Xyhc2q1J7fCvm6dOGCcCBvBaXONEcc8ITzY=
Message-ID: <78bc38bc0608200648wb83f1cl611cbcc0299e1eca@mail.gmail.com>
Date: Sun, 20 Aug 2006 14:48:11 +0100
From: "M A" <geneticflyer@googlemail.com>
To: java-user@lucene.apache.org
Subject: Re: Search Performance Problem 16 sec for 250K docs
In-Reply-To: <359a92830608200616s1bb2aef8md7f406598595e9ac@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_Part_17793_5189881.1156081691296"
References: <78bc38bc0608190655q3b777279m39dbb16469c1c17a@mail.gmail.com>
	 <359a92830608190822g68263974we13490c444bd676b@mail.gmail.com>
	 <78bc38bc0608190853g5ddc2f0ald2b6e5ec4994c7f1@mail.gmail.com>
	 <Pine.LNX.4.58.0608191549050.31867@hal.rescomp.berkeley.edu>
	 <78bc38bc0608191701w2bc033d1oea94592fee44ce14@mail.gmail.com>
	 <Pine.LNX.4.58.0608191746120.1134@hal.rescomp.berkeley.edu>
	 <78bc38bc0608200235n1d7c71fbm3a46823a80e716f6@mail.gmail.com>
	 <359a92830608200616s1bb2aef8md7f406598595e9ac@mail.gmail.com>

------=_Part_17793_5189881.1156081691296
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

The index is already built in date order i.e. the older documents appear
first in the index, what i am trying to achieve is however the latest
documents appearing first in the search results ..  without the sort .. i
think they appear by relevance .. well thats what it looked like ..

I am looking at the scoring as we speak,


On 8/20/06, Erick Erickson <erickerickson@gmail.com> wrote:
>
> About luke... I don't know about command-line interfaces, but if you copy
> your index to a different machine and use Luke there. I do this between
> Linux and Windows boxes all the time. Or, if you can mount the remote
> drive
> so you can see it, you can just use Luke to browse to it and open it up.
> You
> may have some latency though.....
>
> See below...
>
> On 8/20/06, M A <geneticflyer@googlemail.com> wrote:
> >
> > Ok I get your point, this still however means the first search on the
> new
> > searcher will take a huge amount of time .. given that this is happening
> > now
> > ..
>
>
> You can fire one or several canned queries at the searcher whenever you
> open
> a new one. That way the first time a *user* hits the box, the warm-up will
> already have happened. Note that the same searcher can be used by multiple
> threads...
>
>
> i.e. new search -> new query -> get hits ->20+ secs ..  this happens every
> 5
> > mins or so ..
> >
> > although subsequent searches may be quicker ..
> >
> > Am i to assume for a first search the amount of  time is ok -> .. seems
> > like
> > a long time to me ..?
> >
> > The other thing is the sorting is fixed .. it never changes .. it is
> > always
> > sorted by the same field ..
>
>
> Assuming that you still have performance issues, you could think about
> building your index in pre-sorted order an just avoiding the sorting all
> together. The internal Lucene document IDs are then your sort order (a
> newly
> added doc hast an ID that is always greater than any existing doc ID). I
> don't know details of your problem space, but this might be relatively
> easy.... You won't want to return things in relevance order in that case.
> In
> fact, you probably don't want relevance in place at all since your sorting
> doesn't change.... I think a ConstantScoreQuery  might work for you here.
>
> But I wouldn't go there unless you have evidence that your sort is slowing
> you down, which is easy enough to verify by just taking it out. Don't
> bother
> with any of this until you re-use your reader though....
>
> i just built the entire index and it still takes ages .,..
>
>
> The search took ages? Or building the index? If the former, then
> rebuilding
> the index is irrelevant, it's the first time you use a searcher that
> counts.
>
> On 8/20/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
> > >
> > >
> > > : This is because the index is updated every 5 mins or so, due to the
> > > incoming
> > > : feed of stories ..
> > > :
> > > : When you say iteration, i take it you mean, search request, well for
> > > each
> > > : search that is conducted I create a new one .. search reader that is
> > ..
> > >
> > > yeah ... i ment iteration of your test.  don't do that.
> > >
> > > if the index is updated every 5 minutes, then open a new searcher
> every
> > 5
> > > minutes -- and reuse it for theentire 5 minutes.  if it's updated
> > > "sparadically throughout the day" then open a search, and keep using
> it
> > > untill the index is udated, then open a new one.
> > >
> > > reusing an indexsearcher as long as possible is one of biggest factors
> > of
> > > Lucene applications.
> > >
> > > :
> > > :
> > > :
> > > : On 8/19/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
> > > : >
> > > : >
> > > : > :     hits = searcher.search(query, new Sort("sid", true));
> > > : >
> > > : > you don't show where searcher is initialized, and you don't
> clarify
> > > how
> > > : > you are timing your multiple iterations -- i'm going to guess that
> > you
> > > are
> > > : > opening a new searcher every iteration right?
> > > : >
> > > : > sorting on a field requires pre-computing an array of information
> > for
> > > that
> > > : > field -- this is both time and space expensive, and is cached per
> > > : > IndexReader/IndexSearcher -- so if you reuse the same searcher and
> > > time
> > > : > multiple iterations you'll find that hte first iteration might be
> > > somewhat
> > > : > slow, but the rest should be very fast.
> > > : >
> > > : >
> > > : >
> > > : > -Hoss
> > > : >
> > > : >
> > > : >
> > ---------------------------------------------------------------------
> > > : > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > : > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > : >
> > > : >
> > > :
> > >
> > >
> > >
> > > -Hoss
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
> >
>
>

------=_Part_17793_5189881.1156081691296--