lucene-java-user mailing list archives

From "Peter Gelderbloem" <Peter.Gelderbl...@mediasurface.com>
Subject Lucene trunk update question. WAS RE: search performance enhancement
Date Thu, 22 Sep 2005 09:45:58 GMT
I noticed you posted links to bits of code below.
How and when will these be made part of the svn trunk?
It seems universally useful and I think should be the default behaviour.

Peter Gelderbloem 
-----Original Message-----
From: Paul Elschot [mailto:paul.elschot@xs4all.nl] 
Sent: 21 September 2005 19:16
To: java-user@lucene.apache.org
Subject: Re: search performance enhancement

On Wednesday 21 September 2005 03:29, John Wang wrote:
> Hi Paul and other gurus:
> 
> In a related topic, seems lucene is scoring documents that would hit in a
> "prohibited" boolean clause, e.g. NOT field:value. It doesn't seem to make
> sense to score a document that is to be excluded from the result. Is this a
> difficult thing to fix?

In the svn trunk (the development version) excluded clauses of
BooleanQuery are now not scored.
The default scoring method is so cheap on CPU time that this is hardly
noticeable.
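
To illustrate the idea of not scoring excluded documents, here is a minimal sketch. It is not the trunk code: the class ExcludeBeforeScoring, its search() method and the scoreCalls counter are hypothetical names made up for this example, and both clauses are represented as plain ascending arrays of document numbers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: required documents minus excluded ones.
final class ExcludeBeforeScoring {
    /** Counts how often the (stand-in) score computation runs. */
    static int scoreCalls = 0;

    /** Stand-in for a possibly expensive score() call. */
    static float score(int doc) {
        scoreCalls++;
        return 1.0f;
    }

    /** Both arrays must hold ascending document numbers. */
    static List<Integer> search(int[] required, int[] excluded) {
        List<Integer> hits = new ArrayList<>();
        int e = 0;
        for (int doc : required) {
            // Advance the excluded clause up to the current required document.
            while (e < excluded.length && excluded[e] < doc) {
                e++;
            }
            if (e < excluded.length && excluded[e] == doc) {
                continue; // excluded: skipped before any score is computed
            }
            score(doc);
            hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search(new int[] {1, 3, 5, 7, 9}, new int[] {3, 4, 9}));
        System.out.println("score calls: " + scoreCalls);
    }
}
```

The exclusion test runs before score(), so only the documents that end up in the result are scored.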

> Also in Paul's earlier comment: "... unless you have large indexes, this will
> probably not make much difference", what is "large" in this case?

A BitSet takes one bit per document in the index, and a SortedVIntList
takes about 1 byte per document passing the filter.
So when you need many filters, each passing less than 1/8 of the indexed
docs, a SortedVIntList takes less memory.
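
As a rough worked example of that break-even point, the following sketch only does the arithmetic. It uses the approximations given above (one bit per indexed document, about one byte per passing document); the class and method names are made up for illustration and real sizes will vary with the document numbers involved.

```java
// Hypothetical helper: back-of-the-envelope memory estimate for the two filter types.
final class FilterMemoryEstimate {
    /** One bit per document in the index, rounded up to whole bytes. */
    static long bitSetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    /** About one byte per document passing the filter (an approximation). */
    static long sortedVIntListBytes(long passingDocs) {
        return passingDocs;
    }

    /** True when the list is expected to take less memory than the bit set. */
    static boolean listIsSmaller(long maxDoc, long passingDocs) {
        return sortedVIntListBytes(passingDocs) < bitSetBytes(maxDoc);
    }

    public static void main(String[] args) {
        long maxDoc = 8_000_000L;
        System.out.println(bitSetBytes(maxDoc));                // 1000000
        System.out.println(listIsSmaller(maxDoc, 500_000L));    // true, 1/16 of the docs pass
        System.out.println(listIsSmaller(maxDoc, 2_000_000L));  // false, 1/4 of the docs pass
    }
}
```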
 
> In our case, say millions of documents match some query, but after either a
> Filter is applied or after a NOT query (e.g. query with a NOT/prohibited
> clause) is applied, the resulting hit list has only 10 documents. Seems the
> millions of calls to score() are wasted, and some of the score() calls can be
> computationally intensive.

The FilteredQuery posted here:
http://issues.apache.org/jira/browse/LUCENE-330
will score only the documents passing the filter, for both a BitSet and
a SortedVIntList filter. Btw, this is the new place for the related
filter implementations:
http://issues.apache.org/jira/browse/LUCENE-328
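
A sketch of how scoring only the documents that pass the filter can work, assuming both the query and the filter can be walked with next() and skipTo(). This is not the code from the issues above: the DocSkipper interface and the BitSetSkipper, ArraySkipper and FilteredSearch classes are hypothetical stand-ins, and here skipTo(target) is defined to move to the first document that is both at or after target and after the current one.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical interface modelled on the next()/skipTo()/doc() methods discussed in this thread.
interface DocSkipper {
    boolean next();             // move to the next document; false when exhausted
    boolean skipTo(int target); // move to the first later document >= target
    int doc();                  // current document, valid after a true return
}

/** Walks the set bits of a java.util.BitSet (the filter side). */
final class BitSetSkipper implements DocSkipper {
    private final BitSet bits;
    private int doc = -1;
    private boolean exhausted = false;

    BitSetSkipper(BitSet bits) { this.bits = bits; }

    public boolean next() { return skipTo(doc + 1); }

    public boolean skipTo(int target) {
        if (exhausted) return false;
        int found = bits.nextSetBit(Math.max(target, doc + 1));
        if (found < 0) { exhausted = true; return false; }
        doc = found;
        return true;
    }

    public int doc() { return doc; }
}

/** Walks an ascending array of document numbers (stands in for the query side). */
final class ArraySkipper implements DocSkipper {
    private final int[] docs;
    private int pos = -1;

    ArraySkipper(int[] docs) { this.docs = docs; }

    public boolean next() {
        if (pos < docs.length) pos++;
        return pos < docs.length;
    }

    public boolean skipTo(int target) {
        do {
            if (pos < docs.length) pos++;
        } while (pos < docs.length && docs[pos] < target);
        return pos < docs.length;
    }

    public int doc() { return docs[pos]; }
}

final class FilteredSearch {
    static int scoreCalls = 0; // counts stand-in score() computations

    /** Leapfrogs the two skippers; a score is computed only where both agree. */
    static List<Integer> search(DocSkipper query, DocSkipper filter) {
        List<Integer> hits = new ArrayList<>();
        if (!query.next() || !filter.next()) return hits;
        while (true) {
            int q = query.doc();
            int f = filter.doc();
            if (q == f) {
                scoreCalls++; // the only place a score would be computed
                hits.add(q);
                if (!query.next() || !filter.next()) return hits;
            } else if (q < f) {
                if (!query.skipTo(f)) return hits;
            } else {
                if (!filter.skipTo(q)) return hits;
            }
        }
    }
}
```

With a query matching documents 0, 2, 4, 6, 8, 10, 12 and a filter passing 4, 5, 12 and 20, only documents 4 and 12 reach the scoring step; the rest are skipped on either side.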

> 
> Am I on the right track?

That's easy: you're using Lucene.

Regards,
Paul Elschot


> Thanks
> 
> -John
> 
> On 8/19/05, Paul Elschot <paul.elschot@xs4all.nl> wrote:
> > 
> > On Friday 19 August 2005 18:09, John Wang wrote:
> > > Hi Paul:
> > >
> > > Thanks for the pointer.
> > >
> > > How would I extend from the patch you submitted to filter out
> > > more documents not using a Filter. e.g.
> > >
> > > have a class to skip documents based on a docID: boolean
> > > isValid(int docID)
> > >
> > > My problem is I want to discard documents at query time without
> > > having to construct a BitSet via filter. I have my own memory
> > > structure to help me skip documents based on the query and a docid.
> > 
> > Basically, you need to implement the next() and skipTo(int targetDoc)
> > methods from Scorer on your memory structure. They are somewhat
> > redundant (skipTo(doc() + 1) has the same semantics as next(), except
> > initially), but that is for a mix of historical and performance reasons.
> > Have a look at how this is done in the posted FilteredQuery class
> > for SortedVIntList and BitSet.
> > With only isValid(docId) these next() and skipTo() methods would
> > have to count over the document numbers, which is less than ideal.
> > 
> > When you use the posted code, iirc it is only necessary to implement the
> > SkipFilter interface on your memory structure. One can use that interface
> > to build/cache such memory structures using an IndexReader, and
> > from there the DocNrSkipper interface will do the rest (off the top of
> > my head).
> > One slight problem with the current Lucene implementation is that
> > java.util.BitSet is not an interface.
> > 
> > Regards,
> > Paul Elschot.
> > 
> > > Thanks
> > >
> > > -John
> > >
> > > On 8/16/05, Paul Elschot <paul.elschot@xs4all.nl> wrote:
> > > > Hi John,
> > > >
> > > > On Wednesday 17 August 2005 04:46, John Wang wrote:
> > > > > Hi:
> > > > >
> > > > > I posted a bug (36147) a few days ago and didn't hear anything, so
> > > > > I thought I'd try my luck on this list.
> > > > >
> > > > > The idea is to avoid score calculations on documents to be filtered
> > > > > out anyway. (e.g. via Filter object passed to the searcher class)
> > > > >
> > > > > This seems to be an easy change.
> > > >
> > > > Have a look here:
> > > > http://issues.apache.org/bugzilla/show_bug.cgi?id=32965
> > > >
> > > > > Also it would be nice to expose a method to return a score given a
> > > > > docid, e.g.
> > > > >
> > > > > float getScore(int docid)
> > > > >
> > > > > on the Scorer class.
> > > >
> > > > skipTo(int docid) and score() will do that.
> > > >
> > > > > I am gonna make the change locally and do some performance analysis
> > > > > on it and will post some numbers later.
> > > >
> > > > The default score computations are mostly table lookups, and pretty fast.
> > > > So, unless you have large indexes, this will probably not make
> > > > much difference, but any performance improvement is welcome.
> > > > In larger indexes, it helps to use skipTo() while searching.
> > > >
> > > > Regards,
> > > > Paul Elschot
> > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > 
> > 
> >
> > 
> >
> 



