lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Lucene trunk update question. WAS RE: search performance enhancement
Date Thu, 22 Sep 2005 18:17:29 GMT
On Thursday 22 September 2005 11:45, Peter Gelderbloem wrote:
> I noticed you posted links to bits of code below.
> How and when will these be made part of the svn trunk?
> It seems universally useful and I think should be the default behaviour.

The code has the same licence as the trunk, so you can basically
use it as you like. Off course, more improvements would be appreciated.
I don't know when/if it will become part of the trunk.

Regards,
Paul Elschot

> Peter Gelderbloem 
> -----Original Message-----
> From: Paul Elschot [mailto:paul.elschot@xs4all.nl] 
> Sent: 21 September 2005 19:16
> To: java-user@lucene.apache.org
> Subject: Re: search performance enhancement
> 
> On Wednesday 21 September 2005 03:29, John Wang wrote:
> > Hi Paul and other gurus:
> > 
> > In a related topic, seems lucene is scoring documents that would hit
> in a 
> > "prohibited" boolean clause, e.g. NOT field:value. It doesn't seem to
> make 
> > sense to score a document that is to be excluded from the result. Is
> this a 
> > difficult thing to fix?
> 
> In the svn trunk (the development version) excluded clauses of
> BooleanQuery are no not scored.
> The default scoring method is so cheap on CPU time that this is hardly
> noticeable.
> 
> > Also in Paul's ealier comment: "... unless you have large indexes,
> this will 
> > probably not make much difference", what is "large" in this case. 
> 
> A BitSet takes one bit per document in the index, and a SortedVIntList
> takes about 1 byte per document passing the filter.
> So when you need many filters, each passing less than 1/8 of the indexed
> docs, a SortedVIntList takes less memory.
>  
> > In our case, say millions of documents match some query, but after
> either a 
> > Filter is applied or after a NOT query (e.g. query with a
> NOT/prohibited 
> > clause) is applied, the resulting hit list has only 10 documents.
> Seems the 
> > millions of calls to score() is wasted, some of the score() call can
> be 
> > computational intensive.
> 
> The FilteredQuery posted here:
> http://issues.apache.org/jira/browse/LUCENE-330
> will score only the documents passing the filter, for both a BitSet and
> a SortedVIntList filter. Btw. this is the new place for the related
> filter
> implementations:
> http://issues.apache.org/jira/browse/LUCENE-328
> 
> > 
> > Am I on the right track?
> 
> That's easy: you're using Lucene.
> 
> Regards,
> Paul Elschot
> 
> 
> > Thanks
> > 
> > -John
> > 
> > On 8/19/05, Paul Elschot <paul.elschot@xs4all.nl> wrote:
> > > 
> > > On Friday 19 August 2005 18:09, John Wang wrote:
> > > > Hi Paul:
> > > >
> > > > Thanks for the pointer.
> > > >
> > > > How would I extend from the patch you submitted to filter out
> > > > more documents not using a Filter. e.g.
> > > >
> > > > have a class to skip documents based on a docID: boolean
> > > > isValid(int docID)
> > > >
> > > > My problem is I want to discard documents at query time without
> > > > having to construct a BitSet via filter. I have my own memory
> > > > structure to help me skip documents based the query and a docid.
> > > 
> > > Basically, you need to implement the next() and skipTo(int
> targetDoc)
> > > methods from Scorer on your memory structure. They are somewhat
> > > redundant (skipTo(doc() + 1) has the same semantics as next(),
> except
> > > initially), but that is for a mix of historical and performance
> reasons.
> > > Have a look at how this is done in the posted FilteredQuery class
> > > for SortedVIntList and BitSet.
> > > With only isValid(docId) these next() and skipTo() methods would
> > > have to count over the document numbers, which is less than ideal.
> > > 
> > > When you use the posted code, iirc it is only necessary to implement
> the
> > > SkipFilter interface on your memory structure. One can use that
> interface
> > > to build/cache such memory structures using an IndexReader, and
> > > from there the DocNrSkipper interface will do the rest (of the top
> of
> > > my head).
> > > One slight problem with the current Lucene implementation is that
> > > java.lang.BitSet is not interface.
> > > 
> > > Regards,
> > > Paul Elschot.
> > > 
> > > > Thanks
> > > >
> > > > -John
> > > >
> > > > On 8/16/05, Paul Elschot <paul.elschot@xs4all.nl> wrote:
> > > > > Hi John,
> > > > >
> > > > > On Wednesday 17 August 2005 04:46, John Wang wrote:
> > > > > > Hi:
> > > > > >
> > > > > > I posted a bug (36147) a few days ago and didn't hear
> anything, so
> > > > > > I thought I'd try my luck on this list.
> > > > > >
> > > > > > The idea is to avoid score calculations on documents to be
> filtered
> > > > > > out anyway. (e.g. via Filter object passed to the searcher
> class)
> > > > > >
> > > > > > This seems to be an easy change.
> > > > >
> > > > > Have a look here:
> > > > > http://issues.apache.org/bugzilla/show_bug.cgi?id=32965
> > > > >
> > > > > > Also it would be nice to expose a method to return a score
> given a
> > > > > > docid, e.g.
> > > > > >
> > > > > > float getScore(int docid)
> > > > > >
> > > > > > on the Scorer class.
> > > > >
> > > > > skipTo(int docid) and score() will do that.
> > > > >
> > > > > > I am gonna make the change locally and do some performance
> analysis
> > > > > > on it and will post some numbers later.
> > > > >
> > > > > The default score computations are mostly table lookups, and
> pretty 
> > > fast.
> > > > > So, unless you have large indexes, this will probably not make
> > > > > much difference, but any performance improvement is wellcome.
> > > > > In larger indexes, it helps to use skipTo() while searching.
> > > > >
> > > > > Regards,
> > > > > Paul Elschot
> > > > >
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > For additional commands, e-mail:
> java-user-help@lucene.apache.org
> > > > >
> > > > >
> > > >
> > > >
> ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > > >
> > > >
> > > 
> > > 
> > >
> ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > 
> > >
> > 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message