lucene-java-user mailing list archives

From "Yonik Seeley" <ysee...@gmail.com>
Subject Re: Different scoring mechanism
Date Sat, 10 Jun 2006 14:23:55 GMT
On 6/10/06, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> Other than working on required clauses in a BooleanQuery first, and skipping if there
> are no matching Docs for them, there are no other query optimization strategies/tricks, are
> there?


I think that's pretty much it, depending on what you consider an
optimization/trick.
Beyond that, there is efficient seeking to a specific term, efficient
skipping to a particular document number for that term, keeping track
of the term with the lowest docid with heaps or priority queues, and
keeping track of the highest scoring docs with a priority queue.
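
The two priority-queue uses mentioned above can be sketched roughly as follows. This is a toy illustration, not Lucene's code: HeapSketch, Cursor, mergeByDocId and topN are made-up names, postings are plain sorted int arrays, and the "score" is simply the number of query terms a document matches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class HeapSketch {
    /** Hypothetical cursor over one term's sorted docid list. */
    static final class Cursor {
        final int[] docs;
        int pos = 0;
        Cursor(int[] docs) { this.docs = docs; }
        int doc() { return docs[pos]; }
        boolean advance() { return ++pos < docs.length; }
    }

    /** Walks the union of all postings in increasing docid order.
     *  Returns {docid, numberOfMatchingTerms} pairs. */
    static List<int[]> mergeByDocId(int[][] postings) {
        // The heap keeps the cursor with the lowest current docid on top.
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>(Comparator.comparingInt(Cursor::doc));
        for (int[] p : postings) {
            if (p.length > 0) heap.add(new Cursor(p));
        }
        List<int[]> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().doc();
            int matches = 0;
            // Pop every cursor positioned on this docid, then re-insert it
            // at its next docid (if it has one).
            while (!heap.isEmpty() && heap.peek().doc() == doc) {
                Cursor c = heap.poll();
                matches++;
                if (c.advance()) heap.add(c);
            }
            out.add(new int[] {doc, matches});
        }
        return out;
    }

    /** Keeps only the n best hits, using a heap whose top is the worst kept hit. */
    static List<int[]> topN(List<int[]> hits, int n) {
        // Worst first: lowest score, and on ties the higher docid.
        Comparator<int[]> worstFirst = (a, b) ->
            a[1] != b[1] ? Integer.compare(a[1], b[1]) : Integer.compare(b[0], a[0]);
        PriorityQueue<int[]> best = new PriorityQueue<>(worstFirst);
        for (int[] h : hits) {
            best.add(h);
            if (best.size() > n) best.poll(); // evict the current worst hit
        }
        List<int[]> result = new ArrayList<>(best);
        result.sort(worstFirst.reversed()); // best hit first
        return result;
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 3, 5, 8}, {3, 8, 9} }; // e.g. "fast", "car"
        for (int[] hit : topN(mergeByDocId(postings), 2)) {
            System.out.println("doc=" + hit[0] + " score=" + hit[1]);
        }
    }
}
```

Note that every matching document is still visited and scored; the second heap only bounds the memory needed to remember the best n of them.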

Higher level optimizations that do query transformations are left as
an exercise to the application :-)

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


> Otis
>
> ----- Original Message ----
> From: Chris Hostetter <hossman_lucene@fucit.org>
> To: java-user@lucene.apache.org
> Sent: Friday, June 9, 2006 3:08:35 PM
> Subject: RE: Different scoring mechanism
>
>
>
> : For example: a query containing two terms: "fast", "car", having
> : document frequencies 300.000 and 20.000 in the index respectively. In a
> : worst case scenario this would require 320.000 document scores to be
> : calculated. I am not really sure how lucene optimizes its search, but I
> : guess it does that by first processing the documents having the highest
> : term frequencies (and thus highest combined score) with these query
> : terms, and pruning the search if the n hits have been found and it's
> : certain that no document can be found which will give a higher score.
>
> Nope.  Lucene scores all "matching" documents in the index in increasing
> order of docId -- it can optimize the process using "skipTo" in Scorers
> when it knows that it's not possible for a document to "match" the
> overall query, so it "skips ahead" to the first doc that can match.
>
> i.e., if you have a boolean query like "+title:cat +title:dog body:snake" it
> knows that unless something matches title:cat and title:dog then there is
> no point in checking whether it matches body:snake -- let alone scoring
> the doc at all.  So BooleanScorer uses skipTo on the individual Scorers
> for title:cat and title:dog to keep skipping ahead until it finds a doc
> matching both, then it checks if it matches body:snake, and if it does
> *then* it scores things.
>
> : If I would change the next function in my own scorer to process all
> : document ids, I am afraid I will wreck Lucene's optimization method (as
> : I am then not serving the documents in descending term frequency order).
>
> It would certainly eliminate Lucene's ability to skip ahead (although
> not in the way you imagined) ... but based on the way you've described how
> you want scoring to work, it has to score every doc no matter what --
> you've said that even if it doesn't contain the term at all it may get a
> score value which needs to be factored in to the overall score.
>
>
>
> -Hoss
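
The skip-ahead behaviour described in the quoted message for "+title:cat +title:dog body:snake" can be sketched roughly like this. It is a simplified stand-in and not Lucene's actual Scorer API: SkipToSketch, Postings and search are hypothetical names, this skipTo returns the docid it lands on, and a binary search over a sorted array plays the role of the index's skip data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SkipToSketch {
    /** Hypothetical stand-in for one term's scorer: a sorted array of docids. */
    static final class Postings {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int pos = 0;
        Postings(int... docs) { this.docs = docs; }

        /** Moves to the first docid >= target and returns it,
         *  or NO_MORE_DOCS if the list is exhausted. Never moves backwards. */
        int skipTo(int target) {
            int i = Arrays.binarySearch(docs, pos, docs.length, target);
            pos = i >= 0 ? i : -(i + 1);
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Returns {docid, optionalMatched} for every doc containing both
     *  required terms, visiting docs in increasing docid order. */
    static List<int[]> search(Postings cat, Postings dog, Postings snake) {
        List<int[]> hits = new ArrayList<>();
        int doc = cat.skipTo(0);
        while (doc != Postings.NO_MORE_DOCS) {
            int other = dog.skipTo(doc);
            if (other == doc) {
                // Both required clauses match: only now look at the
                // optional clause (and only now would scoring happen).
                boolean optional = snake.skipTo(doc) == doc;
                hits.add(new int[] {doc, optional ? 1 : 0});
                doc = cat.skipTo(doc + 1);
            } else {
                // dog is already past doc (or exhausted), so nothing before
                // 'other' can match: cat jumps straight there.
                doc = cat.skipTo(other);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<int[]> hits = search(
            new Postings(2, 4, 7, 10, 15),   // title:cat
            new Postings(1, 4, 9, 10, 20),   // title:dog
            new Postings(3, 10, 11));        // body:snake
        for (int[] h : hits) {
            System.out.println("doc=" + h[0] + " matchesSnake=" + (h[1] == 1));
        }
    }
}
```

With these sample postings the loop never inspects body:snake for docs 2, 7, 9, 15 or 20, because at least one required clause has already ruled them out.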


