lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: UnscoredRangeQuery
Date Fri, 15 Apr 2005 07:06:01 GMT
On Friday 15 April 2005 06:11, Chris Lamprecht wrote:
> Hi Yonik, 
> 
> I'm interested, but I didn't see any files attached.  Not sure if it's
> at my end or the mailing list.  Is there another way I can get these
> files?  Thanks
> 
> -Chris
> 
> On 4/14/05, Yonik Seeley <yseeley@gmail.com> wrote:
> > OK, so I implemented an UnscoredRangeQuery we needed for use with
> > Lucene 1.4.3.  Seems to work fine for me, so I thought I would put it
> > out here to see what you guys think... (files attached)
> > 
> > Would a cleaned up version be useful for some version of Lucene, or
> > will all the current work that Paul is doing in the queries & scorers
> > make this method obsolete?

A cleaned-up version could well be useful, and it would not make the new,
ordered scorers obsolete.
For maximum performance, top-level disjunctions are probably best evaluated
unordered, as in the 1.4 BooleanScorer.
From what I understand so far, this UnscoredRangeQuery has the same
properties: it is a disjunction, and it also works unordered.
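A minimal sketch of what "unordered" means here, assuming each clause's postings are available as a plain int[] of doc ids (a hypothetical stand-in, not Lucene's TermDocs API): each clause is read to the end before the next one starts, and matches are just marked in a BitSet, so no per-document merge across clauses is needed. This only fits a constant-score disjunction; it is not the 1.4 BooleanScorer itself, which also accumulates per-document scores.

```java
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical sketch of an unordered disjunction (not Lucene code).
 * Clauses are processed one after another; documents are not visited
 * in increasing order across clauses, and nothing is merged per document.
 */
public class UnorderedDisjunction {

    /** Each int[] holds one clause's matching doc ids. */
    public static BitSet union(List<int[]> postings, int maxDoc) {
        BitSet hits = new BitSet(maxDoc);
        for (int[] docs : postings) {
            for (int doc : docs) {
                hits.set(doc); // a doc matched by several clauses just sets the same bit again
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<int[]> postings = List.of(new int[] {1, 5, 9}, new int[] {2, 5, 7});
        System.out.println(union(postings, 10)); // prints {1, 2, 5, 7, 9}
    }
}
```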

A disjunction that is (severely) limited by a filter, or that is a subquery of
a conjunction, is probably best evaluated in an ordered way, and for these
cases the DisjunctionSumScorer in the development version is a good fit.
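For contrast, an ordered disjunction can be sketched with a min-heap keyed on each clause's current doc id. This is a hypothetical illustration of the general idea, not the DisjunctionSumScorer code; the Clause, Cursor and Hit types are invented for the example. Documents come out in increasing order with the scores of all matching clauses summed, which is what lets a filter or an enclosing conjunction line them up against its other clauses; the price is heap maintenance for every matching document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch of an ordered, summing disjunction (not Lucene code).
 * Each clause lists doc ids in increasing order with a score per doc.
 */
public class OrderedDisjunction {

    /** One clause: parallel arrays of increasing doc ids and their scores. */
    record Clause(int[] docs, float[] scores) {}

    /** A matching document and the sum of its clause scores. */
    record Hit(int doc, float score) {}

    /** Position within one clause's postings. */
    static final class Cursor {
        final Clause clause;
        int pos = 0;
        Cursor(Clause clause) { this.clause = clause; }
        int doc() { return clause.docs()[pos]; }
        float score() { return clause.scores()[pos]; }
        boolean exhausted() { return pos >= clause.docs().length; }
    }

    public static List<Hit> merge(List<Clause> clauses) {
        // Min-heap ordered by each cursor's current doc id.
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (Clause c : clauses) {
            Cursor cur = new Cursor(c);
            if (!cur.exhausted()) heap.add(cur);
        }
        List<Hit> hits = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().doc();
            float sum = 0f;
            // Pop every clause positioned on this doc so its score is complete.
            while (!heap.isEmpty() && heap.peek().doc() == doc) {
                Cursor cur = heap.poll();
                sum += cur.score();
                cur.pos++; // advance only after removal, so the heap order stays valid
                if (!cur.exhausted()) heap.add(cur);
            }
            hits.add(new Hit(doc, sum));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Clause> clauses = List.of(
            new Clause(new int[] {1, 5, 9}, new float[] {0.5f, 1.0f, 0.25f}),
            new Clause(new int[] {2, 5}, new float[] {0.75f, 0.5f}));
        // prints "1 0.5", "2 0.75", "5 1.5", "9 0.25", one per line
        for (Hit h : merge(clauses)) {
            System.out.println(h.doc() + " " + h.score());
        }
    }
}
```

Document 5 matches both clauses, so it is emitted once with the summed score 1.5.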

> > 
> > Scoring seems an order of magnitude more complex than analysis and
> > finding terms & docs.  I'd appreciate guidance or suggestions related
> > to scoring and what numbers I'm currently returning (as long as it
> > doesn't slow things down much).

Scoring is not simple, but it normally does not slow down query searching.

For disjunctions that combine the scores of different subqueries, it is
necessary to somehow make sure that a document is scored for all subqueries,
and this requires ordering the documents, which causes some performance loss.

> > 
> > Features:
> >  - can handle any number of terms... doesn't expand to a boolean query
> >  - can be used anywhere in a normal query hierarchy (unlike RangeFilter)
> >  - can be open ended on both ends
> >  - endpoints can be independently inclusive or exclusive
> >  - produces a constant score for each hit (could be a mis-feature also...)
> > 
> > MisFeatures:
> >  - no skipTo()... it currently impersonates a BooleanQuery because of
> >   http://issues.apache.org/bugzilla/show_bug.cgi?id=34407
> >  - no per-doc scoring (a small constant is returned).  We don't have
> > any range queries where scoring makes sense, and it's faster without
> > it.
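The range features listed above can be sketched against an in-memory sorted term dictionary. This is a hypothetical stand-in (a TreeMap from term text to doc ids), not Lucene's TermEnum/TermDocs API: a null bound leaves that end open, each bound has its own inclusive flag, and all matching terms are unioned into one BitSet instead of being expanded into one BooleanQuery clause per term. Every document in the result would then get the same constant score.

```java
import java.util.BitSet;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical constant-score range match over a sorted term dictionary.
 * The NavigableMap (term text -> doc ids) stands in for a real index.
 */
public class ConstantScoreRange {

    /** A null bound means "open on that end"; each bound has its own inclusive flag. */
    public static BitSet matches(NavigableMap<String, int[]> terms, int maxDoc,
                                 String lower, boolean includeLower,
                                 String upper, boolean includeUpper) {
        NavigableMap<String, int[]> range;
        if (lower != null && upper != null) {
            if (lower.compareTo(upper) > 0) {
                return new BitSet(maxDoc); // inverted bounds: empty range
            }
            range = terms.subMap(lower, includeLower, upper, includeUpper);
        } else if (lower != null) {
            range = terms.tailMap(lower, includeLower);
        } else if (upper != null) {
            range = terms.headMap(upper, includeUpper);
        } else {
            range = terms; // open on both ends
        }
        // Union the postings of every term in range; any number of terms is fine.
        BitSet docs = new BitSet(maxDoc);
        for (int[] postings : range.values()) {
            for (int doc : postings) {
                docs.set(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        NavigableMap<String, int[]> terms = new TreeMap<>();
        terms.put("2005-01", new int[] {0, 3});
        terms.put("2005-02", new int[] {1});
        terms.put("2005-03", new int[] {2, 3});
        terms.put("2005-04", new int[] {4});
        // Lower bound inclusive, upper bound exclusive: prints {1, 2, 3}
        System.out.println(matches(terms, 5, "2005-02", true, "2005-04", false));
        // Open below, upper bound inclusive: prints {0, 1, 3}
        System.out.println(matches(terms, 5, null, false, "2005-02", true));
    }
}
```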

For disjunction performance, working unordered is good, so I think we need
a way of structuring scorers that allows unordered disjunctions at the top
level.

Regards,
Paul Elschot



