lucene-java-user mailing list archives

From Paul Elschot
Subject Re: Span queries and complex scoring
Date Tue, 11 Sep 2007 18:05:25 GMT

If your requirements allow it, try using a subclass of Spans that has
a score() method. The value it returns can be used, together with the
other span info, to provide a score value to your own SpanScorer at
the top level.
This score value can summarize the influence of the individual
span scores of the subqueries.
For this you will need to change the whole span package, but
it is somewhat simpler than using a complete Scorer for each
SpanQuery in the query tree.
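
A minimal sketch of that idea, with no dependency on Lucene. Everything
here is an assumption made for illustration: ScoredSpans is a hypothetical
stand-in for a Spans subclass with an added score() method, ArraySpans is
a toy leaf that replays fixed matches, and the formula in scoreAll (span
score divided by span length, summed per document) is only one possible
way for a top-level scorer to combine the score with the other span info.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Spans subclass; this is not Lucene's interface.
interface ScoredSpans {
    boolean next();   // advance to the next match; false when exhausted
    int doc();        // document of the current match
    int start();      // start position of the current match
    int end();        // end position (exclusive) of the current match
    float score();    // assumed addition: score of the current match
}

// Toy leaf that replays a fixed list of matches, all with the same score.
final class ArraySpans implements ScoredSpans {
    private final int[][] matches;   // each row: {doc, start, end}, sorted by doc
    private final float matchScore;
    private int pos = -1;

    ArraySpans(int[][] matches, float matchScore) {
        this.matches = matches;
        this.matchScore = matchScore;
    }

    public boolean next() { return ++pos < matches.length; }
    public int doc() { return matches[pos][0]; }
    public int start() { return matches[pos][1]; }
    public int end() { return matches[pos][2]; }
    public float score() { return matchScore; }
}

final class SpanScoreSketch {
    // Top-level scoring: each match contributes score() / span length,
    // and contributions are summed per document (an assumed formula).
    static Map<Integer, Float> scoreAll(ScoredSpans spans) {
        Map<Integer, Float> scores = new LinkedHashMap<>();
        while (spans.next()) {
            float contribution = spans.score() / (spans.end() - spans.start());
            scores.merge(spans.doc(), contribution, Float::sum);
        }
        return scores;
    }
}
```

A composite Spans (near, or, not) would implement score() by combining
the score() values of its sub-spans, so the top level only ever sees one
summarized value per match.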

With a lot of nested SpanOrQueries, merging the Spans can become
a performance bottleneck. The current situation can be improved
by creating a specialized PriorityQueue for Spans, much like the
ScorerDocQueue that is used by DisjunctionSumScorer.
With this, it is possible to avoid SpanOrQuery by using term payloads
to compute the score value for the Spans of a SpanTermQuery, 
but iirc the payloads are not yet in the trunk.
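
To illustrate the queue-based merge, here is a rough sketch that uses
java.util.PriorityQueue rather than a specialized queue, and plain arrays
rather than Lucene's Spans. The names SpanMergeSketch and Cursor are
hypothetical, and the ordering by (doc, start, end) is an assumption about
how merged matches should be delivered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class SpanMergeSketch {
    // Hypothetical cursor over one sub-query's matches.
    // Each row is {doc, start, end}; rows are sorted by (doc, start, end).
    static final class Cursor {
        final int[][] matches;
        int pos = 0;

        Cursor(int[][] matches) { this.matches = matches; }

        int doc() { return matches[pos][0]; }
        int start() { return matches[pos][1]; }
        int end() { return matches[pos][2]; }
        boolean advance() { return ++pos < matches.length; }
    }

    // K-way merge: the queue always holds each cursor's current match,
    // so the smallest match overall is at the head.
    static List<int[]> merge(List<int[][]> subSpans) {
        Comparator<Cursor> order = Comparator.comparingInt(Cursor::doc)
                .thenComparingInt(Cursor::start)
                .thenComparingInt(Cursor::end);
        PriorityQueue<Cursor> queue = new PriorityQueue<>(order);
        for (int[][] matches : subSpans) {
            if (matches.length > 0) {
                queue.add(new Cursor(matches));
            }
        }
        List<int[]> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();
            merged.add(top.matches[top.pos]);
            if (top.advance()) {
                queue.add(top);   // re-insert with its next match
            }
        }
        return merged;
    }
}
```

A specialized queue could avoid the poll/add pair by updating the head in
place and re-sifting it, which is where most of the saving over a general
purpose queue would come from.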

Paul Elschot

On Tuesday 11 September 2007 16:17, melix wrote:
> Hi,
> I'm working on an application which requires a complex scoring (based on
> semantics analysis). The scoring must be highly configurable, and I've found
> ways to do that, but I'm facing a subtle but annoying problem. All my
> queries are, basically, complex span queries. I mean for example a
> SpanNearQuery which embeds a SpanOrQuery which itself may embed another
> SpanNearQuery etc...
> I've followed the instructions at
> about changing scoring. The problem is that a document score is highly
> dependent on *what* matched, and that the getSpans() method on spanqueries
> does not provide that kind of information.
> I created my own SpanQuery subclasses which override the createWeight method
> so that the scorer used is my own too. It basically replaces the SpanScorer,
> and should recurse the spans tree to compose a score based on the type of
> subqueries (near, and, or, not) and what matched. The problem is that the
> getSpans() implementations that exist in Lucene are either anonymous classes
> which I cannot inspect, or they do not give me access to the required
> information.
> Basically, in a SpanOrQuery, I am not able to find out what matched. Have
> any of you faced that kind of problem, and found out an elegant way to do it
> without having to completely rewrite each getSpans() method for all types of
> queries (this is basically what was done in a previous version of the
> application) ?
> Thanks,
> Cedric
