lucene-java-user mailing list archives

From melix <>
Subject Re: Span queries and complex scoring
Date Mon, 17 Sep 2007 10:51:36 GMT

Thanks Paul. I'm doing something very similar, but I'd like to point out that
it is very hard to extend Lucene without breaking compatibility. I have
written classes that extend the SpanQuery classes, but, for reasons I don't
understand, classes like "BooleanWeight" are package-private, which prevents
instantiating them from outside the package. This often leads to copying
code, which is not recommended for maintainability...

I've faced the very same problem with NearSpansOrdered and so on. I think
that, unless there is a very good reason to keep them hidden, such classes
should be made public; this would at least make the "delegate" design
pattern available.
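
To illustrate the delegate pattern mentioned above, here is a minimal sketch. It does not use Lucene itself: the `Spans` interface below is a simplified, hypothetical stand-in for Lucene's, and `ScoredSpans`, `ArraySpans`, `WeightedSpans` and the scoring rule are names and choices made up for the example. The point is only that a wrapper can forward position information and add a score, provided the wrapped type is accessible.

```java
// Simplified, hypothetical stand-in for Lucene's Spans interface.
interface Spans {
    boolean next();  // advance to the next match; false when exhausted
    int doc();       // document number of the current match
    int start();     // start position of the current match
    int end();       // end position (exclusive) of the current match
}

// Hypothetical extension: a Spans that also reports a score for the current match.
interface ScoredSpans extends Spans {
    float score();
}

// In-memory Spans over rows of {doc, start, end}; exists only to drive the example.
final class ArraySpans implements Spans {
    private final int[][] matches;
    private int index = -1;

    ArraySpans(int[][] matches) { this.matches = matches; }

    public boolean next() { return ++index < matches.length; }
    public int doc()   { return matches[index][0]; }
    public int start() { return matches[index][1]; }
    public int end()   { return matches[index][2]; }
}

// Delegate: forwards all position information to the wrapped Spans and adds a score.
final class WeightedSpans implements ScoredSpans {
    private final Spans delegate;
    private final float weight;

    WeightedSpans(Spans delegate, float weight) {
        this.delegate = delegate;
        this.weight = weight;
    }

    public boolean next() { return delegate.next(); }
    public int doc()   { return delegate.doc(); }
    public int start() { return delegate.start(); }
    public int end()   { return delegate.end(); }

    // Illustrative scoring rule only (an assumption): shorter matches score higher.
    public float score() {
        return weight / Math.max(1, end() - start());
    }
}

final class DelegateDemo {
    public static void main(String[] args) {
        ScoredSpans spans = new WeightedSpans(
            new ArraySpans(new int[][] {{1, 0, 2}, {3, 4, 8}}), 2.0f);
        while (spans.next()) {
            System.out.println("doc=" + spans.doc() + " score=" + spans.score());
        }
    }
}
```

With package-private classes this kind of wrapper cannot even be declared outside the package, which is why the code ends up being copied instead.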



Paul Elschot wrote:
> Cedric,
> If your requirements allow it, try using a subclass of Spans
> that has a score() method that returns a value that is used together
> with the other span info to provide a score value to your own
> SpanScorer at the top level.
> This score value can summarize the influence of the individual
> span scores of the subqueries.
> For this you will need to change the whole span package, but
> it is somewhat simpler than using a complete Scorer for each
> SpanQuery in the query tree.
> With a lot of nested SpanOrQueries, merging the Spans can become
> a performance bottleneck. The current situation can be improved
> by creating a specialized PriorityQueue for Spans, much like the
> ScorerDocQueue that is used by DisjunctionSumScorer.
> With this, it is possible to avoid SpanOrQuery by using term payloads
> to compute the score value for the Spans of a SpanTermQuery, 
> but iirc the payloads are not yet in the trunk.
> Regards,
> Paul Elschot
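
The priority-queue merge described above could look roughly like the following sketch. It is not Lucene code: `SubSpans` is a hypothetical in-memory cursor standing in for the Spans of one subquery, `SpanMerger` is an invented name, and `java.util.PriorityQueue` is used in place of a specialized queue (so the top element is polled and re-added rather than adjusted in place). Each cursor carries a label, so the merged stream also records which subquery produced each match.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical cursor over the matches of one subquery; rows are {doc, start, end},
// assumed to be sorted by doc, then start, then end.
final class SubSpans {
    final String label;            // identifies the subquery that produced the matches
    private final int[][] matches;
    private int index = -1;

    SubSpans(String label, int[][] matches) {
        this.label = label;
        this.matches = matches;
    }

    boolean next() { return ++index < matches.length; }
    int doc()   { return matches[index][0]; }
    int start() { return matches[index][1]; }
    int end()   { return matches[index][2]; }
}

final class SpanMerger {
    // Order cursors by their current match: document, then start, then end.
    private static final Comparator<SubSpans> ORDER =
        Comparator.comparingInt(SubSpans::doc)
                  .thenComparingInt(SubSpans::start)
                  .thenComparingInt(SubSpans::end);

    // Merges all cursors into one ordered list of "label@doc:start-end" entries.
    static List<String> merge(List<SubSpans> subs) {
        PriorityQueue<SubSpans> queue = new PriorityQueue<>(ORDER);
        for (SubSpans s : subs) {
            if (s.next()) {
                queue.add(s);      // only cursors with at least one match enter the queue
            }
        }
        List<String> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            SubSpans top = queue.poll();
            merged.add(top.label + "@" + top.doc() + ":" + top.start() + "-" + top.end());
            if (top.next()) {
                queue.add(top);    // re-insert the cursor at its new position
            }
        }
        return merged;
    }
}

final class MergeDemo {
    public static void main(String[] args) {
        List<SubSpans> subs = new ArrayList<>();
        subs.add(new SubSpans("a", new int[][] {{1, 0, 1}, {2, 5, 6}}));
        subs.add(new SubSpans("b", new int[][] {{1, 3, 4}, {3, 0, 1}}));
        for (String match : SpanMerger.merge(subs)) {
            System.out.println(match);
        }
    }
}
```

A specialized queue that adjusts its top element in place would avoid the poll/re-add cost on every step, which is where the performance gain would come from.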
> On Tuesday 11 September 2007 16:17, melix wrote:
>> Hi,
>> I'm working on an application which requires complex scoring (based on
>> semantic analysis). The scoring must be highly configurable, and I've found
>> ways to do that, but I'm facing a subtle but annoying problem. All my
>> queries are, basically, complex span queries: for example, a
>> SpanNearQuery which embeds a SpanOrQuery which itself may embed another
>> SpanNearQuery, etc.
>> I've followed the instructions at
>> about changing scoring. The problem is that a document's score is highly
>> dependent on *what* matched, and that the getSpans() method on span queries
>> does not provide that kind of information.
>> I created my own SpanQuery subclasses which override the createWeight
>> method so that the scorer used is my own too. It basically replaces the
>> SpanScorer, and should recurse the spans tree to compose a score based on
>> the type of subqueries (near, and, or, not) and what matched. The problem is
>> that the getSpans() methods that exist in Lucene either return anonymous
>> classes which I cannot inspect, or do not give me access to the required
>> information. Basically, in a SpanOrQuery, I am not able to find out what
>> matched. Have any of you faced that kind of problem, and found an elegant
>> way to solve it without having to completely rewrite each getSpans() method
>> for all types of queries (this is basically what was done in a previous
>> version of the application)?
>> Thanks,
>> Cedric
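
The kind of recursive score composition described in this message can be sketched as a small tree of match nodes. Everything here is hypothetical: `MatchNode`, `TermMatch`, `OrMatch` and `NearMatch` are invented for illustration and are not Lucene classes, and the combination rules (maximum for "or", sum divided by one plus the slack for "near") are arbitrary assumptions. The sketch presumes that each node already knows which of its children matched, which is exactly the information the existing getSpans() implementations do not expose.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical node in a tree describing what matched for one span.
interface MatchNode {
    float score();
}

// Leaf: a single term that matched, with a configurable weight.
final class TermMatch implements MatchNode {
    private final String term;
    private final float weight;

    TermMatch(String term, float weight) {
        this.term = term;
        this.weight = weight;
    }

    public float score() { return weight; }

    @Override
    public String toString() { return term; }
}

// "Or": holds only the alternatives that actually matched; assumed rule: best one wins.
final class OrMatch implements MatchNode {
    private final List<MatchNode> matched;

    OrMatch(MatchNode... matched) { this.matched = Arrays.asList(matched); }

    public float score() {
        float best = 0f;
        for (MatchNode node : matched) {
            best = Math.max(best, node.score());
        }
        return best;
    }
}

// "Near": all clauses matched; assumed rule: sum of clause scores, damped by the slack
// (number of extra positions between the clauses).
final class NearMatch implements MatchNode {
    private final List<MatchNode> clauses;
    private final int slack;

    NearMatch(int slack, MatchNode... clauses) {
        this.slack = slack;
        this.clauses = Arrays.asList(clauses);
    }

    public float score() {
        float sum = 0f;
        for (MatchNode node : clauses) {
            sum += node.score();   // recursion into nested near/or nodes happens here
        }
        return sum / (1 + slack);
    }
}

final class ScoreDemo {
    public static void main(String[] args) {
        MatchNode match = new NearMatch(1,
            new TermMatch("a", 1.0f),
            new OrMatch(new TermMatch("b", 0.5f), new TermMatch("c", 2.0f)));
        System.out.println("score=" + match.score());
    }
}
```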
>> -- 
>> View this message in context: 
>> Sent from the Lucene - Java Users mailing list archive at
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
