lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 31785] - DisjunctionScorer
Date Wed, 19 Jan 2005 21:34:22 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=31785>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=31785





------- Additional Comments From paul.elschot@xs4all.nl  2005-01-19 22:34 -------
(In reply to comment #10) 
> Hi Paul, 
>  
> I finally found time to look into your code in detail and I think 
> it's really excellent work. Before committing it, I have a few questions. 
>  
> *) In your source files you have included a copyright statement referring 
> to yourself. Of course you include the Apache License. However, I haven't seen
> other source files in Lucene with similar copyright statements. I don't know the
> legal consequences of that. Maybe someone else on the list knows more. The 
> simplest solution would be to substitute "Copyright 2004 Paul Elschot" with 
> "Copyright 2004 The Apache Software Foundation". Would you agree? 
 
The intention is to allow the Apache Software Foundation to take over the 
copyright in case they want to.  
As I understand the Apache License, taking over the copyright is 
allowed by the license. So I used my own copyright, which could be changed 
when the code is taken over into an Apache project. 
However, the relevant documentation 
http://apache.org/dev/apply-license.html 
says that contributed files should have the copyright 
assigned to the Apache Software Foundation. 
I'll try to do that next time. 
Could you change the copyright notices accordingly this time? 
 
> *) BooleanScorer2 extends NrMatchersScorer and nrMatchers() always returns 1.  
> Is there a reason for that? I think it should either only extend Scorer or 
> deliver the correct values. I opt for extending Scorer only. 
 
The reason is that a BooleanQuery can be scored by a few cooperating 
(nested) scorers, and that it should still be possible to compute the 
coordination factor from the number of matching scorers of the originally 
added clauses. 
 
By default nrMatchers() returns 1. This covers the case where the scorer is 
given to BooleanScorer2 as the scorer of an added clause. 
(At the moment these are wrapped in a NrMatchersScorer. (*)) 
The cooperating scorers that implement the boolean behaviour 
add up these numbers over their subscorers, so that the result is the same 
as when scoring a single BooleanQuery. 
The idea is that a scorer either sums nrMatchers() of its subscorers, or uses 
nrMatchers() of its subscorers for the coordination factor in its score and 
returns 1 for its own nrMatchers(). 
It might be worthwhile to add something like this to the javadocs. 
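
To illustrate the summing scheme described above, here is a minimal sketch in Java. All names in it (CountingScorer, ClauseScorer, SummingScorer, TopLevelScorer) are hypothetical and are not the classes in the patch. The coordination factor is assumed to be a plain matched/total ratio standing in for Similarity.coord(), and every subscorer is assumed to match the current document.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface for this sketch, not part of Lucene or of the patch.
interface CountingScorer {
    float score();      // score for the current document
    int nrMatchers();   // number of matching clauses this scorer accounts for
}

// Scorer of a single added clause: it counts as exactly one matcher.
final class ClauseScorer implements CountingScorer {
    private final float score;
    ClauseScorer(float score) { this.score = score; }
    public float score() { return score; }
    public int nrMatchers() { return 1; }
}

// Cooperating scorer: sums the scores and the matcher counts of its subscorers.
// Assumption: all subscorers match the current document.
final class SummingScorer implements CountingScorer {
    private final List<CountingScorer> subs;
    SummingScorer(List<CountingScorer> subs) { this.subs = subs; }
    public float score() {
        float sum = 0f;
        for (CountingScorer s : subs) sum += s.score();
        return sum;
    }
    public int nrMatchers() {
        int n = 0;
        for (CountingScorer s : subs) n += s.nrMatchers();
        return n;
    }
}

// Top level scorer: uses the summed count for the coordination factor
// and returns 1 for its own nrMatchers().
final class TopLevelScorer implements CountingScorer {
    private final CountingScorer inner;
    private final int maxCoord; // number of clauses originally added
    TopLevelScorer(CountingScorer inner, int maxCoord) {
        this.inner = inner;
        this.maxCoord = maxCoord;
    }
    public float score() {
        // Assumed coordination factor: matched clauses / added clauses.
        float coord = (float) inner.nrMatchers() / maxCoord;
        return inner.score() * coord;
    }
    public int nrMatchers() { return 1; }
}

final class NrMatchersDemo {
    public static void main(String[] args) {
        CountingScorer a = new ClauseScorer(0.5f);
        CountingScorer b = new ClauseScorer(1.5f);
        CountingScorer nested = new SummingScorer(Arrays.asList(b));
        CountingScorer all = new SummingScorer(Arrays.asList(a, nested));
        // Three clauses were added, two of them match.
        CountingScorer top = new TopLevelScorer(all, 3);
        System.out.println(top.score() + " " + top.nrMatchers());
    }
}
```

The nesting depth does not matter here: the count reaching the top level is the number of matching added clauses, which is what a single BooleanQuery would use.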
  
> *) All NrMatchersScorers except for BooleanScorer2 and ConjunctionScorer don't
> use a similarity implementation. They compute raw scores and nrMatches. 
> ConjunctionScorer is a hybrid. It uses coord-factors and is used as 
> NrMatchersScorer. This could lead to incorrect results with Similarity 
> implementations other than DefaultSimilarity. A ConjunctionScorer used as 
> NrMatchersScorer should compute raw scores, if used as standard Scorer it  
> should use coord-factors. How can we achieve this in an elegant way? 
>  
> Christoph 
 
You're right that ConjunctionScorer has a double role here: 
it can be used as a full replacement for BooleanScorer when all clauses  
are required, and it can also be used to score only the required 
clauses combined with ReqOptScorer or ReqExclScorer for the other  
clauses. 
 
The implementation could only fail when a ConjunctionScorer 
provides an nrMatchers() greater than 1 and also multiplies the coordination 
factor into its score. The implementation prevents 
this by using a top level scorer that always returns 1 for nrMatchers() 
and uses nrMatchers() of its subscorers for the coordination factor. 
 
This is somewhat tricky, so I hope I got all the details right. 
 
It also means that the changed ConjunctionScorer should not multiply 
a coordination factor into its score() value. I don't remember 
whether or not it does that, but it shouldn't. 
 
One way to solve this would be to use another name for the changed 
ConjunctionScorer, or to explicitly document that it should be 
wrapped in a scorer that returns 1 for nrMatchers() when implementing 
a full BooleanQuery. 
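
The two roles of the conjunction can be sketched as follows. This is only an illustration under stated assumptions: MatchScorer, Leaf, RawConjunction, ReqOptSketch and CoordWrapper are hypothetical names, the coordination factor is assumed to be a plain matched/total ratio, and all subscorers are assumed to match the current document. The conjunction itself never applies a coordination factor; only the wrapper does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface for this sketch, not the Lucene API.
interface MatchScorer {
    float score();
    int nrMatchers();
}

final class Leaf implements MatchScorer {
    private final float score;
    Leaf(float score) { this.score = score; }
    public float score() { return score; }
    public int nrMatchers() { return 1; }
}

// Conjunction of required clauses: raw sum only, no coordination factor.
final class RawConjunction implements MatchScorer {
    private final List<MatchScorer> required;
    RawConjunction(List<MatchScorer> required) { this.required = required; }
    public float score() {
        float sum = 0f;
        for (MatchScorer s : required) sum += s.score();
        return sum;
    }
    public int nrMatchers() {
        int n = 0;
        for (MatchScorer s : required) n += s.nrMatchers();
        return n;
    }
}

// Required part combined with an optional part (assumed to match here).
final class ReqOptSketch implements MatchScorer {
    private final MatchScorer req;
    private final MatchScorer opt;
    ReqOptSketch(MatchScorer req, MatchScorer opt) { this.req = req; this.opt = opt; }
    public float score() { return req.score() + opt.score(); }
    public int nrMatchers() { return req.nrMatchers() + opt.nrMatchers(); }
}

// Wrapper that applies the coordination factor once and returns 1.
final class CoordWrapper implements MatchScorer {
    private final MatchScorer inner;
    private final int maxCoord; // number of clauses originally added
    CoordWrapper(MatchScorer inner, int maxCoord) { this.inner = inner; this.maxCoord = maxCoord; }
    public float score() { return inner.score() * inner.nrMatchers() / maxCoord; }
    public int nrMatchers() { return 1; }
}

final class ConjunctionRolesDemo {
    public static void main(String[] args) {
        MatchScorer r1 = new Leaf(1.0f);
        MatchScorer r2 = new Leaf(2.0f);
        MatchScorer conj = new RawConjunction(Arrays.asList(r1, r2));
        // Role 1: all clauses required, conjunction wrapped directly.
        MatchScorer full = new CoordWrapper(conj, 2);
        // Role 2: required part plus one matching optional clause, 4 clauses added.
        MatchScorer mixed = new CoordWrapper(new ReqOptSketch(conj, new Leaf(0.5f)), 4);
        System.out.println(full.score() + " " + mixed.score());
    }
}
```

Because the coordination factor is applied in exactly one place, the same conjunction class can serve in both roles without the factor being multiplied in twice.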
 
Regards, 
Paul Elschot. 
 
(*) In case nrMatchers() is added to Scorer, this wrapping would not 
be necessary, and it should be documented that it is expected that 
the scorers for the clauses implement their own coordination factor 
into their score and return 1 for nrMatchers(). 
There may be a better way to implement this 'decoupling' 
of the coordination factor from the cooperating scorers entirely within 
BooleanScorer2, for example by maintaining the 
number of matching subscorers in the top level scorer, invisible 
from the outside, and having all the cooperating scorers maintain 
this attribute of the top level scorer instead of their own. 
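
A rough sketch of that alternative, with the count kept as a private attribute of the top level scorer. HiddenCountScorer and its Sub interface are hypothetical names, the coordination factor is again assumed to be a plain matched/total ratio, and all subscorers are assumed to match the current document.

```java
// Hypothetical top level scorer that owns the match count itself.
final class HiddenCountScorer {
    // Plain scorer interface for the cooperating scorers: no nrMatchers().
    interface Sub {
        float score();
    }

    private final int maxCoord;  // number of clauses originally added
    private int matchCount;      // invisible from the outside
    private Sub root;

    HiddenCountScorer(int maxCoord) { this.maxCoord = maxCoord; }

    // Scorer for an added clause: scoring it records one match at the top level.
    Sub clause(float clauseScore) {
        return () -> {
            matchCount++;
            return clauseScore;
        };
    }

    // Cooperating scorer that sums its subscorers and keeps no count of its own.
    Sub sum(Sub... subs) {
        return () -> {
            float total = 0f;
            for (Sub s : subs) total += s.score();
            return total;
        };
    }

    void setRoot(Sub root) { this.root = root; }

    // Resets the count, scores the tree, then applies the assumed coord ratio.
    float score() {
        matchCount = 0;
        float raw = root.score();
        return raw * matchCount / maxCoord;
    }
}

final class HiddenCountDemo {
    public static void main(String[] args) {
        HiddenCountScorer top = new HiddenCountScorer(3);
        top.setRoot(top.sum(top.clause(0.5f), top.sum(top.clause(1.5f))));
        System.out.println(top.score());
    }
}
```

With this shape the cooperating scorers need no nrMatchers() method at all, at the cost of tying them to one particular top level scorer.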
 
 

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

