lucene-java-user mailing list archives

From Benson Margulies <bimargul...@gmail.com>
Subject Re: DisjunctionMaxQuery and scoring
Date Thu, 19 Apr 2012 21:15:59 GMT
On Thu, Apr 19, 2012 at 5:10 PM, Robert Muir <rcmuir@gmail.com> wrote:
> On Thu, Apr 19, 2012 at 5:05 PM, Benson Margulies <bimargulies@gmail.com> wrote:
>> On Thu, Apr 19, 2012 at 4:21 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>> On Thu, Apr 19, 2012 at 3:49 PM, Benson Margulies <bimargulies@gmail.com> wrote:
>>>> On Thu, Apr 19, 2012 at 1:34 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>>>> On Thu, Apr 19, 2012 at 1:26 PM, Benson Margulies <bimargulies@gmail.com> wrote:
>>>>>> I am trying to solve a problem using DisjunctionMaxQuery.
>>>>>>
>>>>>>
>>>>>> Consider a query like:
>>>>>>
>>>>>> a:b OR c:d OR e:f OR ...
>>>>>> name:richard OR name:dick OR name:dickie OR name:rich ...
>>>>>>
>>>>>> At most, one of the richard names matches. So the match score gets
>>>>>> dragged down by the long list of things that don't match, as the list
>>>>>> can get quite long.
>>>>>>
>>>>>> It seemed to me, upon reading the documentation, that I could cure
>>>>>> this problem by creating a query tree that used DisjunctionMaxQuery
>>>>>> around all those nicknames. However, when I built a boolean query
that
>>>>>> had, as a clause, a DisjunctionMaxQuery in the place of a pile of
>>>>>> these individual Term queries, the score and the explanation did
not
>>>>>> change at all -- in particular, the coord term shows the same number
>>>>>> of total terms. So it looks as if the children of the disjunction
>>>>>> still count.
>>>>>>
>>>>>> Is there a way to control that term? Or a better way to express this?
>>>>>> Thinking SQL for a moment, what I'm trying to express is
>>>>>>
>>>>>>   name IN (richard, dick, dickie, rich)
>>>>>>
>>>>>
>>>>> I think you just want to disable coord() here? You can do this for
>>>>> that particular boolean query by passing true to the ctor:
>>>>>
>>>>>  public BooleanQuery(boolean disableCoord)
>>>>
>>>> Rob,
>>>>
>>>> How do nested queries work with respect to this? If I build a boolean
>>>> query one of whose clauses is a BooleanQuery with coord turned off,
>>>> do just the nested query's insides get left out of 'coord'?
>>>>
>>>> If so, then your answer certainly seems to be what the doctor ordered.
>>>>
>>>
>>> It applies only to that query itself. So if this BQ is a clause of
>>> another BQ that has coord enabled, that would not change the
>>> top-level BQ's coord.
>>>
>>> Note: if you don't want coord at all, then you can also plug in a
>>> Similarity whose coord() returns 1, or pick another Similarity like
>>> BM25: in trunk, only the vector space impl even does anything for
>>> coord().
>>
>> Robert, I'm sorry that my density is approaching lead. My problem is
>> that I want coord, but I want to control which terms are counted and
>> which are not. I suppose I can accomplish this with my own scorer. My
>> hope was that there was a way to express "This group of terms counts
>> as one for coord".
>
> So just structure your boolean query appropriately?
>
> BQ1(coord=true)
>  BQ2(coord=false): 25 terms
>  BQ3(coord=false): 87 terms
>
> BQ1's coord is based on how many subscorers match (out of 2, BQ2 and
> BQ3). If both match it's 2/2, otherwise 1/2.
>
> But in this example BQ2 and BQ3 disable coord themselves, hiding the
> fact they accept 25 and 87 terms respectively and appearing as a
> single sub for coord().
>
> Does this make sense? You can extend this idea to control this however
> you want by structuring the BQ appropriately so your BQs with
> "synonyms" have coord=false

Robert,

This makes perfect sense; it is what I thought you meant to begin
with. I tried it and thought that it did not work. Or, perhaps, I am
misreading the 'explain' output. Or, more likely, I goofed altogether.
I'll go back and recheck my results and post some explain output if I
can't find my mistake.

--benson
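
The coord arithmetic Robert describes can be checked with a small
standalone sketch. It does not call Lucene; it only models the default
coord factor (matching clauses divided by total clauses) for a flat
query versus the nested BQ1/BQ2/BQ3 layout. The class and method names
(CoordSketch, flatCoord, nestedCoord) are hypothetical, and the sketch
assumes a group with coord disabled counts as one matching clause of
its parent whenever at least one of its terms matches. It ignores the
term scores themselves.

```java
public class CoordSketch {

    /** Default-style coord factor: matching clauses / total clauses. */
    public static float coord(int overlap, int maxOverlap) {
        return maxOverlap == 0 ? 1.0f : (float) overlap / maxOverlap;
    }

    /** Flat query: every term is its own clause of the top-level query. */
    public static float flatCoord(int[] matchedPerGroup, int[] sizePerGroup) {
        int overlap = 0;
        int maxOverlap = 0;
        for (int i = 0; i < sizePerGroup.length; i++) {
            overlap += matchedPerGroup[i];
            maxOverlap += sizePerGroup[i];
        }
        return coord(overlap, maxOverlap);
    }

    /**
     * Nested query: each group is a single clause of the top-level query
     * (assumed to have coord disabled internally), so the top level only
     * sees "this group matched" or "this group did not match".
     */
    public static float nestedCoord(int[] matchedPerGroup) {
        int overlap = 0;
        for (int matched : matchedPerGroup) {
            if (matched > 0) {
                overlap++;
            }
        }
        return coord(overlap, matchedPerGroup.length);
    }

    public static void main(String[] args) {
        int[] sizes = {25, 87};   // BQ2 and BQ3 from the example above
        int[] matched = {1, 1};   // one term matches in each group

        // Flat: 2 matching terms out of 112 clauses.
        System.out.println("flat coord   = " + flatCoord(matched, sizes));
        // Nested: 2 matching groups out of 2 clauses.
        System.out.println("nested coord = " + nestedCoord(matched));
    }
}
```

In Lucene itself, the corresponding structure would presumably be built
with the BooleanQuery(boolean disableCoord) constructor quoted earlier
in this thread: true for the inner synonym groups, false for the outer
query.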




>
> --
> lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
