lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: DisjunctionMaxQuery and scoring
Date Thu, 19 Apr 2012 21:10:00 GMT
On Thu, Apr 19, 2012 at 5:05 PM, Benson Margulies <bimargulies@gmail.com> wrote:
> On Thu, Apr 19, 2012 at 4:21 PM, Robert Muir <rcmuir@gmail.com> wrote:
>> On Thu, Apr 19, 2012 at 3:49 PM, Benson Margulies <bimargulies@gmail.com> wrote:
>>> On Thu, Apr 19, 2012 at 1:34 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>>> On Thu, Apr 19, 2012 at 1:26 PM, Benson Margulies <bimargulies@gmail.com> wrote:
>>>>> I am trying to solve a problem using DisjunctionMaxQuery.
>>>>>
>>>>>
>>>>> Consider a query like:
>>>>>
>>>>> a:b OR c:d OR e:f OR ...
>>>>> name:richard OR name:dick OR name:dickie OR name:rich ...
>>>>>
>>>>> At most, one of the richard names matches. So the match score gets
>>>>> dragged down by the long list of things that don't match, as the list
>>>>> can get quite long.
>>>>>
>>>>> It seemed to me, upon reading the documentation, that I could cure
>>>>> this problem by creating a query tree that used DisjunctionMaxQuery
>>>>> around all those nicknames. However, when I built a boolean query that
>>>>> had, as a clause, a DisjunctionMaxQuery in the place of a pile of
>>>>> these individual Term queries, the score and the explanation did not
>>>>> change at all -- in particular, the coord term shows the same number
>>>>> of total terms. So it looks as if the children of the disjunction
>>>>> still count.
>>>>>
>>>>> Is there a way to control that term? Or a better way to express this?
>>>>> Thinking SQL for a moment, what I'm trying to express is
>>>>>
>>>>>   name IN (richard, dick, dickie, rich)
>>>>>
>>>>
>>>> I think you just want to disable coord() here? You can do this for
>>>> that particular boolean query by passing true to the ctor:
>>>>
>>>>  public BooleanQuery(boolean disableCoord)
>>>
>>> Rob,
>>>
>>> How do nested queries work with respect to this? If I build a boolean
>>> query one of whose clauses is a BooleanQuery with coord turned off,
>>> does just the nested query insides get left out of 'coord'?
>>>
>>> If so, then your answer certainly seems to be what the doctor ordered.
>>>
>>
>> it applies only to that query itself. So if this BQ is a clause to
>> another BQ that has coord enabled,
>> that would not change the top-level BQ's coord.
>>
>> Note: if you don't want coord at all, then you can also plug in a
>> Similarity that returns 1,
>> or pick another Similarity like BM25: in trunk only the vector space
>> impl even does anything for coord()....
>
> Robert, I'm sorry that my density is approaching lead. My problem is
> that I want coord, but I want to control which terms are counted and
> which are not. I suppose I can accomplish this with my own scorer. My
> hope was that there was a way to express "This group of terms counts
> as one for coord".

So just structure your boolean query appropriately?

BQ1(coord=true)
  BQ2(coord=false): 25 terms
  BQ3(coord=false): 87 terms

BQ1's coord is based on how many subscorers match (out of 2, BQ2 and
BQ3). If both match, it's 2/2; otherwise it's 1/2.

But in this example BQ2 and BQ3 disable coord themselves, hiding the
fact that they accept 25 and 87 terms respectively, so each appears
as a single sub for BQ1's coord().

Does this make sense? You can extend this idea to control coord however
you want by structuring the BQ appropriately, so that your BQs with
"synonyms" have coord=false.
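
A rough sketch of the arithmetic above, in plain Java with no Lucene
dependency. The class and method names (CoordSketch, coord, levelCoord)
are made up for illustration and are not Lucene APIs; the only real API
assumed is the BooleanQuery(boolean disableCoord) ctor mentioned
earlier, and coord() is assumed to be the usual vector-space
overlap / maxOverlap.

```java
final class CoordSketch {

    // Assumed vector-space coord factor: matching clauses / total clauses.
    public static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical model of a single BooleanQuery level. With coord
    // disabled, the factor is 1 no matter how many clauses the level holds.
    public static float levelCoord(boolean disableCoord, int matching, int total) {
        return disableCoord ? 1.0f : coord(matching, total);
    }

    public static void main(String[] args) {
        // Flat query: 25 + 87 term clauses in one BQ, one hit per group.
        float flat = levelCoord(false, 2, 25 + 87);

        // Nested: BQ1(coord=true) sees only two clauses, BQ2 and BQ3.
        float bothGroups = levelCoord(false, 2, 2);
        float oneGroup = levelCoord(false, 1, 2);

        // BQ2(coord=false) with 1 of its 25 terms matching adds no penalty.
        float insideBq2 = levelCoord(true, 1, 25);

        System.out.println("flat coord          = " + flat);
        System.out.println("nested, both groups = " + bothGroups);
        System.out.println("nested, one group   = " + oneGroup);
        System.out.println("inside BQ2          = " + insideBq2);
    }
}
```

In the flat form the two hits are divided by all 112 clauses, while in
the nested form each synonym group counts once at the top level, which
is the "this group of terms counts as one for coord" behaviour asked
for.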

-- 
lucidimagination.com
