lucene-java-user mailing list archives

From Benson Margulies <bimargul...@gmail.com>
Subject Re: DisjunctionMaxQuery and scoring
Date Thu, 19 Apr 2012 21:05:44 GMT
On Thu, Apr 19, 2012 at 4:21 PM, Robert Muir <rcmuir@gmail.com> wrote:
> On Thu, Apr 19, 2012 at 3:49 PM, Benson Margulies <bimargulies@gmail.com> wrote:
>> On Thu, Apr 19, 2012 at 1:34 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>> On Thu, Apr 19, 2012 at 1:26 PM, Benson Margulies <bimargulies@gmail.com> wrote:
>>>> I am trying to solve a problem using DisjunctionMaxQuery.
>>>>
>>>>
>>>> Consider a query like:
>>>>
>>>> a:b OR c:d OR e:f OR ...
>>>> name:richard OR name:dick OR name:dickie OR name:rich ...
>>>>
>>>> At most, one of the richard names matches. So the match score gets
>>>> dragged down by the long list of things that don't match, as the list
>>>> can get quite long.
>>>>
>>>> It seemed to me, upon reading the documentation, that I could cure
>>>> this problem by creating a query tree that used DisjunctionMaxQuery
>>>> around all those nicknames. However, when I built a boolean query that
>>>> had, as a clause, a DisjunctionMaxQuery in the place of a pile of
>>>> these individual Term queries, the score and the explanation did not
>>>> change at all -- in particular, the coord term shows the same number
>>>> of total terms. So it looks as if the children of the disjunction
>>>> still count.
>>>>
>>>> Is there a way to control that term? Or a better way to express this?
>>>> Thinking SQL for a moment, what I'm trying to express is
>>>>
>>>>   name IN (richard, dick, dickie, rich)
>>>>
>>>
>>> I think you just want to disable coord() here? You can do this for
>>> that particular boolean query by passing true to the ctor:
>>>
>>>  public BooleanQuery(boolean disableCoord)
>>
>> Rob,
>>
>> How do nested queries work with respect to this? If I build a boolean
>> query one of whose clauses is a BooleanQuery with coord turned off,
>> are only the nested query's own clauses left out of 'coord'?
>>
>> If so, then your answer certainly seems to be what the doctor ordered.
>>
>
> It applies only to that query itself. So if this BQ is a clause of
> another BQ that has coord enabled, that would not change the
> top-level BQ's coord.
>
> Note: if you don't want coord at all, then you can also plug in a
> Similarity whose coord() returns 1, or pick another Similarity like
> BM25: in trunk, only the vector space impl even does anything for
> coord().

Robert, I'm sorry that my density is approaching lead. My problem is
that I want coord, but I want to control which terms are counted and
which are not. I suppose I can accomplish this with my own scorer. My
hope was that there was a way to express "This group of terms counts
as one for coord".

In other words, for a subset of fields in the query, I want to scale
the entire score by the fraction of them that match.

Another way to think about this, which might be no use at all, is to
wonder: is there a way to charge a score penalty for failure to match
a particular query term? That would, from another direction, address
the underlying effect I'm trying to get.
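
To make the arithmetic concrete, here is a minimal sketch of the two ways of counting, assuming the classic coord factor has the form matched clauses / total clauses. This is not Lucene code and does not use any Lucene API; the class name CoordSketch and its methods are hypothetical, introduced only for illustration. Each inner array stands for one group of alternatives (for example, all the nicknames of one name).

```java
// Illustrative model only: NOT Lucene code, just the coord arithmetic.
// CoordSketch, flatCoord and groupedCoord are hypothetical names.
final class CoordSketch {

    /** Assumed classic coord factor: matched clauses / total clauses. */
    static double coord(int matched, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return (double) matched / total;
    }

    /** Flat counting: every clause in every group is counted individually. */
    static double flatCoord(boolean[][] groups) {
        int matched = 0;
        int total = 0;
        for (boolean[] group : groups) {
            for (boolean hit : group) {
                total++;
                if (hit) {
                    matched++;
                }
            }
        }
        return coord(matched, total);
    }

    /** Grouped counting: each group counts once, and matches if any member matches. */
    static double groupedCoord(boolean[][] groups) {
        int matched = 0;
        for (boolean[] group : groups) {
            boolean any = false;
            for (boolean hit : group) {
                any |= hit;
            }
            if (any) {
                matched++;
            }
        }
        return coord(matched, groups.length);
    }

    public static void main(String[] args) {
        boolean[][] groups = {
            {true},                       // a:b matches
            {true},                       // c:d matches
            {true, false, false, false},  // name:richard matches; dick, dickie, rich do not
        };
        System.out.println(flatCoord(groups));    // 3 of 6 clauses -> 0.5
        System.out.println(groupedCoord(groups)); // 3 of 3 groups  -> 1.0
    }
}
```

With flat counting, the three unmatched nicknames halve the factor even though every field matched; with grouped counting, the nickname list "counts as one" and the factor stays at 1.0. Whether and how this can be expressed inside a real scorer is exactly the open question above.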



>
>
> --
> lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


