lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
Date Wed, 04 May 2011 19:00:03 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028895#comment-13028895 ]

Doron Cohen commented on LUCENE-3068:
-------------------------------------

bq. specifically when the doc itself has tokens at the same position.

I am not convinced yet that there is a bug here - I think the code does allow this? 

There is another assumption in the code, that any two different PPs are in different TPs -
which underlies the assumption that originally each PP differs in position. This seems a
valid assumption, because the QP will create an MPQ if there are two terms in the (phrase) query
with the same position. 

bq. maybe any time a *PhraseQuery has overlapping positions, we should rewrite to a MultiPhraseQuery
and let it handle the same positions...? Is there any downside to that?

I think this is the correct behavior - in particular this will be the query that a QP will
create. The only way to create a PQ (not an MPQ) for PPs in the same positions is to create it manually.
But why would anyone do that? And if they did, wouldn't such a rewrite be a surprise to them?
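
To make the "overlapping positions" condition concrete, here is a minimal sketch in plain Java.
It is a toy model and not Lucene code: the class OverlapSketch and its methods are hypothetical
names, and terms are plain strings. It only groups query terms by position and reports whether
any position holds more than one term, which is the case where a rewrite to an MPQ would apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: hypothetical names, no Lucene classes involved.
class OverlapSketch {

    /** Groups terms by their query position, with positions in ascending order. */
    static Map<Integer, List<String>> groupByPosition(String[] terms, int[] positions) {
        if (terms.length != positions.length) {
            throw new IllegalArgumentException("terms and positions must have equal length");
        }
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (int i = 0; i < terms.length; i++) {
            groups.computeIfAbsent(positions[i], k -> new ArrayList<>()).add(terms[i]);
        }
        return groups;
    }

    /** Returns true if at least two terms share a position. */
    static boolean hasOverlappingPositions(String[] terms, int[] positions) {
        for (List<String> group : groupByPosition(terms, positions).values()) {
            if (group.size() > 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] terms = {"a", "b", "c"};
        // "a" and "b" are both placed at position 0, as in a manually built PQ.
        int[] positions = {0, 0, 1};
        System.out.println(groupByPosition(terms, positions));
        System.out.println(hasOverlappingPositions(terms, positions));
    }
}
```

The grouped map has the same shape an MPQ takes, one list of terms per position, so this check
is all the sketch needs to decide whether a query falls into the manually created case above.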

A patch will follow with a revised version of this test - one that uses the QP. In this patch
the QP indeed creates an MPQ, and I have not yet been able to make it fail. Still trying.

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
> ------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3068
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3068
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 3.0.3, 3.1, 4.0
>            Reporter: Michael McCandless
>            Assignee: Doron Cohen
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
