lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4873) corner case in MinShouldMatchSumScorer when there are many terms
Date Sun, 24 Mar 2013 16:39:15 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612143#comment-13612143 ]

Robert Muir commented on LUCENE-4873:
-------------------------------------

Thanks Stefan:
{quote}
please remove these expensive operation calls if you think that assertions should be fast
because many users run with assertions enabled and could be irritated.
{quote}

Good call: I disabled the expensive ones, at least for now. I'll commit this, but I think as a
next step, if we can refactor, we can instead unit test the utility heap methods directly and
feel a lot better.
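A minimal sketch of the split between cheap and expensive assertions, assuming a plain 0-based binary min-heap over `int` values. The class `HeapAssertions`, the `heap.expensiveChecks` system property and the helper names are hypothetical and are not taken from the patch. The point is only that O(1) checks can stay on for users running with `-ea`, while the O(n) heap-invariant check runs only when explicitly requested.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Lucene code): cheap O(1) assertions stay enabled,
 * while the O(n) heap-invariant check only runs when explicitly requested.
 */
class HeapAssertions {
  // Hypothetical opt-in switch, read once from -Dheap.expensiveChecks=true.
  static final boolean EXPENSIVE_CHECKS = Boolean.getBoolean("heap.expensiveChecks");

  /** Returns true if heap[0..size) satisfies the min-heap property; costs O(size). */
  static boolean isMinHeap(int[] heap, int size) {
    for (int i = 1; i < size; i++) {
      if (heap[(i - 1) >>> 1] > heap[i]) { // parent must not exceed child
        return false;
      }
    }
    return true;
  }

  /** Moves heap[i] down until it is no larger than its children (0-based binary heap). */
  static void siftDown(int[] heap, int size, int i) {
    assert i >= 0 && i < size : "index out of range"; // cheap check, fine to leave on
    while (true) {
      int smallest = 2 * i + 1; // left child
      if (smallest >= size) {
        break;
      }
      if (smallest + 1 < size && heap[smallest + 1] < heap[smallest]) {
        smallest++; // right child is smaller
      }
      if (heap[i] <= heap[smallest]) {
        break;
      }
      int tmp = heap[i];
      heap[i] = heap[smallest];
      heap[smallest] = tmp;
      i = smallest;
    }
    // Expensive check: short-circuits to true unless EXPENSIVE_CHECKS is set.
    assert !EXPENSIVE_CHECKS || isMinHeap(heap, size) : "heap invariant violated";
  }

  public static void main(String[] args) {
    int[] heap = {1, 3, 2, 7, 4, 5};
    heap[0] = 6; // replace the minimum, then restore the heap property
    siftDown(heap, heap.length, 0);
    System.out.println(Arrays.toString(heap) + " valid=" + isMinHeap(heap, heap.length));
    // prints [2, 3, 5, 7, 4, 6] valid=true
  }
}
```

With this arrangement, someone debugging the heap can opt in with `-ea -Dheap.expensiveChecks=true`, while a user who merely runs with `-ea` pays only for the constant-time checks.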

{quote}
This just exemplifies that one shouldn't re-implement basic data structures in each and every
class.
Would it make sense to add heap operations to e.g. ArrayUtil and refactor the codebase? Or
is it known that this would mean prohibitive performance impact?
{quote}

Yes: I think we should do this. This was my original motivation for having a base class between
the DisjunctionSum and DisjunctionMax scorers, but this sounds like it might be a better way to
do it. We can just benchmark to confirm that it doesn't have a performance impact.
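One way shared heap operations could look, as a sketch assuming a generic array-backed min-heap ordered by a `Comparator` (sub-scorers could, for example, be ordered by their current doc ID). `HeapUtil` and its methods are hypothetical and are not part of Lucene's `ArrayUtil`. The `main` method shows the kind of direct, randomized unit test mentioned above: after `heapify`, repeated `pop` calls must return the elements in sorted order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Hypothetical shared heap helpers (not an existing Lucene API). */
final class HeapUtil {
  private HeapUtil() {}

  /** Sifts heap[i] down so that heap[0..size) is a min-heap with respect to cmp. */
  static <T> void siftDown(T[] heap, int size, int i, Comparator<? super T> cmp) {
    T node = heap[i];
    while (true) {
      int child = 2 * i + 1;
      if (child >= size) break;
      if (child + 1 < size && cmp.compare(heap[child + 1], heap[child]) < 0) child++;
      if (cmp.compare(node, heap[child]) <= 0) break;
      heap[i] = heap[child]; // move the smaller child up into the hole
      i = child;
    }
    heap[i] = node;
  }

  /** Turns heap[0..size) into a min-heap in O(size). */
  static <T> void heapify(T[] heap, int size, Comparator<? super T> cmp) {
    for (int i = (size >>> 1) - 1; i >= 0; i--) siftDown(heap, size, i, cmp);
  }

  /** Removes and returns the minimum of heap[0..size); the caller's new size is size - 1. */
  static <T> T pop(T[] heap, int size, Comparator<? super T> cmp) {
    T min = heap[0];
    heap[0] = heap[size - 1];
    heap[size - 1] = null;
    if (size > 1) siftDown(heap, size - 1, 0, cmp);
    return min;
  }

  /** Randomized check: popping everything must yield the same order as Arrays.sort. */
  public static void main(String[] args) {
    Random random = new Random(42);
    Comparator<Integer> cmp = Comparator.naturalOrder();
    for (int iter = 0; iter < 1000; iter++) {
      int size = random.nextInt(64);
      Integer[] heap = new Integer[size];
      for (int i = 0; i < size; i++) heap[i] = random.nextInt(100);
      Integer[] expected = heap.clone();
      Arrays.sort(expected);
      heapify(heap, size, cmp);
      for (int i = 0; i < expected.length; i++) {
        Integer got = pop(heap, size - i, cmp);
        if (!got.equals(expected[i])) {
          throw new AssertionError("iter " + iter + ": " + got + " != " + expected[i]);
        }
      }
    }
    System.out.println("ok");
  }
}
```

Keeping the heap code in one place means a single randomized test covers every scorer that uses it; whether the comparator indirection costs anything measurable is exactly what the proposed benchmark would need to show.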

                
> corner case in MinShouldMatchSumScorer when there are many terms
> ----------------------------------------------------------------
>
>                 Key: LUCENE-4873
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4873
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/query/scoring
>    Affects Versions: 4.3
>            Reporter: Robert Muir
>         Attachments: LUCENE-4873.patch
>
>
> I think this bug is some extreme corner case...
> This test currently only uses up to 9 terms. By increasing it to 26 and blasting the test, I was able to uncover a bug.
> Here's the seed: ant test -Dtestcase=TestMinShouldMatch2 -Dtests.method=testNextAllTerms -Dtests.seed=E0334C37E6E190D8 -Dtests.slow=true -Dtests.locale=pl_PL -Dtests.timezone=Asia/Thimphu -Dtests.file.encoding=US-ASCII
> Here's the patch to make the test use 26 terms.
> {noformat}
> Index: lucene/core/src/test/org/apache/lucene/search/TestMinShouldMatch2.java
> ===================================================================
> --- lucene/core/src/test/org/apache/lucene/search/TestMinShouldMatch2.java	(revision 1459937)
> +++ lucene/core/src/test/org/apache/lucene/search/TestMinShouldMatch2.java	(working copy)
> @@ -56,7 +56,7 @@
>    static final String alwaysTerms[] = { "a" };
>    static final String commonTerms[] = { "b", "c", "d" };
>    static final String mediumTerms[] = { "e", "f", "g" };
> -  static final String rareTerms[]   = { "h", "i", "j" };
> +  static final String rareTerms[]   = { "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z" };
>    
>    @Override
>    public void setUp() throws Exception {
> {noformat}
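As a quick arithmetic check on the patch, the four term arrays in the patched test hold 1 + 3 + 3 + 19 = 26 terms. The sketch below copies the arrays from the diff into a hypothetical `TermCount` class (written with `String[]` declarations rather than the test's C-style `String x[]`) and sums their lengths.

```java
/** Sums the term arrays from the patched TestMinShouldMatch2 (hypothetical helper class). */
class TermCount {
  // Arrays copied from the patch above.
  static final String[] alwaysTerms = { "a" };
  static final String[] commonTerms = { "b", "c", "d" };
  static final String[] mediumTerms = { "e", "f", "g" };
  static final String[] rareTerms = {
    "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"
  };

  /** Total number of terms across all four arrays. */
  static int totalTerms() {
    return alwaysTerms.length + commonTerms.length + mediumTerms.length + rareTerms.length;
  }

  public static void main(String[] args) {
    System.out.println(totalTerms()); // prints 26
  }
}
```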

--
This message is automatically generated by JIRA.


