lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3054) add assert to sorts catch broken comparators in tests
Date Mon, 02 May 2011 13:04:03 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027639#comment-13027639
] 

Uwe Schindler commented on LUCENE-3054:
---------------------------------------

I investigated what happens here:

The problem is indeed quickSort, but not undernormal circumstances. The problem with quickSort
(just google for stack overflow and quicksort) is that it only works fine for arrays with
many values. Once you only have few distinct values and a large array, depending on the oreder
it may happen that it splits into two subarrays for next iteration, where one is very large
and the other only contains few items.

Attached is a patch, that shows the problem. It almost every time stack overflows. Also quicksort
is very *slow* for this case.

This is exactly what happens on PhraseQuery: we only have very few distinct items and possibly
a very huge array. To fix this, we should change PhraseQuery to use mergeSort instead. Mergesort
is also much faster in this case, as it always splits the array in the center. So the number
of iterations is limited.

For TermsHash/BytesRefHash its mostly also not a problem, as the values (the terms are 100%
distict, as only the hash is sorted).

But there may still be the slight chance this messes up. I propose to change SorterTemplate
to fall back to mergeSort once it checks that number of iterations grows e.g. > 20 (have
to test a little bit).

I will change that issue to higher priority and we also need to backport to 3.1.

> add assert to sorts catch broken comparators in tests
> -----------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>         Attachments: LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken
comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message