lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
Date Mon, 02 May 2011 20:01:04 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027808#comment-13027808
] 

Michael McCandless commented on LUCENE-3054:
--------------------------------------------

So, there are two known improvements to our QS, to try to avoid the O(N^2)
worst-case, both from Robert Sedgewick.

First, it's better to select median of low/mid/high as the pivot
(http://en.wikipedia.org/wiki/Quicksort#Choice_of_pivot).  Second, we
should handle "equal" values better
(http://www.angelfire.com/pq/jamesbarbetti/articles/sorting/001_QuicksortIsBroken.htm#Duplicates).

See also Lucy's nice QS impl:

  http://svn.apache.org/viewvc/incubator/lucy/trunk/core/Lucy/Util/SortUtils.c?revision=1098445&view=markup#l331

which I think addresses the above two issues, and goes even further
(eq-to-pivot values are explicitly "moved to the middle" and then not
recursed on).

The thing is, fixing these will make our QS more "general", at the
expense of some added cost for the cases we know work fine today (eg
sorting terms before flushing a segment).

Maybe we leave our QS as is (except, changing the 40 to be dynamic
depending on input length), noting that you should not use it if your
comparator does not break ties, and even if it does there are still
risks because of potentially bad pivot selection?

Or, maybe we remove QS always use MS?  Yes, there's a hit to the sort
when flushing the segment, but this is a tiny cost compared to the
rest of segment flushing...

Separately we can look into whether the tool timsort is faster for
sorting terms for flush....

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few
disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch,
LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken
comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message