lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
Date Mon, 02 May 2011 16:52:03 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Michael McCandless reopened LUCENE-3054:
----------------------------------------


Reopening so we can discuss things further...:

QuickSort is dangerous!  Yet, it's definitely faster than MergeSort
for some cases (~20% faster when sorting terms for writing segment, in
quick test I ran on Wikipedia content).

So the core issue is we should not use QS when there's a risk of any
ties, because in that case it can run really slowly or hit infinite
recursion.

And we (well, Otis; thank you!) found one such place today (where
MultiPhraseQuery sorts its terms) where we could have many ties and
thus run very slowly / hit stack overflow.

I appreciate the motivation for the "safety net", but, it makes me
nervous... because, say we had done this a few months back... then
Otis likely would not have reported the issue?  Ie, the
MultiPhraseQuery would run slowly... which could evade detection
(people may just think it's slow).

I prefer brittle fails over silent slowdowns because the brittle fail
gets your attention and you get a real fix in.  Silent slowdowns evade
detection.  Sort of like the difference between a virus and
spyware...

Also, what's preventing us from accidentally using QS somewhere in the
future, where we shouldn't?  What's going to catch us?

Robert's first patch would catch this and protect us going forward?

Or, maybe we could strengthen that approach and "assert cmp != 0"
inside QS (ie, no ties are allowed to be passed to QS)?

Though, using asserts only is risky, because it could be the
comparator may return 0, but it's just that none of our test cases
tickled it.

Maybe instead we could do this in a type-safe way: make a new
NoTiesComparator whose compare method can only return LESS_THAN or
GREATER_THAN?  And then QS would require NoTiesComparator.  Could that
work?


> SorterTemplate.quickSort stack overflows on broken comparators that produce only few
disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch,
LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken
comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message