Return-Path: X-Original-To: apmail-lucene-dev-archive@www.apache.org Delivered-To: apmail-lucene-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3D1D93067 for ; Mon, 2 May 2011 20:01:48 +0000 (UTC) Received: (qmail 1204 invoked by uid 500); 2 May 2011 20:01:46 -0000 Delivered-To: apmail-lucene-dev-archive@lucene.apache.org Received: (qmail 1143 invoked by uid 500); 2 May 2011 20:01:46 -0000 Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@lucene.apache.org Delivered-To: mailing list dev@lucene.apache.org Received: (qmail 1136 invoked by uid 99); 2 May 2011 20:01:46 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 02 May 2011 20:01:46 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 02 May 2011 20:01:43 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 9F339BE29E for ; Mon, 2 May 2011 20:01:04 +0000 (UTC) Date: Mon, 2 May 2011 20:01:04 +0000 (UTC) From: "Michael McCandless (JIRA)" To: dev@lucene.apache.org Message-ID: <2142481007.16043.1304366464648.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <585341490.12111.1304099223657.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027808#comment-13027808 ] Michael McCandless commented on LUCENE-3054: -------------------------------------------- So, there are two known improvements to our QS, to try to avoid the O(N^2) worst-case, both from Robert Sedgewick. First, it's better to select median of low/mid/high as the pivot (http://en.wikipedia.org/wiki/Quicksort#Choice_of_pivot). Second, we should handle "equal" values better (http://www.angelfire.com/pq/jamesbarbetti/articles/sorting/001_QuicksortIsBroken.htm#Duplicates). See also Lucy's nice QS impl: http://svn.apache.org/viewvc/incubator/lucy/trunk/core/Lucy/Util/SortUtils.c?revision=1098445&view=markup#l331 which I think addresses the above two issues, and goes even further (eq-to-pivot values are explicitly "moved to the middle" and then not recursed on). The thing is, fixing these will make our QS more "general", at the expense of some added cost for the cases we know work fine today (eg sorting terms before flushing a segment). Maybe we leave our QS as is (except, changing the 40 to be dynamic depending on input length), noting that you should not use it if your comparator does not break ties, and even if it does there are still risks because of potentially bad pivot selection? Or, maybe we remove QS always use MS? Yes, there's a hit to the sort when flushing the segment, but this is a tiny cost compared to the rest of segment flushing... Separately we can look into whether the tool timsort is faster for sorting terms for flush.... > SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays > -------------------------------------------------------------------------------------------------------------------- > > Key: LUCENE-3054 > URL: https://issues.apache.org/jira/browse/LUCENE-3054 > Project: Lucene - Java > Issue Type: Task > Affects Versions: 3.1 > Reporter: Robert Muir > Assignee: Uwe Schindler > Priority: Critical > Fix For: 3.1.1, 3.2, 4.0 > > Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch > > > Looking at Otis's sort problem on the mailing list, he said: > {noformat} > * looked for other places where this call is made - found it in > MultiPhraseQuery$MultiPhraseWeight and changed that call from > ArrayUtil.quickSort to ArrayUtil.mergeSort > * now we no longer see SorterTemplate.quickSort in deep recursion when we do a > thread dump > {noformat} > I thought this was interesting because PostingsAndFreq's comparator > looks like it needs a tiebreaker. > I think in our sorts we should add some asserts to try to catch some of these broken comparators. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional commands, e-mail: dev-help@lucene.apache.org