From: "Robert Muir (JIRA)"
To: java-dev@lucene.apache.org
Date: Tue, 6 Apr 2010 10:27:33 +0000 (UTC)
Subject: [jira] Commented: (LUCENE-2369) Locale-based sort by field with low memory overhead
Message-ID: <1863417989.4691270549653825.JavaMail.jira@brutus.apache.org>
In-Reply-To: <142973736.4201270547613607.JavaMail.jira@brutus.apache.org>

    [ https://issues.apache.org/jira/browse/LUCENE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853838#action_12853838 ]

Robert Muir commented on LUCENE-2369:
-------------------------------------

Toke, I still think it would be better to use ICU collation keys here. For your Danish text, the memory usage will still be smaller than trunk (ICU collation keys stored as byte[] are smaller than Java's internal UTF-16 encoding). Then you can do this sort at index time.

One step towards this is to switch collation in flex to use byte[] rather than encoding the keys in char[] as it does today.

> Locale-based sort by field with low memory overhead
> ---------------------------------------------------
>
>                 Key: LUCENE-2369
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2369
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Toke Eskildsen
>            Priority: Minor
>
> The current implementation of locale-based sort in Lucene uses the FieldCache, which keeps all sort terms in memory. Besides the huge memory overhead, searching requires comparing terms with collator.compare every time, making searches with millions of hits fairly expensive.
> The proposed alternative implementation creates a packed list of pre-sorted ordinals for the sort terms and a map from document IDs to entries in the sorted ordinals list. This results in very low memory overhead and faster sorted searches, at the cost of increased startup time. As the ordinals can be resolved to terms after the sorting has been performed, this approach supports fillFields=true.
> This issue is related to https://issues.apache.org/jira/browse/LUCENE-2335, which contains previous discussions on the subject.
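
To illustrate the collation-key idea from the comment above, here is a minimal sketch, not the Lucene or ICU implementation. It assumes the JDK's java.text.Collator as a stand-in for ICU (ICU4J offers a similar Collator/CollationKey pair), and the class and method names (CollationKeySortSketch, toSortKey, compareKeys) are hypothetical. The point it shows: once each term is turned into a byte[] collation key at index time, sorting needs only a plain unsigned byte comparison and never calls collator.compare.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch: names are illustrative, not part of Lucene or ICU.
class CollationKeySortSketch {

    // "Index-time" step: encode a term as its binary collation key.
    static byte[] toSortKey(Collator collator, String term) {
        return collator.getCollationKey(term).toByteArray();
    }

    // "Search-time" step: compare keys as unsigned bytes, most significant first.
    // No Collator is needed here.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a key that is a prefix of another sorts first
    }

    public static void main(String[] args) {
        // Assumption: the runtime ships Danish collation rules; if not, the
        // JDK falls back to other rules and the printed order may differ.
        Collator collator = Collator.getInstance(Locale.forLanguageTag("da"));
        String[] terms = {"Århus", "Zebra", "aalborg", "Odense", "Øl"};

        // Compute every key once, up front.
        byte[][] keys = new byte[terms.length][];
        for (int i = 0; i < terms.length; i++) {
            keys[i] = toSortKey(collator, terms[i]);
        }

        // Sort positions by key bytes only, then print the terms in that order.
        Integer[] order = new Integer[terms.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (x, y) -> compareKeys(keys[x], keys[y]));
        for (int i : order) {
            System.out.println(terms[i]);
        }
    }
}
```

The byte comparison relies on the documented contract of CollationKey.toByteArray(): keys are organized most significant byte first, so comparing the arrays gives the same result as comparing the keys. Whether such keys end up smaller than the UTF-16 terms depends on the collator and the text; this sketch does not measure that.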