Date: Sun, 19 Jun 2011 13:51:47 +0000 (UTC)
From: "Michael McCandless (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <976129156.19189.1308491507404.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <887169682.10002.1308207287288.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (LUCENE-3206) FST package API refactoring

[
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051684#comment-13051684 ]

Michael McCandless commented on LUCENE-3206:
--------------------------------------------

OK, these results make sense! UTF32 (vInt labels) is more compact than UTF8, if you disable array'd arcs.

These wiki terms are from the en export right? So the differences are due to the smallish number of random terms that are not English... it should be more extreme if we used non-English content.

I wonder how lookup time would compare... I think UTF32 should be faster?

And yes for truly binary terms (eg collated fields, and maybe eventually numeric fields but not yet because they still avoid the 8th bit I think) I think we want to keep BYTE1.

We need some good use cases of FSTs during analysis... there we are free to make the alphabet non-byte (vs the index, where terms are a BytesRef).

> FST package API refactoring
> ---------------------------
>
> Key: LUCENE-3206
> URL: https://issues.apache.org/jira/browse/LUCENE-3206
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/FSTs
> Affects Versions: 3.2
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3206.patch
>
>
> The current API is still marked @experimental, so I think there's still time to fiddle with it. I've been using the current API for some time and I do have some ideas for improvement. This is a placeholder for these -- I'll post a patch once I have a working proof of concept.
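
Editorial illustration of the UTF32-versus-UTF8 label comparison in the comment above. This is a minimal sketch and not part of the original message: it does not use Lucene's FST classes, and the class name LabelCount and its methods are hypothetical. It only counts how many labels a term would need with one label per UTF-8 byte versus one label per code point, and how many bytes those code points would take if each were written as a variable-length int with 7 payload bits per byte. Actual FST size also depends on per-arc overhead and suffix sharing, which this does not model.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Lucene API.
class LabelCount {
    // One label per UTF-8 byte (byte-based input alphabet).
    static int utf8Labels(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length;
    }

    // One label per Unicode code point (UTF32-style input alphabet).
    static int codePointLabels(String term) {
        return term.codePointCount(0, term.length());
    }

    // Bytes needed to write a non-negative int as a variable-length int,
    // assuming 7 payload bits per byte.
    static int vIntBytes(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    // Total label bytes if every code point is written as a vInt.
    static int vIntLabelBytes(String term) {
        return term.codePoints().map(LabelCount::vIntBytes).sum();
    }

    public static void main(String[] args) {
        // ASCII term, Latin-1 accented term, CJK term (escaped to stay encoding-safe).
        String[] terms = {"lucene", "caf\u00e9", "\u4e2d\u6587"};
        for (String t : terms) {
            System.out.println(t
                + " utf8Labels=" + utf8Labels(t)
                + " codePointLabels=" + codePointLabels(t)
                + " vIntLabelBytes=" + vIntLabelBytes(t));
        }
    }
}
```

For pure ASCII terms the two alphabets need the same number of labels, which fits the observation that an English export shows only small differences. For non-ASCII terms the code-point alphabet needs fewer labels, and so fewer arcs, even when the label bytes themselves add up to the same total. That is one plausible reason the gap would widen on non-English content.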