Date: Mon, 11 Jun 2012 21:25:43 +0000 (UTC)
From: "Adrien Grand (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Commented] (LUCENE-4120) FST should use packed integer arrays

[ https://issues.apache.org/jira/browse/LUCENE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293100#comment-13293100 ]

Adrien Grand commented on LUCENE-4120:
--------------------------------------

bq. It seems sort of odd to have the new .save method on ReaderImpl... can it be on Mutable/Impl instead, or, maybe FST does its own saving or something?

My first intent was to add this method to {{Mutable}}.
The problem is that {{nodeRefToAddress}} must be a {{Reader}}, since it may be instantiated through {{PackedInts.getReader}}, but it may also need to be serialized by the {{save}} method. This is why I added the method to {{Reader}}. I can move it to {{Mutable}}, but then it will no longer be possible to {{save}} an {{FST}} that was read from disk (maybe not a problem?). Another solution would be to move the serialization logic to {{FST}}, but this would require exposing some internals of the packed integer arrays in order to select the right format ({{PACKED}} or {{PACKED_SINGLE_BLOCK}}, depending on whether the reader/mutable is an instance of {{Packed64SingleBlock}}), and I would really like to avoid that for as long as possible.

bq. In all the places we now pass random.nextFloat() for acceptableOverheadRatio (to FST.pack or MemoryPostingsFormat), shouldn't it be COMPACT .. FASTEST instead of 0.0 .. 1.0?

0..1 gives the different implementations a better chance of being selected. {{FASTEST=7}} is only useful for {{bitsPerValue=1}}, so that a {{Direct8}} is instantiated. If we used a uniformly distributed float between {{COMPACT=0}} and {{FASTEST=7}}, a {{Direct*}} implementation would be used at least 6/7 of the time when {{bitsPerValue>=4}}. For example, if {{bitsPerValue=15}}, a {{Direct16}} will be instantiated if {{acceptableOverheadRatio>=1/15=0.07}} and a {{Packed64}} otherwise. A lower upper bound for {{acceptableOverheadRatio}} makes the latter case ({{Packed64}}) more likely.

bq. [kuromoji], [getWriterByFormat], [javadocs]

Agreed, working on it.
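The {{acceptableOverheadRatio}} arithmetic above can be checked with a small standalone sketch. It is a simplified model, not Lucene code: the class {{OverheadRatioSketch}} and its methods are hypothetical names, and it assumes the only choice is between a byte-aligned {{Direct*}} width (8, 16, 32 or 64 bits) and {{Packed64}}, ignoring every other implementation.

```java
import java.util.Locale;

/** Hypothetical, simplified model of the Direct* vs. Packed64 trade-off; not Lucene code. */
final class OverheadRatioSketch {

    /** Smallest byte-aligned width (8, 16, 32 or 64) that can hold bitsPerValue bits. */
    static int directWidth(int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 64) {
            throw new IllegalArgumentException("bitsPerValue must be in [1, 64]");
        }
        int width = 8;
        while (width < bitsPerValue) {
            width *= 2;
        }
        return width;
    }

    /** Smallest overhead ratio at which the Direct* width is acceptable (assumed rule). */
    static double directThreshold(int bitsPerValue) {
        return (directWidth(bitsPerValue) - bitsPerValue) / (double) bitsPerValue;
    }

    /** Fraction of ratios drawn uniformly from [0, upperBound] that reach the threshold. */
    static double directFraction(int bitsPerValue, double upperBound) {
        return Math.max(0.0, 1.0 - directThreshold(bitsPerValue) / upperBound);
    }

    public static void main(String[] args) {
        // Compare the two candidate ranges for the random ratio: 0..1 and 0..7.
        for (int bits : new int[] {1, 4, 15}) {
            System.out.println(String.format(Locale.ROOT,
                "bitsPerValue=%d threshold=%.3f direct(0..1)=%.3f direct(0..7)=%.3f",
                bits, directThreshold(bits),
                directFraction(bits, 1.0), directFraction(bits, 7.0)));
        }
    }
}
```

Under these assumptions, drawing the ratio from 0..7 leaves very little room for {{Packed64}} once {{bitsPerValue>=4}}, whereas 0..1 keeps both outcomes reachable for values such as {{bitsPerValue=15}}.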
> FST should use packed integer arrays
> ------------------------------------
>
>                 Key: LUCENE-4120
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4120
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: LUCENE-4120.patch
>
>
> There are some places where an int[] could be advantageously replaced with a packed integer array.
> I am thinking (at least) of:
>  * FST.nodeAddress (GrowableWriter)
>  * FST.inCounts (GrowableWriter)
>  * FST.nodeRefToAddress (read-only Reader)
> The serialization/deserialization methods should be modified too in order to take advantage of PackedInts.get{Reader,Writer}.