Date: Wed, 7 Dec 2011 17:16:40 +0000 (UTC)
From: "James Dyer (Commented) (JIRA)"
To: dev@lucene.apache.org
Message-ID: <122017957.49939.1323278200363.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1061380395.14003.1310216716580.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (LUCENE-3298) FST has hard limit max size of 2.1 GB

[
https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164531#comment-13164531 ]

James Dyer commented on LUCENE-3298:
------------------------------------

Carlos,

I'm not sure how much help this is, but you might be able to eke out a little more performance if you can tighten RewritablePagedBytes.copyBytes(). You'll note that it currently moves the From-Bytes into a temp array and then writes that back to the FST at the To-Bytes location. Note also that in the one place this gets called, it used to be a simple "System.arraycopy". So if you can make it copy in place, that might claw back some of the performance loss. Beyond this, a different pair of eyes might find more ways to optimize. In the end, though, you will likely never make it perform quite as well as the simple array.

Also, it sounds as if you've maybe done work to sync this with the current trunk. If so, would you mind uploading the updated patch?

Also, if you end up using this, be sure to test thoroughly. I implemented this one just to gain a little familiarity with the code, and I do not claim any sort of expertise in this area, so beware! But all of the regular unit tests did pass for me. I was meaning to try to run test2bpostings against this but wasn't able to get it set up. If I remember correctly, this issue came up originally because someone wanted to run test2bpostings with memorycodec and it was going past the limit.

> FST has hard limit max size of 2.1 GB
> -------------------------------------
>
>                 Key: LUCENE-3298
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3298
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-3298.patch
>
>
> The FST uses a single contiguous byte[] under the hood, which in Java is indexed by int, so we cannot grow this over Integer.MAX_VALUE. It also internally encodes references to this array as vInt.
> We could switch this to a paged byte[] and make the limit far larger.
> But I think this is low priority... I'm not going to work on it any time soon.
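
The in-place copy suggested in the comment, over a paged byte[] of the kind the issue proposes, might look roughly like the sketch below. The class PagedBytesSketch and all of its members are hypothetical names made up for illustration; this is not the RewritablePagedBytes from the attached patch, whose code does not appear in this message. The sketch assumes fixed-size pages whose size is a power of two, addresses them with a long, and copies page-sized chunks with System.arraycopy. It picks the copy direction the way memmove does, so overlapping source and destination ranges are handled without a temp array.

```java
/** Hypothetical sketch of a paged byte store; not Lucene's actual code. */
final class PagedBytesSketch {
    private final int pageBits;   // log2 of the page size
    private final int pageSize;
    private final int pageMask;
    private final byte[][] pages;

    PagedBytesSketch(int pageBits, long capacity) {
        this.pageBits = pageBits;
        this.pageSize = 1 << pageBits;
        this.pageMask = pageSize - 1;
        // Round the capacity up to a whole number of pages.
        int pageCount = (int) ((capacity + pageMask) >>> pageBits);
        this.pages = new byte[pageCount][pageSize];
    }

    byte get(long pos) {
        return pages[(int) (pos >>> pageBits)][(int) (pos & pageMask)];
    }

    void set(long pos, byte b) {
        pages[(int) (pos >>> pageBits)][(int) (pos & pageMask)] = b;
    }

    /** Copies len bytes from src to dest in place; the ranges may overlap. */
    void copyBytes(long src, long dest, int len) {
        if (len <= 0 || src == dest) {
            return;
        }
        if (dest < src) {
            // Destination is lower: copy chunks front to back.
            while (len > 0) {
                int srcOff = (int) (src & pageMask);
                int destOff = (int) (dest & pageMask);
                // Largest run that stays inside one source page and one destination page.
                int chunk = Math.min(len, Math.min(pageSize - srcOff, pageSize - destOff));
                System.arraycopy(pages[(int) (src >>> pageBits)], srcOff,
                                 pages[(int) (dest >>> pageBits)], destOff, chunk);
                src += chunk;
                dest += chunk;
                len -= chunk;
            }
        } else {
            // Destination is higher: copy chunks back to front so that
            // unread source bytes are never overwritten.
            long srcEnd = src + len;   // exclusive
            long destEnd = dest + len; // exclusive
            while (len > 0) {
                int srcEndOff = (int) ((srcEnd - 1) & pageMask) + 1;
                int destEndOff = (int) ((destEnd - 1) & pageMask) + 1;
                int chunk = Math.min(len, Math.min(srcEndOff, destEndOff));
                System.arraycopy(pages[(int) ((srcEnd - 1) >>> pageBits)], srcEndOff - chunk,
                                 pages[(int) ((destEnd - 1) >>> pageBits)], destEndOff - chunk, chunk);
                srcEnd -= chunk;
                destEnd -= chunk;
                len -= chunk;
            }
        }
    }
}
```

Each chunk lies within a single source page and a single destination page, so any overlap inside a chunk falls within one array, which System.arraycopy already handles. Whether this actually recovers the lost performance would need to be measured; as noted in the comment, it is unlikely to match a single flat array.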