Date: Fri, 30 Mar 2012 20:43:26 +0000 (UTC)
From: "Sean Bridges (Commented) (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <1312173771.39834.1333140206374.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1081579439.29777.1332964408455.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (LUCENE-3932) Improve load time of .tii files

[
https://issues.apache.org/jira/browse/LUCENE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242721#comment-13242721 ]

Sean Bridges commented on LUCENE-3932:
--------------------------------------

{quote}Seems like if we made a direct "decode tii file and write in-memory format" (instead of going through SegmentTermEnum), we could get some of this back. The vLongs unfortunately need to be decoded/re-encoded because they are deltas in the file but absolutes in memory. But, eg the vInt docFreq could be a "copyVInt" method instead of readVInt then writeVInt, which should save a bit.{quote}

Are the space savings of delta encoding worth the processing time? You could write the .tii file to disk such that on open you could read it straight into a byte[].

> Improve load time of .tii files
> -------------------------------
>
>                 Key: LUCENE-3932
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3932
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.5
>         Environment: Linux
>            Reporter: Sean Bridges
>         Attachments: LUCENE-3932.trunk.patch, perf.csv
>
>
> We have a large 50 GB index which is optimized as one segment, with a 66 MB .tii file. This index has no norms, and no field cache.
> It takes about 5 seconds to load this index; profiling reveals that 60% of the time is spent in GrowableWriter.set(index, value), and most of the time in set(...) is spent resizing PackedInts.Mutable current.
> In the constructor for TermInfosReaderIndex, you initialize the writer with the line,
> {quote}GrowableWriter indexToTerms = new GrowableWriter(4, indexSize, false);{quote}
> For our index using four as the bit estimate results in 27 resizes.
> The last value in indexToTerms is going to be ~ tiiFileLength, and if instead you use,
> {quote}int bitEstimate = (int) Math.ceil(Math.log10(tiiFileLength) / Math.log10(2));
> GrowableWriter indexToTerms = new GrowableWriter(bitEstimate, indexSize, false);{quote}
> Load time improves to ~ 2 seconds.
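
For anyone who wants to check the bit estimate from the issue description in isolation, here is a minimal standalone sketch. It does not touch Lucene: the class name BitEstimate and the helper bitsRequired are hypothetical names made up for illustration, and the 66 MB length is taken from the report above. The first method repeats the formula from the issue; the second is an integer-only alternative (an assumption, not part of the attached patch) that avoids floating point. Note that ceil(log2(x)) is mathematically one bit short when x is an exact power of two (8 needs 4 bits, but log2(8) = 3); since GrowableWriter grows on demand, that would only cost one extra resize.

```java
final class BitEstimate {

    // Formula from the issue description: ceil(log2(tiiFileLength)),
    // computed with floating-point logarithms.
    static int bitEstimateLog(long tiiFileLength) {
        return (int) Math.ceil(Math.log10(tiiFileLength) / Math.log10(2));
    }

    // Hypothetical integer-only alternative (not from the patch):
    // the number of bits needed to store maxValue as an unsigned value.
    // Assumes maxValue >= 0; returns at least 1 so that 0 still gets one bit.
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    public static void main(String[] args) {
        // 66 MB, the .tii file size mentioned in the report.
        long tiiFileLength = 66L * 1024 * 1024;

        System.out.println("log-based estimate: " + bitEstimateLog(tiiFileLength));
        System.out.println("integer estimate:   " + bitsRequired(tiiFileLength));
    }
}
```

For the 66 MB file both methods give 27 bits, since 2^26 < 69206016 < 2^27. Either value could then be passed as the first argument of the GrowableWriter constructor shown in the issue.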