From: Marvin Humphrey
To: java-dev@lucene.apache.org
Subject: Re: bytecount as String and prefix length
Date: Tue, 1 Nov 2005 20:52:28 -0800
In-Reply-To: <4367AB21.1030204@apache.org>

On Nov 1, 2005, at 9:51 AM, Doug Cutting wrote:

> Another
> approach might be to, instead of converting to UTF-8 to
> strings right away, change things to convert lazily, if at all.
> During index merging such conversion should never be needed.

!! There ought to be some gains possible there, then. No predictions
as to how much, though.

> You needn't do this systematically throughout Lucene, but only
> where it makes a big difference. For example, if you could avoid
> strings in SegmentMerger.mergeTermInfos() it might make a huge
> difference. This might be as simple as changing SegmentMergeInfo
> to use a TermBuffer instead of a Term. Does that make sense?

Abundant sense. I'm not as familiar with SegmentMerger as I am with
other parts of the org.apache.lucene.index package, because I haven't
ported it yet. But conceptually I understand exactly why this should
require fewer resources.

I'll take a swing at SegmentMerger and submit a comprehensive diff.

Thanks for the suggestions,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
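
For illustration, the lazy-conversion idea quoted above might look
roughly like the sketch below. It is not Lucene's TermBuffer or Term:
the class LazyTerm, its methods, and the decodeCount counter are all
hypothetical names made up for this example. It also assumes that
ordering terms by raw UTF-8 bytes is acceptable; that gives Unicode
code point order, which may not match the order an existing index uses.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch only; not Lucene's TermBuffer or Term. */
final class LazyTerm implements Comparable<LazyTerm> {
    private final String field;
    private final byte[] utf8;   // term text kept as encoded bytes
    private String text;         // decoded on first request, then cached
    private int decodeCount;     // demo aid: number of decodes performed

    LazyTerm(String field, byte[] utf8) {
        this.field = field;
        this.utf8 = utf8.clone();
    }

    /** Decodes the bytes only when a String is actually needed. */
    String text() {
        if (text == null) {
            text = new String(utf8, StandardCharsets.UTF_8);
            decodeCount++;
        }
        return text;
    }

    int decodeCount() {
        return decodeCount;
    }

    /** Orders by field, then by unsigned byte value; builds no String. */
    @Override
    public int compareTo(LazyTerm other) {
        int c = field.compareTo(other.field);
        if (c != 0) {
            return c;
        }
        int n = Math.min(utf8.length, other.utf8.length);
        for (int i = 0; i < n; i++) {
            int a = utf8[i] & 0xFF;
            int b = other.utf8[i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return utf8.length - other.utf8.length;
    }
}

final class LazyTermDemo {
    public static void main(String[] args) {
        LazyTerm a = new LazyTerm("body", "apple".getBytes(StandardCharsets.UTF_8));
        LazyTerm b = new LazyTerm("body", "apply".getBytes(StandardCharsets.UTF_8));
        System.out.println(a.compareTo(b) < 0); // compared without decoding
        System.out.println(a.decodeCount());    // still 0 at this point
        System.out.println(a.text());           // first and only decode
    }
}
```

A merge loop written against something like this would only call
compareTo(), so the term text would never need to become a String
unless some other caller asked for it.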