From: Mark Harwood
Date: Wed, 15 Oct 2008 06:39:35 +0000 (GMT)
Subject: Re: Question regarding sorting and memory consumption in lucene
To: java-user@lucene.apache.org
Message-ID: <599306.4376.qm@web26001.mail.ukl.yahoo.com>

Yes, StringIndex's public fields make life awkward. Re initialization: I
did think you could try using arrays of byte arrays. The first 256 terms
can be addressed using just one byte array; on encountering a 257th term,
an extra byte array is allocated. References to terms then require
indexing into 2 byte arrays and bit-shifting the 2nd byte to produce a
combined short, which can address up to 65k terms held in a term pool.

When sorting, a fast comparison of 2 values can avoid always indexing
into all byte arrays and shifting to produce a number. Simply comparing
entries from the most significant byte array first can reveal a
difference in order; if they are equal, the bytes from the next most
significant byte array have to be compared, and so on.

Not sure how this would perform compared to simply upgrading whole byte
arrays to shorts and then to ints as you go.

Cheers,
Mark

On 15 Oct 2008, at 00:56, Chris Hostetter wrote:

: Actually looking at this a little deeper maybe Lucene could/should
: automatically be doing this "short" optimisation here?

At the moment it can't, the arrays in StringIndex are public.

The other thing that would be a bit tricky is the initialization ... I
can't think of any easy way to know in advance how many terms there are
before iterating over all the terms, so you'd have to assume one and then
if you're wrong copy to the other -- not sure how expensive that copy
would be.

It's a little more feasible for custom clients to do when they know in
advance how many terms they've got -- but some of the existing
FieldCacheImpl code could probably be refactored to make it easier on
people.

-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
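
For anyone who wants to experiment, the "arrays of byte arrays" idea from
the top of this message could be sketched roughly as below. This is only an
illustration under stated assumptions, not Lucene code: the class
BytePlaneOrds and its methods set, get, compare and planeCount are
hypothetical names, the term pool itself is left out, and each document is
assumed to carry a single non-negative term ordinal.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not part of Lucene): per-document term ordinals
 * stored as "planes" of bytes that are allocated only when needed.
 */
class BytePlaneOrds {
    private final int maxDoc;
    // planes.get(0) holds the least significant byte of every document's
    // ordinal; each later plane holds the next more significant byte.
    private final List<byte[]> planes = new ArrayList<>();

    BytePlaneOrds(int maxDoc) {
        this.maxDoc = maxDoc;
        planes.add(new byte[maxDoc]); // one plane covers ordinals 0..255
    }

    /** Records the term ordinal for a document, adding planes on demand. */
    void set(int doc, int ord) {
        if (ord < 0) {
            throw new IllegalArgumentException("ordinal must be non-negative");
        }
        // A new plane starts out zero-filled, which is already the correct
        // high byte for every ordinal stored before it was allocated.
        while (planes.size() < 4 && (ord >>> (8 * planes.size())) != 0) {
            planes.add(new byte[maxDoc]);
        }
        for (int p = 0; p < planes.size(); p++) {
            planes.get(p)[doc] = (byte) (ord >>> (8 * p));
        }
    }

    /** Rebuilds the full ordinal by shifting the planes back together. */
    int get(int doc) {
        int ord = 0;
        for (int p = planes.size() - 1; p >= 0; p--) {
            ord = (ord << 8) | (planes.get(p)[doc] & 0xFF);
        }
        return ord;
    }

    /**
     * Orders two documents by ordinal, most significant plane first, and
     * stops at the first plane where the bytes differ.
     */
    int compare(int docA, int docB) {
        for (int p = planes.size() - 1; p >= 0; p--) {
            byte[] plane = planes.get(p);
            int a = plane[docA] & 0xFF; // treat the byte as unsigned
            int b = plane[docB] & 0xFF;
            if (a != b) {
                return a < b ? -1 : 1;
            }
        }
        return 0;
    }

    /** Number of byte planes currently allocated. */
    int planeCount() {
        return planes.size();
    }
}
```

With this layout nothing is ever copied when the 257th term shows up; the
cost is one extra array lookup per plane in get, and compare can return
early whenever the most significant bytes already differ. Whether that
beats upgrading a whole byte array to shorts and then ints would need
measuring.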