From: Doug Cutting
Date: Thu, 04 Dec 2003 10:28:58 -0800
To: Lucene Developers List
Subject: Re: suggestion for a CustomDirectory

Julien Nioche wrote:
> However in most cases the
> application would be faster because :
> - tree access to the Term (this is only the case for the Terms in the .tii)
> - no need to create up to 127 temporary Term objects (with creation of
> Strings and so on....)
> - limit garbage collecting

The .tii is already read into memory when the index is opened. So the
only savings would be the creation of (on average) 64 temporary Term
objects per query. Do you have any evidence that this is a substantial
part of the computation? I'd be surprised if it was.

To find out, you could write a program which compares the time it takes
to call docFreq() on a set of terms (allocating the 64 temporary Terms)
to what it takes to perform queries (doing the rest of the work). I'll
bet that the first is substantially faster: most of the work of
executing a query is processing the .frq and .prx files. These are
bigger than the RAM on your machine, and so cannot be cached. Thus
you'll always be doing some disk I/O, which will likely dominate real
performance.

Doug
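
The comparison suggested above could be sketched roughly as follows. This is only a timing harness, not Lucene code: the class DocFreqVsQueryTiming, the toy in-memory index, and the helpers buildToyIndex(), docFreq() and evaluate() are all hypothetical stand-ins made up for illustration. With a real index, the two lambdas in main() would be replaced by a docFreq() call on the reader and by an actual search for the same term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Timing harness: term dictionary lookups alone vs. walking the postings. */
final class DocFreqVsQueryTiming {

    /** Hypothetical toy index: term i maps to 1 + (i % 500) doc ids (0, 3, 6, ...). */
    static Map<String, int[]> buildToyIndex(int termCount) {
        Map<String, int[]> postings = new HashMap<>();
        for (int i = 0; i < termCount; i++) {
            int[] docs = new int[1 + (i % 500)];
            for (int d = 0; d < docs.length; d++) {
                docs[d] = d * 3;
            }
            postings.put("term" + i, docs);
        }
        return postings;
    }

    /** Stand-in for docFreq(): a dictionary lookup only, no postings are read. */
    static long docFreq(Map<String, int[]> index, String term) {
        int[] docs = index.get(term);
        return docs == null ? 0 : docs.length;
    }

    /** Stand-in for query execution: visits every posting and returns a checksum. */
    static long evaluate(Map<String, int[]> index, String term) {
        int[] docs = index.get(term);
        long checksum = 0;
        if (docs != null) {
            for (int doc : docs) {
                checksum += doc;
            }
        }
        return checksum;
    }

    /** Applies work to every term, rounds times over; returns elapsed nanoseconds. */
    static long timeNanos(List<String> terms, int rounds, ToLongFunction<String> work) {
        long sink = 0; // accumulate results so the JIT cannot discard the calls
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String term : terms) {
                sink += work.applyAsLong(term);
            }
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // never expected; keeps sink observable
        }
        return elapsed;
    }

    public static void main(String[] args) {
        Map<String, int[]> index = buildToyIndex(1000);
        List<String> terms = new ArrayList<>(index.keySet());

        // Warm up both paths before measuring.
        timeNanos(terms, 50, t -> docFreq(index, t));
        timeNanos(terms, 50, t -> evaluate(index, t));

        long lookupNanos = timeNanos(terms, 200, t -> docFreq(index, t));
        long queryNanos = timeNanos(terms, 200, t -> evaluate(index, t));

        System.out.println("docFreq-only ns: " + lookupNanos);
        System.out.println("query-like ns:   " + queryNanos);
    }
}
```

Because the toy index lives entirely in memory, it cannot show the disk I/O on the .frq and .prx files; it only illustrates how the two timings would be collected. Any real conclusion would need the same harness run against an actual index.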