From: Doug Cutting
Date: Mon, 05 Apr 2004 13:42:40 -0700
To: Lucene Developers List
Subject: Re: TermDocs.skipTo()
Message-ID: <4071C4C0.30907@apache.org>
In-Reply-To: <406DA9EE.6090600@detego-software.de>

Christoph Goller wrote:
> Problem: In TermInfosReader index (every 128th term) skipOffsets are not
> stored!
> Due to documentation getIndexOffset returns the offset of the greatest
> index entry which is less than term. I believe this is not true it may
> deliver the term itself! If we seek for a term that is in the index, this
> term and its termInfo will not be read from the enumerator by scanEnum and
> consequently no skipOffset will be found, even if present. This could lead
> to serious problems when skipTo is used, couldn't it?

Yes, this does look like a problem.

> Possible Solution: Store skipOffset in *.tii too.

I think that's a good solution. We should change TermInfosWriter.FORMAT
from -1 to -2 and then use that to keep SegmentTermEnum.next()
back-compatible, since folks may have created indexes with 1.4RC2. The
simplest way to do this would be to disable skipTo() when
TermInfosWriter.FORMAT is -1, by setting skipInterval to
Integer.MAX_VALUE, as is done for 1.3 indexes.

Shall I do this, or would you like to?

Thanks so much for finding things like this!

Doug
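
The back-compatibility idea described above could be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
actual Lucene code: the class SkipFormatSketch and the methods
effectiveSkipInterval and canSkip are hypothetical names, and the sketch
assumes that a reader only consults skip data for a term when the term's
docFreq is at least skipInterval, so that a skipInterval of
Integer.MAX_VALUE switches skipping off in practice.

```java
// Hypothetical sketch; class and method names are illustrative only
// and are not taken from the Lucene source.
final class SkipFormatSketch {
    // Format written by 1.4RC2: skipOffset is missing from the *.tii index.
    static final int FORMAT_WITHOUT_INDEX_SKIP = -1;
    // Proposed format: skipOffset is stored in *.tii as well.
    static final int FORMAT_WITH_INDEX_SKIP = -2;

    /**
     * Picks the skip interval a reader should use for a segment.
     * Assumption: format numbers are negative and decrease with each
     * revision; a non-negative value stands for an unversioned 1.3 index.
     */
    static int effectiveSkipInterval(int format, int storedSkipInterval) {
        if (format < FORMAT_WITH_INDEX_SKIP) {
            // Written by a newer revision than this sketch knows about.
            throw new IllegalArgumentException("Unknown format: " + format);
        }
        if (format > FORMAT_WITH_INDEX_SKIP) {
            // Format -1 or a 1.3 index: skip data cannot be trusted,
            // so make the interval unreachable.
            return Integer.MAX_VALUE;
        }
        return storedSkipInterval;
    }

    /** Assumed reader-side check: skip data is used only for frequent terms. */
    static boolean canSkip(int docFreq, int skipInterval) {
        return docFreq >= skipInterval;
    }
}
```

With these assumptions, an index in format -1 yields a skip interval that
no realistic docFreq reaches, so skipTo() would fall back to a plain
sequential scan, while a format -2 index keeps its stored interval.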