From: Doug Cutting
To: Lucene Users List
Date: Thu, 26 Feb 2004 10:24:52 -0800
Subject: Re: Iterating TermEnum backwards

Matt Quail wrote:
> Is there any way to iterate through a TermEnum backwards? Okay, I know
> that there isn't a way to do this via the TermEnum class, but is it
> "implementable" on top of the underlying Lucene datastore?

Not really.
The best you can do is skip back to the previous "indexed" term in
TermInfosReader.indexTerms, and scan forward from there. You could try
adding a method to that class like:

  final synchronized void seekBefore(Term term) throws IOException {
    int offset = getIndexOffset(term);
    // step back one index entry, so a forward scan covers the terms before `term`
    seekEnum(offset > 0 ? offset - 1 : offset);
  }

Then you'd need to add stuff to MultiReader, SegmentReader and
IndexReader, to take advantage of this. It could get a little tricky,
but it is possible. I'm not convinced this is your best route.

> My particular problem is this:
>
> I have an index of documents, each document has a "date" field (I'm
> using DateField). Most documents have a different date, so the number of
> unique dates is close to the number of documents.

Are you adding documents in date order? If so, then you could look at
the date of the document numbered maxDoc() - N and scan forward from
there. To be safe, you could start at maxDoc() - N*2 or something.

> I want to find the top N most recent dates, but I don't want to have to
> iterate through ALL of them first. NB: With DateField, the earlier dates
> are lexicographically smaller. (I also want to find the most recent N
> less than some date D).
>
> I know I could "invert" my dates (something like MAX_LONG - date) to get
> the REVERSE order, but I want to be able to do "least recent" and "most
> recent".

Why not have two date fields, one inverted and one not?

> PS: my current solution is to do a binary search between MIN and MAX,
> halving my search space until I find close to N matching documents.

That doesn't sound like a bad solution.

Doug
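
The two-field suggestion above can be sketched roughly as follows. This
is only an illustration under stated assumptions: the class and method
names (InvertedDateKeys, forwardKey, invertedKey, mostRecent,
mostRecentBefore) are hypothetical, the zero-padded decimal keys are a
stand-in and not DateField's actual encoding, and a TreeSet stands in
for the sorted terms of an index field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene.
class InvertedDateKeys {
    // Fixed-width, zero-padded decimal so string order matches numeric order.
    // Assumption: millis >= 0. This is NOT DateField's encoding.
    static String forwardKey(long millis) {
        if (millis < 0) throw new IllegalArgumentException("millis must be non-negative");
        return String.format(Locale.ROOT, "%019d", millis);
    }

    // Inverted key (MAX_LONG - date): later dates sort first.
    static String invertedKey(long millis) {
        if (millis < 0) throw new IllegalArgumentException("millis must be non-negative");
        return String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - millis);
    }

    static long fromInvertedKey(String key) {
        return Long.MAX_VALUE - Long.parseLong(key);
    }

    // Decode the first n terms of a sorted run of inverted keys.
    private static List<Long> firstN(SortedSet<String> terms, int n) {
        List<Long> out = new ArrayList<>();
        for (String term : terms) {
            if (out.size() == n) break;
            out.add(fromInvertedKey(term));
        }
        return out;
    }

    // Most recent n dates: a forward scan from the first inverted term.
    static List<Long> mostRecent(TreeSet<String> invertedTerms, int n) {
        return firstN(invertedTerms, n);
    }

    // Most recent n dates strictly before d: a forward scan starting just
    // after the inverted key of d.
    static List<Long> mostRecentBefore(TreeSet<String> invertedTerms, long d, int n) {
        return firstN(invertedTerms.tailSet(invertedKey(d), false), n);
    }

    public static void main(String[] args) {
        long[] dates = {1000L, 5000L, 3000L, 9000L, 7000L};
        TreeSet<String> inverted = new TreeSet<>();
        for (long d : dates) inverted.add(invertedKey(d));
        System.out.println(mostRecent(inverted, 2));              // [9000, 7000]
        System.out.println(mostRecentBefore(inverted, 7000L, 2)); // [5000, 3000]
    }
}
```

With the inverted field, "most recent N" becomes a plain forward scan,
and "most recent N less than D" becomes a forward scan from a seek
point, which is what a forward-only term enumeration supports. The
non-inverted field would serve the "least recent" queries the same way.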