Date: Tue, 29 Jun 2004 06:54:04 -0700 (PDT)
From: Otis Gospodnetic
Subject: Re: return value of terms()
To: Lucene Users List
Message-ID: <20040629135404.5225.qmail@web12708.mail.yahoo.com>
In-Reply-To: <143088710.1088516807953.JavaMail.dummy@smb-tec.com>

I see. A search for that Term still gets Hits. I don't think this
should be happening. Maybe Erik or one of the other Lucene developers
will have some ideas.

Otis

--- Lars Martin wrote:
> -----Original Message-----
> From: Otis Gospodnetic
> Sent: 29 Jun 2004, 13:46:41
>
> > I would try using the delete(Term) method, to ensure all documents
> > with the given Term are removed:
> >
> > IndexReader indexReader = IndexReader.open( indexPath );
> > indexReader.delete( new Term( "body", "YourTermHere" ) );
> > indexReader.close();
> > ...
> > IndexReader indexReader = IndexReader.open( indexPath );
> > TermEnum termEnum = indexReader.terms( new Term( "body", "" ) );
> >
> > Something like that...
>
>
> Thanks for your reply.
>
> I do not want to delete documents by terms. All my indexed documents
> are referenced by id, so I have to use delete( id ). What makes me
> uneasy is the fact that there are still terms in the index from
> documents which are already deleted. This would mean that TermEnum is
> a continuously growing beast. No problem when I query such a term,
> because no document matches the query. But when I do computation
> based on indexed terms, I depend heavily on the number of terms in
> the current TermEnum. Even if such terms don't affect the result of
> my computation, because the freq is always 0, they would affect my
> runtime behavior and complexity.
>
> Regards, Lars
>
>
> > Otis
> >
> > --- Lars Martin wrote:
> > > Hi.
> > >
> > > Is it the normal behavior that IndexReader.terms( Term t ) still
> > > returns Terms which are no longer to be found in the index,
> > > e.g. after removing the document containing these Terms?
> > > I've removed nearly all documents from the index but the terms()
> > > method is still returning all terms.
> > >
> > > IndexReader indexReader = IndexReader.open( indexPath );
> > > indexReader.delete( docId );
> > > indexReader.close();
> > > ...
> > > IndexReader indexReader = IndexReader.open( indexPath );
> > > TermEnum termEnum = indexReader.terms( new Term( "body", "" ) );
> > >
> > > Any hints?
> > >
> > > Regards, Lars
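
One possible reading of the behaviour discussed in this thread, offered as an assumption rather than something the messages confirm: deleting by document id only flags the document as deleted, and the term dictionary is rewritten only when segments are merged (for example by IndexWriter.optimize()). The sketch below is a toy model of that idea in plain Java. It is not Lucene code, and ToyIndex, liveTerms() and compact() are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: NOT Lucene code. All names here are hypothetical.
class ToyIndex {
    // Sorted term dictionary: term -> ids of the documents containing it.
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();
    // Deleted documents are only flagged here; postings stay untouched.
    private final BitSet deleted = new BitSet();
    private int nextDocId = 0;

    // Adds a document made of the given terms and returns its id.
    int addDocument(String... terms) {
        int id = nextDocId++;
        for (String term : terms) {
            List<Integer> docs = postings.computeIfAbsent(term, k -> new ArrayList<>());
            if (!docs.contains(id)) {
                docs.add(id);
            }
        }
        return id;
    }

    // Marks the document as deleted without touching the term dictionary.
    void delete(int docId) {
        deleted.set(docId);
    }

    // Every term in the dictionary, including terms whose documents are all deleted.
    List<String> terms() {
        return new ArrayList<>(postings.keySet());
    }

    // Only the terms that still occur in at least one live document.
    List<String> liveTerms() {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                if (!deleted.get(doc)) {
                    live.add(entry.getKey());
                    break;
                }
            }
        }
        return live;
    }

    // Drops deleted documents from the postings and removes terms left empty.
    // Document ids are not renumbered in this toy.
    void compact() {
        for (List<Integer> docs : postings.values()) {
            docs.removeIf(doc -> deleted.get(doc));
        }
        postings.values().removeIf(List::isEmpty);
    }
}
```

In this model, terms() keeps returning a term after its only document has been deleted, liveTerms() pays for one postings scan per term to skip such entries, and compact() shrinks the dictionary once. Whether the same trade-off applies to a given Lucene index would need to be checked against the Lucene version in use.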