From: Morus Walter
Message-ID: <16610.35228.247157.936985@tanto-xipolis.de>
Date: Wed, 30 Jun 2004 11:36:28 +0200
To: "Lucene Users List"
Subject: Re: return value of terms()
In-Reply-To: <20040629135404.5225.qmail@web12708.mail.yahoo.com>
References: <143088710.1088516807953.JavaMail.dummy@smb-tec.com>
 <20040629135404.5225.qmail@web12708.mail.yahoo.com>

Otis Gospodnetic writes:
> I see. A search for that Term still gets Hits. I don't think this
> should be happening. Maybe Erik or one of the other Lucene developers
> will have some ideas.
>

Maybe I misunderstood something, but a search shouldn't get hits, since a
search removes hits from deleted documents (given that the term doesn't
occur in other undeleted documents).
OTOH, looking at the TermEnum one will still find terms of deleted documents.

AFAIK deleting a document doesn't mean removing the document from the index.
Deleting a document means marking the document as deleted.
The document will be removed at the next index optimization (or
implicit merge I guess; but I'm not sure about that).
I don't know when the frequency numbers are updated though.

So if I were Lars I'd call optimize after deleting and be on the safe side...
(which might be a problem for frequent deletions on large indexes though)

Morus

>
>
> --- Lars Martin wrote:
> > -----Original Message-----
> > From: Otis Gospodnetic
> > Sent: 29 Jun 2004, 13:46:41
> >
> > > I would try using the delete(Term) method, to ensure all documents
> > > with the given Term are removed:
> > >
> > > IndexReader indexReader = IndexReader.open( indexPath );
> > > indexReader.delete( new Term( "body", "YourTermHere" ) );
> > > indexReader.close();
> > > ...
> > > IndexReader indexReader = IndexReader.open( indexPath );
> > > TermEnum enum = indexReader.terms( new Term( "body", "" ) );
> > >
> > > Something like that...
> >
> >
> > Thanks for your reply.
> >
> > I do not want to delete documents by terms. All my indexed documents
> > are referenced by id, so I have to use delete( id ). What makes me
> > uneasy is the fact that there are still terms in the index from
> > documents which are already deleted. This would mean that TermEnum is
> > a continuously growing beast. No problem when I query such a term,
> > because no document is matching the query. But when I do computation
> > based on indexed terms I heavily depend on the number of terms in the
> > current TermEnum. Even if such terms don't impact my computation -
> > because the freq is always 0 - it would have an impact on my runtime
> > behavior and complexity.
> >
> > Regards, Lars
> >
> >
> > > Otis
> > >
> > > --- Lars Martin wrote:
> > > > Hi.
> > > >
> > > > Is it the normal behavior that IndexReader.terms( Term t ) still
> > > > returns Terms which are not any longer to be found in the index,
> > > > e.g. after removing the document containing these Terms?
> > > > I've removed nearly all documents from index but the terms()
> > > > method is still returning all terms.
> > > >
> > > > IndexReader indexReader = IndexReader.open( indexPath );
> > > > indexReader.delete( docId );
> > > > indexReader.close();
> > > > ...
> > > > IndexReader indexReader = IndexReader.open( indexPath );
> > > > TermEnum enum = indexReader.terms( new Term( "body", "" ) );
> > > >
> > > > Any hints?
> > > > Regards, Lars
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
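
The behaviour discussed in this thread (a search skips deleted documents,
while the term list still shows their terms until the index is rewritten)
can be illustrated with a toy model. This is only a sketch and not Lucene
code: TombstoneIndex and all of its methods are hypothetical names, and it
simply assumes that a delete only marks a document and that term entries
disappear when the index is rewritten by an optimize-like step.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only, NOT Lucene: a tiny inverted index with "mark as deleted" semantics.
class TombstoneIndex {
    // term -> ids of the documents containing it, kept sorted by term
    private TreeMap<String, List<Integer>> postings = new TreeMap<>();
    // one bit per document id; a set bit means "marked as deleted"
    private final BitSet deleted = new BitSet();
    private int nextDocId = 0;

    int addDocument(String... terms) {
        int docId = nextDocId++;
        for (String term : terms) {
            List<Integer> docs = postings.computeIfAbsent(term, k -> new ArrayList<>());
            if (!docs.contains(docId)) {
                docs.add(docId);
            }
        }
        return docId;
    }

    // Deleting only sets the mark; the postings are left untouched.
    void delete(int docId) {
        deleted.set(docId);
    }

    // The stored term list, including terms that occur only in deleted documents.
    List<String> terms() {
        return new ArrayList<>(postings.keySet());
    }

    // Searching filters out documents that are marked as deleted.
    List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        for (int docId : postings.getOrDefault(term, Collections.<Integer>emptyList())) {
            if (!deleted.get(docId)) {
                hits.add(docId);
            }
        }
        return hits;
    }

    // Rewrites the postings without deleted documents and drops terms that have
    // no live document left. Document ids are kept unchanged for simplicity.
    void optimize() {
        TreeMap<String, List<Integer>> rebuilt = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            List<Integer> live = new ArrayList<>();
            for (int docId : entry.getValue()) {
                if (!deleted.get(docId)) {
                    live.add(docId);
                }
            }
            if (!live.isEmpty()) {
                rebuilt.put(entry.getKey(), live);
            }
        }
        postings = rebuilt;
    }

    public static void main(String[] args) {
        TombstoneIndex index = new TombstoneIndex();
        index.addDocument("lucene", "index");
        int second = index.addDocument("lucene", "search");
        index.delete(second);
        System.out.println(index.search("search")); // prints []
        System.out.println(index.terms());          // prints [index, lucene, search]
        index.optimize();
        System.out.println(index.terms());          // prints [index, lucene]
    }
}
```

In this model the cost Lars worries about is visible directly: terms() keeps
growing with every added document and only shrinks when optimize() is run,
which is also the step that becomes expensive on a large index.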