lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Morus Walter <morus.wal...@tanto.de>
Subject Re: return value of terms()
Date Wed, 30 Jun 2004 09:36:28 GMT
Otis Gospodnetic writes:
> I see.  A search for that Term still gets Hits.  I don't think this
> should be happening.  Maybe Erik or one of the other Lucene developers
> will have some ideas.
> 
Maybe I missunderstood something, but a search shouldn't get hits, since a 
search removes hits from deleted documents (given that the term doen't
occur in other undeleted documents).

OTOH looking at the TermEnum one will still find terms of deleted documents.

AFAIK deleting a document doesn't mean removing a document from the index.
Deleting a document means marking the document deleted.
The document will be removed at the next index optimization (or 
implicit merge I guess; but I'm not sure about that)

I don't know when the frequency numbers are updated though.

So if I was Lars I'd call optimize after deleting and be on the safe side...
(which might be a problem for frequent deletions on large indexes though)

Morus

> 
> 
> --- Lars Martin <Lars.Martin@smb-tec.com> wrote:
> > -----Urspr√ľngliche Nachricht-----
> > Von: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
> > Gesendet am: 29. Jun 2004, 13:46:41
> > 
> > > I would try using the delete(Term) method, to ensure all documents
> > > with the given Term are removed:
> > > 
> > >    IndexReader indexReader = IndexReader.open( indexPath ); 
> > >    indexReader.delete( new Term( "body", "YourTermHere" ) );
> > >    indexReader.close();
> > >    ...
> > >    IndexReader indexReader = IndexReader.open( indexPath );
> > >    TermEnum enum = indexReader.terms( new Term( "body", "" ) );
> > > 
> > > Something like that...
> > 
> > 
> > Thanks for your reply.
> > 
> > I do not want to delete documents by terms. All my indexed documents 
> > are referenced by id, so I have to use delete( id ). What makes me
> > insecure is the fact, that there are still terms in index from
> > documents
> > which are already deleted. This would mean that TermEnum is a
> > continously
> > growing beast. No problem when I query such a term, because no
> > document
> > is matching the query. But when I do computation based on indexed
> > terms
> > I heavily depend on the number of terms in the current TermEnum. Even
> > if such terms don't impact my computation - because the freq is
> > always
> > 0 - it would have an impact on my runtime behavior and complexity.
> > 
> > Regards, Lars
> > 
> > 
> > > Otis
> > > 
> > > --- Lars Martin <Lars.Martin@smb-tec.com> wrote:
> > > > Hi.
> > > > 
> > > > Is it the normal behavior that IndexReader.terms( Term t ) still
> > > > returns Terms which are not any longer to be found in the index,
> > > > e.g. after removing the document containing these Terms?
> > > > I've removed nearly all documents from index but the terms()
> > method
> > > > is still returning all terms.
> > > > 
> > > >   IndexReader indexReader = IndexReader.open( indexPath ); 
> > > >   indexReader.delete( docId );
> > > >   indexReader.close();
> > > >   ...
> > > >   IndexReader indexReader = IndexReader.open( indexPath );
> > > >   TermEnum enum = indexReader.terms( new Term( "body", "" ) );
> > > > 
> > > > Any hints? Regards, Lars
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message