lucene-java-user mailing list archives

From Guilherme Barile <...@prosoma.com.br>
Subject Re: query matching all documents
Date Thu, 22 May 2003 14:38:46 GMT
What are the performance concerns of running an optimize after every
delete?

On Thu, 2003-05-22 at 11:36, Brisbart Franck wrote:
> You're right.
> When you delete a document, it is only marked as 'deleted', and the 
> document numbers stay the same until an optimize is done.
> 
> So, after deleting documents, if you want to list the remaining ones:
> - either you loop from 0 to maxDoc() and skip the deleted docs 
> (with the same IndexReader)
> - or you do an 'optimize' and, with a brand new IndexReader, you loop 
> from 0 to numDocs() (with no deleted docs left to handle).
> 
> franck
> 
> Guilherme Barile wrote:
> > What I didn't figure out is, if I have some index like:
> > [0] doc1.txt
> > [1] doc2.doc
> > [2] doc3.xls
> > [3] doc4.nfo
> > [4] doc4.pdf
> > 
> > and then I delete doc2.doc (document #1 in Lucene). Will the other
> > document numbers change, or will there be a gap in my index?
> > Let's list it again with doc2.doc deleted (supposing the gap will be
> > there):
> > 
> > [0] doc1.txt
> > [1] << DELETED >>
> > [2] doc3.xls
> > [3] doc4.nfo
> > [4] doc4.pdf
> > 
> > This way (I think) numDocs() will return 4, but maxDoc() would return
> > 5. Using numDocs() as the loop bound would make me lose a document, at
> > least in the way I implemented it. Any tips?
> > 
> > gui
> > 
> > 
> > On Thu, 2003-05-22 at 10:32, Brisbart Franck wrote:
> > 
> >>You don't really need to take care of the deleted docs. When you try 
> >>to get a deleted doc (reader.document(i) on a deleted doc), an 
> >>IllegalArgumentException will be thrown with the message 'attempt to 
> >>access a deleted document'. Just catch this exception.
> >>
> >>Also, I suggest you use 'numDocs()' instead of 'maxDoc()' to get the 
> >>real number of documents in the index.
> >>
> >>Franck
> >>
> >>Guilherme Barile wrote:
> >>
> >>>As I said, I'm still getting started (didn't implement deleting
> >>>documents yet). Any tips on checking this ?
> >>>
> >>>On Thu, 2003-05-22 at 03:31, Morus Walter wrote:
> >>>
> >>>
> >>>>Guilherme Barile writes:
> >>>>
> >>>>
> >>>>>If you're trying to get all documents, why not
> >>>>>
> >>>>>IndexReader reader = IndexReader.open(this.indexDir);
> >>>>>Document doc;
> >>>>>	
> >>>>>for (int i = 0; i < reader.maxDoc(); i++) {
> >>>>>	try {
> >>>>>		doc = reader.document(i);
> >>>>>		System.out.println(i + " " + doc.get("source"));
> >>>>>	}
> >>>>>	catch (Exception e) {
> >>>>>		System.out.println("Error getting doc " + i);
> >>>>>	}
> >>>>>}
> >>>>>
> >>>>
> >>>>I guess there should be some extra check to take care of deleted
> >>>>documents, that aren't removed from the index yet.
> >>>>
> >>>>greetings
> >>>>	Morus
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
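
For illustration, the numbering behavior discussed in the quoted thread can be sketched with a toy in-memory model. This is not Lucene code: the class ToyIndex and its compact() method are hypothetical names made up for this sketch, and only the method names maxDoc(), numDocs(), isDeleted() and document() are chosen to mirror the IndexReader calls mentioned above. It assumes, as Franck describes, that a delete only marks a slot and that an optimize is what closes the gaps.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy stand-in for an index. NOT Lucene code; every name here is hypothetical.
class ToyIndex {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    void add(String source) { docs.add(source); }

    // Deleting only marks the slot; the other document numbers do not move.
    void delete(int n) {
        if (n < 0 || n >= docs.size()) throw new IndexOutOfBoundsException("no document " + n);
        deleted.set(n);
    }

    boolean isDeleted(int n) { return deleted.get(n); }

    // One greater than the largest document number in use (gaps included).
    int maxDoc() { return docs.size(); }

    // Number of live (non-deleted) documents.
    int numDocs() { return docs.size() - deleted.cardinality(); }

    String document(int n) {
        if (isDeleted(n)) throw new IllegalArgumentException("attempt to access a deleted document");
        return docs.get(n);
    }

    // Models the effect of an optimize on numbering: drop marked slots, renumber the rest.
    void compact() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) live.add(docs.get(i));
        }
        docs.clear();
        docs.addAll(live);
        deleted.clear();
    }

    // First option from the thread: loop to maxDoc() and skip deleted slots.
    List<String> listLive() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < maxDoc(); i++) {
            if (isDeleted(i)) continue;
            out.add(i + " " + document(i));
        }
        return out;
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        for (String s : new String[] {"doc1.txt", "doc2.doc", "doc3.xls", "doc4.nfo", "doc4.pdf"}) {
            idx.add(s);
        }
        idx.delete(1);
        System.out.println(idx.numDocs() + " live, maxDoc " + idx.maxDoc()); // 4 live, maxDoc 5
        idx.listLive().forEach(System.out::println); // numbers 0, 2, 3, 4
        idx.compact();
        System.out.println(idx.numDocs() + " live, maxDoc " + idx.maxDoc()); // 4 live, maxDoc 4
        idx.listLive().forEach(System.out::println); // numbers 0, 1, 2, 3
    }
}
```

Looping to numDocs() before the compaction step would stop at number 3 and miss the last document, which is the problem Guilherme describes. Checking a deleted flag up front also avoids relying on a caught exception for control flow; in Lucene releases of this period IndexReader exposes an isDeleted(int) method for that check, though the exact API should be confirmed against the version in use.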

