lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Discrepancies between search results and reader.document(i).get("path")
Date Sat, 30 Mar 2013 01:09:30 GMT
On Sat, Mar 30, 2013 at 12:39 AM, Bushman, Lamont <bus08002@byui.edu> wrote:
> However, with your response, especially if I come across problems later.   reader.liveDocs()
> is not found in IndexWriter.  I am guessing you are referring to the TermsEnum class.  I assume
> numDocs() returns the amount of documents that are left to search and maxDoc() is the greatest
> id that still exists.  I believe my program is working fine for me now because I am using
> the forceMergeDeletes() method, so that numDocs() will always be the same as maxDoc().  Am
> I right on my assumptions?

When you delete documents, Lucene doesn't delete them in place. They
are first just marked as deleted but remain present in the index. This
is why AtomicReader[1] has three methods:
 - numDocs(), which returns the number of non-deleted documents
 - maxDoc(), which returns the greatest document ID that exists plus
one (so if no document is deleted, numDocs() == maxDoc())
 - getLiveDocs(), which returns a bitmap of the documents that still
exist in your index. A document docID is not deleted if getLiveDocs()
is null or if getLiveDocs().get(docID) returns true.
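To make the relationship between these three methods concrete, here is
a minimal, self-contained Java sketch. It is a hypothetical model, not
Lucene's API: the class LiveDocsModel and its delete() and isLive()
methods are made up for illustration, and java.util.BitSet stands in
for Lucene's Bits. It only assumes the semantics described above:
deleting marks a document without freeing its ID, and a null live-docs
bitmap means nothing is deleted.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical, simplified model of the deletion bookkeeping described above.
 * This is NOT Lucene's API: java.util.BitSet stands in for
 * org.apache.lucene.util.Bits, and delete() stands in for a deletion that, in
 * real Lucene, goes through IndexWriter and becomes visible to a new reader.
 */
public class LiveDocsModel {
    private final int maxDoc;   // greatest doc ID ever assigned, plus one
    private BitSet liveDocs;    // null means "no document is deleted"

    public LiveDocsModel(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Marks a document as deleted; its ID stays allocated, so maxDoc() does not change. */
    public void delete(int docID) {
        if (docID < 0 || docID >= maxDoc) {
            throw new IndexOutOfBoundsException("docID " + docID);
        }
        if (liveDocs == null) {
            // First deletion: materialize a bitmap in which every document is live.
            liveDocs = new BitSet(maxDoc);
            liveDocs.set(0, maxDoc);
        }
        liveDocs.clear(docID);
    }

    public int maxDoc() {
        return maxDoc;
    }

    /** Number of non-deleted documents. */
    public int numDocs() {
        return liveDocs == null ? maxDoc : liveDocs.cardinality();
    }

    /** Returns null when nothing has been deleted, like getLiveDocs(). */
    public BitSet getLiveDocs() {
        return liveDocs;
    }

    /** The rule from the text: live if the bitmap is null or its bit is set. */
    public static boolean isLive(BitSet liveDocs, int docID) {
        return liveDocs == null || liveDocs.get(docID);
    }

    public static void main(String[] args) {
        LiveDocsModel reader = new LiveDocsModel(5);
        System.out.println(reader.numDocs() + " " + reader.maxDoc()); // 5 5
        reader.delete(2);
        System.out.println(reader.numDocs() + " " + reader.maxDoc()); // 4 5

        // Visit only live documents, as a loop over reader.document(i) should.
        BitSet live = reader.getLiveDocs();
        List<Integer> visited = new ArrayList<>();
        for (int docID = 0; docID < reader.maxDoc(); docID++) {
            if (isLive(live, docID)) {
                visited.add(docID);
            }
        }
        System.out.println(visited); // [0, 1, 3, 4]
    }
}
```

Note that numDocs() drops to 4 while maxDoc() stays at 5: a loop from 0
to numDocs() - 1 would still visit the deleted doc 2 and never reach
doc 4.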

This third method is not present on the IndexReader class that you are
manipulating, because this reader is not atomic. You can still get its
live docs by calling MultiFields.getLiveDocs(reader). However, it is
rather uncommon to use this method in high-level code: when you run
queries against an IndexSearcher, which is the most common way to
retrieve doc IDs from Lucene, Lucene already takes care of filtering
out deleted documents, even if they match.
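One way to picture why a non-atomic reader needs a helper such as
MultiFields.getLiveDocs(reader) is to model a composite reader as a list
of segments, each with its own maxDoc and its own live-docs bitmap, where
a global doc ID is a segment-local ID shifted by that segment's starting
offset (its "docBase"). The sketch below is a hypothetical, simplified
model, not Lucene code: CompositeLiveDocsModel, addSegment() and isLive()
are made-up names, and java.util.BitSet again stands in for Bits.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical model (not Lucene code) of a composite reader made of segments.
 * Each segment has its own maxDoc and live-docs bitmap (null = nothing deleted);
 * global doc IDs are segment-local IDs plus the segment's docBase.
 */
public class CompositeLiveDocsModel {
    static final class Segment {
        final int maxDoc;
        final BitSet liveDocs; // null means nothing deleted in this segment

        Segment(int maxDoc, BitSet liveDocs) {
            this.maxDoc = maxDoc;
            this.liveDocs = liveDocs;
        }
    }

    private final List<Segment> segments = new ArrayList<>();
    private final List<Integer> docBases = new ArrayList<>();
    private int maxDoc = 0;

    /** Appends a segment; its doc IDs start right after all previous segments. */
    public void addSegment(int segMaxDoc, BitSet segLiveDocs) {
        docBases.add(maxDoc);
        segments.add(new Segment(segMaxDoc, segLiveDocs));
        maxDoc += segMaxDoc;
    }

    public int maxDoc() {
        return maxDoc;
    }

    /** Sum of the per-segment live-document counts. */
    public int numDocs() {
        int n = 0;
        for (Segment seg : segments) {
            n += seg.liveDocs == null ? seg.maxDoc : seg.liveDocs.cardinality();
        }
        return n;
    }

    /** Global check: find the owning segment, then test its local bit. */
    public boolean isLive(int docID) {
        if (docID < 0 || docID >= maxDoc) {
            throw new IndexOutOfBoundsException("docID " + docID);
        }
        // Scan from the last segment so empty segments sharing a docBase are skipped.
        for (int i = segments.size() - 1; i >= 0; i--) {
            int base = docBases.get(i);
            if (docID >= base) {
                Segment seg = segments.get(i);
                return seg.liveDocs == null || seg.liveDocs.get(docID - base);
            }
        }
        throw new IllegalStateException("unreachable");
    }

    public static void main(String[] args) {
        CompositeLiveDocsModel reader = new CompositeLiveDocsModel();
        reader.addSegment(3, null);       // segment 0: global IDs 0..2, nothing deleted
        BitSet seg1Live = new BitSet(3);
        seg1Live.set(0, 3);
        seg1Live.clear(1);                // local doc 1 of segment 1 is deleted
        reader.addSegment(3, seg1Live);   // segment 1: global IDs 3..5

        List<Integer> live = new ArrayList<>();
        for (int docID = 0; docID < reader.maxDoc(); docID++) {
            if (reader.isLive(docID)) {
                live.add(docID);
            }
        }
        System.out.println(live);                                      // [0, 1, 2, 3, 5]
        System.out.println(reader.numDocs() + " " + reader.maxDoc());  // 5 6
    }
}
```

In actual Lucene code you would not build this yourself: you would take
the Bits returned by MultiFields.getLiveDocs(reader) and, following the
rule above, treat document i as live if those Bits are null or their
get(i) returns true before calling reader.document(i).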

I hope this helps.

[1] http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/index/AtomicReader.html

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

