lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Deleted documents in the index.
Date Fri, 25 Jul 2008 13:47:11 GMT

Oh, I think I see the problem -- instead of numDocs in your for loop  
(which I assume came from IndexReader.numDocs()) change that to maxDoc  
(IndexReader.maxDoc()).

Mike

(Nagesh S) wrote:

> Hi Michael,
> Thanks for your response. Yes, I got that.
>
> I guess, my question is, how do I access the newly added document ? In
> other words, if the index initially had 20 docs of which 10 were
> updated (that is, deleted and then added), how do I access the updated
> ones ?
>
> Initially, there was no check for delete - that is, I did not have
> IndexReader.isDeleted(int). It had the for loop only which would fail
> when obtaining a 'deleted' document with the following :
>
> java.lang.IllegalArgumentException: attempt to access a deleted  
> document
> at org.apache.lucene.index.SegmentReader.document(SegmentReader.java: 
> 331)
> at org.apache.lucene.index.MultiReader.document(MultiReader.java:108)
> at org.apache.lucene.index.IndexReader.document(IndexReader.java:437)
> etc.
>
> Regards,
> Nagesh
>
> On 7/25/08, Michael McCandless <lucene@mikemccandless.com> wrote:
>>
>> When you call updateDocument, the old  document is deleted but a
>> wholly new document is added.  So the "else" clause in your loop  
>> below
>> will report on the newly added documents (you won't miss any).
>>
>> Mike
>>
>> (Nagesh S) wrote:
>>
>>> Hi,
>>> I think, the earlier mail didn't make it through.
>>>
>>> I am writing a class to report on an index. This index has documents
>>> updated using the IndexWriter.updateDocument(Term, Document) method.
>>> That is, documents were deleted and added again. My aim is to see  
>>> what
>>> documents (and their fields) are present in the index. Since the
>>> document was updated (i.e. deleted and added), it is marked as  
>>> deleted
>>> and hence not able to obtain a Document object for the updated
>>> document.
>>>
>>> How do I report on such documents ?
>>>
>>> for (int i = 1; i < numDocs; i++) {
>>> //ir is an IndexReader object
>>>           if (ir.isDeleted(i)) {
>>>               bw.write("Document " + i + " has been deleted.");
>>>               bw.newLine();
>>>           } else {
>>>               Document d = getDocument(ir, i);
>>>
>>>               List<Field> l = d.getFields();
>>>               int numFields = l.size();
>>>               bw.write("Document has " + numFields + " fields as
>>> follows");
>>>               bw.newLine();
>>>
>>>               for (int j = 0; j < numFields; j++) {
>>>                   String fieldName = l.get(j).name();
>>>                   bw.write("\t Field : " + fieldName + " Value : "
>>>                           + d.getField(fieldName).stringValue());
>>>                   bw.newLine();
>>>               }
>>>           }
>>>       }
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message