lucene-java-user mailing list archives

From "ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)" <nageshbl...@gmail.com>
Subject Re: Deleted documents in the index.
Date Fri, 25 Jul 2008 14:56:37 GMT
Hey Michael,
maxDoc() did the trick! Thanks!

I have some reading to do on numDocs() and maxDoc().
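
For later readers, the difference is roughly this: numDocs() counts only live documents, while maxDoc() is one greater than the largest document number in the index, so it still counts the slots of deleted documents until segments are merged. Document numbers run from 0 to maxDoc() - 1, and an updated document is typically re-added under a new, higher number. The sketch below is a toy model in plain Java, not Lucene code: ToyIndex and liveDocs are hypothetical names made up for illustration, and only the method names numDocs(), maxDoc() and isDeleted(int) are borrowed from IndexReader.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only: this is NOT Lucene's implementation. It mimics how
// document numbers, deletions, numDocs() and maxDoc() relate to each other.
class ToyIndex {
    private final List<String> docs = new ArrayList<>(); // slot i holds document number i
    private final BitSet deleted = new BitSet();          // a set bit marks a deleted slot

    // Append a document; it always receives the next unused document number.
    int addDocument(String body) {
        docs.add(body);
        return docs.size() - 1;
    }

    // Model of an update: mark the old slot as deleted, then append the new version.
    int updateDocument(int oldId, String newBody) {
        if (oldId < 0 || oldId >= docs.size()) {
            throw new IndexOutOfBoundsException("no such document: " + oldId);
        }
        deleted.set(oldId);
        return addDocument(newBody);
    }

    boolean isDeleted(int id) {
        return deleted.get(id);
    }

    // One greater than the largest document number assigned (deleted slots included).
    int maxDoc() {
        return docs.size();
    }

    // Number of live (non-deleted) documents.
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    // Collect the live documents whose numbers fall in [0, limit).
    List<String> liveDocs(int limit) {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < limit; i++) {
            if (!isDeleted(i)) {
                live.add(docs.get(i));
            }
        }
        return live;
    }
}
```

In this model, adding 20 documents and then updating the first 10 gives maxDoc() == 30 and numDocs() == 20. A loop bounded by numDocs() stops at slot 19 and sees only the 10 untouched documents, whereas a loop bounded by maxDoc() that skips deleted slots reaches all 20 live ones.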


Nagesh

On 7/25/08, ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S) <nageshblore@gmail.com> wrote:
> Hi Michael,
> The numDocs did come from IndexReader.numDocs().
>
> hmm...let me try with maxDoc.
>
> Nagesh
>
> On 7/25/08, Michael McCandless <lucene@mikemccandless.com> wrote:
>>
>> Oh, I think I see the problem -- instead of numDocs in your for loop
>> (which I assume came from IndexReader.numDocs()) change that to maxDoc
>> (IndexReader.maxDoc()).
>>
>> Mike
>>
>> (Nagesh S) wrote:
>>
>>> Hi Michael,
>>> Thanks for your response. Yes, I got that.
>>>
>>> I guess my question is, how do I access the newly added document? In
>>> other words, if the index initially had 20 docs of which 10 were
>>> updated (that is, deleted and then added), how do I access the updated
>>> ones?
>>>
>>> Initially there was no check for deletions, that is, no call to
>>> IndexReader.isDeleted(int). There was only the for loop, which would
>>> fail when obtaining a 'deleted' document with the following:
>>>
>>> java.lang.IllegalArgumentException: attempt to access a deleted document
>>>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:331)
>>>     at org.apache.lucene.index.MultiReader.document(MultiReader.java:108)
>>>     at org.apache.lucene.index.IndexReader.document(IndexReader.java:437)
>>> etc.
>>>
>>> Regards,
>>> Nagesh
>>>
>>> On 7/25/08, Michael McCandless <lucene@mikemccandless.com> wrote:
>>>>
>>>> When you call updateDocument, the old document is deleted but a
>>>> wholly new document is added. So the "else" clause in your loop below
>>>> will report on the newly added documents (you won't miss any).
>>>>
>>>> Mike
>>>>
>>>> (Nagesh S) wrote:
>>>>
>>>>> Hi,
>>>>> I think the earlier mail didn't make it through.
>>>>>
>>>>> I am writing a class to report on an index. This index has documents
>>>>> updated using the IndexWriter.updateDocument(Term, Document) method;
>>>>> that is, documents were deleted and added again. My aim is to see which
>>>>> documents (and their fields) are present in the index. Since an updated
>>>>> document was deleted and re-added, it is marked as deleted, and I am
>>>>> not able to obtain a Document object for it.
>>>>>
>>>>> How do I report on such documents?
>>>>>
>>>>> for (int i = 1; i < numDocs; i++) {
>>>>>     // ir is an IndexReader object
>>>>>     if (ir.isDeleted(i)) {
>>>>>         bw.write("Document " + i + " has been deleted.");
>>>>>         bw.newLine();
>>>>>     } else {
>>>>>         Document d = getDocument(ir, i);
>>>>>
>>>>>         List<Field> l = d.getFields();
>>>>>         int numFields = l.size();
>>>>>         bw.write("Document has " + numFields + " fields as follows");
>>>>>         bw.newLine();
>>>>>
>>>>>         for (int j = 0; j < numFields; j++) {
>>>>>             String fieldName = l.get(j).name();
>>>>>             bw.write("\t Field : " + fieldName + " Value : "
>>>>>                     + d.getField(fieldName).stringValue());
>>>>>             bw.newLine();
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>