lucene-java-user mailing list archives

From John Patterson <jdp2...@gmail.com>
Subject Re: Deleted document terms
Date Tue, 26 Aug 2008 09:45:01 GMT

That was the problem - the id was not tokenized.  Thanks for your help.
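Whether the id field is tokenized matters because the two lookup paths in this thread treat the key differently. `IndexWriter.deleteDocuments(Term)` matches the term text exactly as given. A query built by QueryParser runs the text through the analyzer first. The sketch below is a hypothetical toy index, not Lucene's API. Its `analyze` method only lowercases, a simplification of analyzers such as StandardAnalyzer, and all class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical toy inverted index (NOT Lucene's API). A delete-by-term only
 * removes documents whose indexed term matches the given text exactly.
 * "Tokenized" values pass through a lowercasing step, standing in for an
 * analyzer. "Untokenized" values are indexed verbatim.
 */
class ToyTermIndex {
    // "field:term" -> doc numbers containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> deleted = new HashSet<>();
    private int nextDoc = 0;

    /** Stand-in analyzer: lowercasing only (a simplifying assumption). */
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    /** Adds a one-field document and returns its doc number. */
    int addDocument(String field, String value, boolean tokenized) {
        int doc = nextDoc++;
        String term = tokenized ? analyze(value) : value;
        postings.computeIfAbsent(field + ":" + term, k -> new HashSet<>()).add(doc);
        return doc;
    }

    /** Like a delete-by-term: the text is used as-is and never analyzed. */
    int deleteByTerm(String field, String text) {
        int count = 0;
        for (int doc : postings.getOrDefault(field + ":" + text, Set.of())) {
            if (deleted.add(doc)) count++;
        }
        return count;
    }

    /** Live (non-deleted) documents containing the exact term. */
    List<Integer> search(String field, String text) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : postings.getOrDefault(field + ":" + text, Set.of())) {
            if (!deleted.contains(doc)) hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyTermIndex untokenized = new ToyTermIndex();
        untokenized.addDocument("id", "MYKEY", false);
        System.out.println(untokenized.deleteByTerm("id", "MYKEY")); // 1

        ToyTermIndex tokenized = new ToyTermIndex();
        tokenized.addDocument("id", "MYKEY", true);
        System.out.println(tokenized.deleteByTerm("id", "MYKEY")); // 0: indexed as "mykey"
        System.out.println(tokenized.deleteByTerm("id", "mykey")); // 1
    }
}
```

The key used for deletion, the key used for searching, and the indexed term must all agree. Mixing analyzed and unanalyzed forms of the same key is one way the confusing results described in this thread can arise.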


Kalani Ruwanpathirana wrote:
> 
> Hi John,
> 
> Are you sure you made the id "tokenized" while indexing? I got around this
> issue by making the field used for the deletion tokenized, as below:
> 
> document.add(new Field("id", id, Field.Store.YES, Field.Index.TOKENIZED));
> 
> 
> 
> Thanks
> 
> 
> 
> On Tue, Aug 26, 2008 at 2:15 PM, Michael McCandless <lucene@mikemccandless.com> wrote:
> 
>>
>>
>> John Patterson wrote:
>>
>>> I just discovered some strange behaviour with deleted documents. I do a
>>> search for documents with a certain query and delete one using
>>> IndexWriter.deleteDocuments(Term), using a key for the term. Then I repeat
>>> the search, and the document is still there because I use a custom
>>> HitCollector which does not check IndexReader.isDeleted(int). That is all
>>> expected.
>>>
>>
>> Hmm -- once a document is deleted, your HitCollector won't ever see it.
>> During searching, isDeleted is called per document at a very low level.
>>
>> If your HitCollector is seeing it, it sounds like it wasn't really
>> deleted. Are you sure you closed the IndexWriter and then reopened your
>> searcher, so that the searcher will see the deletion?
>>
>>> But when I try to show the deleted document by searching by key using the
>>> same term it was deleted with, it is not found. So it seems that the term
>>> (id:MYKEY) is removed from the index.
>>>
>>
>> This is odd -- the document should either be deleted (entirely), or not.
>> You shouldn't get different behavior if you search for the doc one way vs
>> another.
>>
>>> So I was surprised that the term for the id was removed but not the other
>>> terms for the document.
>>>
>>
>> That makes two of us!
>>
>> Mike
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 
> -- 
> Kalani Ruwanpathirana
> Department of Computer Science & Engineering
> University of Moratuwa
> 
> 

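Mike makes two points. Deleted documents are filtered out before a HitCollector is ever called, and a searcher only reflects changes committed before it was opened. Both can be sketched with a small in-memory model. This is a hypothetical illustration, not Lucene's implementation or API. `commit()` stands in for closing the IndexWriter, and `openSearcher()` stands in for opening a fresh IndexSearcher. The class and method names are invented.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntConsumer;

/**
 * Hypothetical toy model (NOT Lucene's API). Deletions are buffered until
 * commit(). A searcher works on a point-in-time copy of the committed
 * deletions, and deleted documents are skipped before the collector runs.
 */
class ToySnapshotIndex {
    private final List<String> docs = new ArrayList<>();
    private final BitSet committedDeletes = new BitSet();
    private final BitSet pendingDeletes = new BitSet();

    void add(String text) { docs.add(text); }

    /** Buffers a delete, like a writer that has not been closed yet. */
    void delete(int doc) { pendingDeletes.set(doc); }

    /** Publishes buffered deletes; stands in for closing the IndexWriter. */
    void commit() {
        committedDeletes.or(pendingDeletes);
        pendingDeletes.clear();
    }

    /** Stands in for opening a new searcher: copies the committed state. */
    Searcher openSearcher() {
        return new Searcher(docs.size(), (BitSet) committedDeletes.clone());
    }

    final class Searcher {
        private final int maxDoc;
        private final BitSet deletedAtOpen;

        Searcher(int maxDoc, BitSet deletedAtOpen) {
            this.maxDoc = maxDoc;
            this.deletedAtOpen = deletedAtOpen;
        }

        /** Calls the collector only for live documents; substring match stands in for a query. */
        void search(String word, IntConsumer collector) {
            for (int doc = 0; doc < maxDoc; doc++) {
                if (deletedAtOpen.get(doc)) continue; // collector never sees deleted docs
                if (docs.get(doc).contains(word)) collector.accept(doc);
            }
        }

        List<Integer> collect(String word) {
            List<Integer> hits = new ArrayList<>();
            search(word, hits::add);
            return hits;
        }
    }

    public static void main(String[] args) {
        ToySnapshotIndex index = new ToySnapshotIndex();
        index.add("lucene in action");
        index.add("lucene user list");
        Searcher before = index.openSearcher(); // opened before the delete
        index.delete(0);
        index.commit();
        System.out.println(before.collect("lucene"));              // [0, 1]
        System.out.println(index.openSearcher().collect("lucene")); // [1]
    }
}
```

A searcher opened before the delete keeps returning the document, which matches the symptom in the original question. Reopening after the commit makes the deletion visible.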
-- 
View this message in context: http://www.nabble.com/Deleted-document-terms-tp19157027p19158657.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



