lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Deleted document terms
Date Tue, 26 Aug 2008 10:54:07 GMT

Normally an ID should be indexed as Field.Index.UN_TOKENIZED.

Mike
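
The advice above can be illustrated with a small toy model. This is a sketch under stated assumptions, not Lucene code: `IdIndexSketch`, `ToyIndex`, `analyze`, and `deletedAfterIndexing` are hypothetical names, the `Mode` enum only borrows the names of the `Field.Index` options mentioned in this thread, and the stand-in analyzer does nothing but lowercase (real analyzers behave differently). It shows why deleting by an exact term depends on how the id was indexed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only: this is NOT the Lucene API, and every name here is hypothetical.
class IdIndexSketch {

    // Borrows the names of the Field.Index options discussed in the thread.
    enum Mode { NO, TOKENIZED, UN_TOKENIZED }

    // Assumed stand-in for an analyzer: emits a single lowercased token.
    static List<String> analyze(String value) {
        return Collections.singletonList(value.toLowerCase(Locale.ROOT));
    }

    static class ToyIndex {
        // term text -> numbers of the documents whose id field produced that term
        private final Map<String, Set<Integer>> postings = new HashMap<>();
        private final Set<Integer> deleted = new HashSet<>();
        private int nextDoc = 0;

        // Index an id: not at all, as analyzed tokens, or as one verbatim term.
        int add(String id, Mode mode) {
            int doc = nextDoc++;
            List<String> terms;
            if (mode == Mode.NO) {
                terms = Collections.emptyList();
            } else if (mode == Mode.TOKENIZED) {
                terms = analyze(id);
            } else {
                terms = Collections.singletonList(id);
            }
            for (String term : terms) {
                postings.computeIfAbsent(term, k -> new HashSet<>()).add(doc);
            }
            return doc;
        }

        // Delete by exact term match; the term text itself is never analyzed.
        int deleteByTerm(String termText) {
            int count = 0;
            Set<Integer> docs =
                postings.getOrDefault(termText, Collections.<Integer>emptySet());
            for (int doc : docs) {
                if (deleted.add(doc)) {
                    count++;
                }
            }
            return count;
        }
    }

    // Index one id in the given mode, delete by the raw id, return docs deleted.
    static int deletedAfterIndexing(String id, Mode mode) {
        ToyIndex index = new ToyIndex();
        index.add(id, mode);
        return index.deleteByTerm(id);
    }

    public static void main(String[] args) {
        // Prints: NO -> 0, TOKENIZED -> 0, UN_TOKENIZED -> 1 (one per line)
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + deletedAfterIndexing("MYKEY", mode));
        }
    }
}
```

In this model an id that is already lowercase, such as "mykey", would also be deleted in TOKENIZED mode, because the analyzed token happens to equal the raw id. Only the verbatim mode guarantees that the term used for deletion matches the indexed term for any id.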

John Patterson wrote:

>
> That was the problem - the id was not tokenized.  Thanks for your  
> help.
>
>
> Kalani Ruwanpathirana wrote:
>>
>> Hi John,
>>
>> Are you sure you made the id "tokenized" while indexing? I could overcome
>> this issue by having a tokenized field, which was used for the deletion as
>> below.
>>
>> document.add(new Field("id", id, Field.Store.YES,
>>         Field.Index.TOKENIZED));
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Tue, Aug 26, 2008 at 2:15 PM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>>>
>>>
>>> John Patterson wrote:
>>>
>>>> I just discovered some strange behaviour with deleted documents. I do a
>>>> search for documents with a certain query and delete one using
>>>> IndexWriter.deleteDocuments(Term) using a key for the term. Then I repeat
>>>> the search and the document is still there because I use a custom
>>>> HitCollector which does not check IndexReader.isDeleted(int). That is all
>>>> expected.
>>>>
>>>
>>> Hmm -- once a document is deleted, your HitCollector won't ever see it.
>>> During searching, isDeleted is called per document at a very low level.
>>>
>>> If your HitCollector is seeing it, it sounds like it wasn't really deleted.
>>> Are you sure you closed the IndexWriter and then reopened your searcher, so
>>> that the searcher will see the deletion?
>>>
>>>> But when I try to show the deleted document by searching by key using the
>>>> same term it was deleted with, it is not found. So it seems that the term
>>>> (id:MYKEY) is removed from the index.
>>>>
>>>
>>> This is odd -- the document should either be deleted (entirely), or not.
>>> You shouldn't get different behavior if you search for the doc one way vs
>>> another.
>>>
>>>> So I was surprised that the term for the id was removed but not the other
>>>> terms for the document.
>>>>
>>>
>>> That makes two of us!
>>>
>>> Mike
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>>
>> -- 
>> Kalani Ruwanpathirana
>> Department of Computer Science & Engineering
>> University of Moratuwa
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/Deleted-document-terms-tp19157027p19158657.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
>

