lucene-dev mailing list archives

From "Uwe Schindler (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3854) Non-tokenized fields become tokenized when a document is deleted and added back
Date Tue, 06 Mar 2012 15:39:58 GMT


Uwe Schindler commented on LUCENE-3854:

In my opinion, a Document for *indexing* should be different from a document *retrieved from
stored fields* (I have been arguing about that all the time).

One simple solution:
When a field is loaded from the index using StoredFieldsVisitor, let's set an internal flag in
the document/field instances (e.g. via a pkg-private ctor of Document), so that when you try to
re-add such a loaded document to IndexWriter you get an exception. This is very simple and a
good solution for now.
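
To make the flag idea concrete, here is a minimal sketch. It deliberately does not use Lucene:
ToyDocument and ToyWriter are hypothetical stand-ins for Document and IndexWriter, and the
private constructor only approximates the pkg-private ctor mentioned above. How the real
classes would carry and check such a flag is an open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy stand-ins only: ToyDocument and ToyWriter are hypothetical and are
// NOT Lucene's Document or IndexWriter.
final class LoadedDocumentGuard {

    static final class ToyDocument {
        private final boolean loadedFromIndex;
        private final List<String> values = new ArrayList<>();

        // Normal path: a document built by the application for indexing.
        ToyDocument() {
            this(false);
        }

        // Restricted constructor, standing in for the pkg-private ctor idea.
        private ToyDocument(boolean loadedFromIndex) {
            this.loadedFromIndex = loadedFromIndex;
        }

        // Stands in for the stored-fields loading path, which sets the flag.
        static ToyDocument fromStoredFields(List<String> storedValues) {
            ToyDocument doc = new ToyDocument(true);
            doc.values.addAll(storedValues);
            return doc;
        }

        void add(String value) {
            values.add(value);
        }

        boolean isLoadedFromIndex() {
            return loadedFromIndex;
        }
    }

    static final class ToyWriter {
        private final List<ToyDocument> docs = new ArrayList<>();

        // Refuses documents that were reconstructed from stored fields.
        void addDocument(ToyDocument doc) {
            if (doc.isLoadedFromIndex()) {
                throw new IllegalArgumentException(
                    "document was loaded from stored fields and cannot be re-added");
            }
            docs.add(doc);
        }

        int numDocs() {
            return docs.size();
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();

        ToyDocument fresh = new ToyDocument();
        fresh.add("a-b");
        writer.addDocument(fresh); // accepted

        ToyDocument loaded = ToyDocument.fromStoredFields(Arrays.asList("a-b"));
        try {
            writer.addDocument(loaded);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of failing at addDocument time is that the user sees the problem immediately,
instead of silently getting an index whose fields behave differently.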

But I agree with Robert: the Document/Field API is messy and trappy in that regard.

> Non-tokenized fields become tokenized when a document is deleted and added back
> -------------------------------------------------------------------------------
>                 Key: LUCENE-3854
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Benson Margulies
> is a JUnit test case that seems to show a problem with the current trunk. It creates a
> document with a Field typed as StringField.TYPE_STORED and a value with a "-" in it. A
> TermQuery can find the value, initially, since the field is not tokenized.
> Then, the case reads the Document back out through a reader. In the copy of the Document
> that gets read out, the Field now has the tokenized bit turned on.
> Next, the case deletes and adds the Document. The 'tokenized' bit is respected, so now
> the field gets tokenized, and the result is that the query on the term with the - in it
> no longer works.
> So I think that the defect here is in the code that reconstructs the Document when read
> from the index, and which turns on the tokenized bit.
> I have an ICLA on file so you can take this code from github, but if you prefer I can
> also attach it here.
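
The sequence described in the report can be illustrated with a toy model. This is a sketch
under stated assumptions, not the reporter's JUnit test and not Lucene code: ToyField,
indexedTerms and loadFromStored are hypothetical names, the "analyzer" is assumed to split on
non-alphanumeric characters, and the loader is simply assumed to hand back the field with the
tokenized bit on, as the report observes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of the reported round trip; none of these names exist in Lucene.
final class TokenizedRoundTrip {

    static final class ToyField {
        final String value;
        final boolean tokenized;

        ToyField(String value, boolean tokenized) {
            this.value = value;
            this.tokenized = tokenized;
        }
    }

    // Terms that would end up in the index for this field.
    // Assumption: a tokenized field is split on non-alphanumeric characters.
    static List<String> indexedTerms(ToyField field) {
        if (!field.tokenized) {
            return Collections.singletonList(field.value);
        }
        return Arrays.asList(field.value.split("[^A-Za-z0-9]+"));
    }

    // Assumption mirroring the report: the copy read back from the index
    // keeps the value but comes back with the tokenized bit turned on.
    static ToyField loadFromStored(ToyField original) {
        return new ToyField(original.value, true);
    }

    public static void main(String[] args) {
        ToyField original = new ToyField("a-b", false);
        System.out.println(indexedTerms(original).contains("a-b")); // term query matches

        // Delete and re-add using the copy that was read back.
        ToyField reloaded = loadFromStored(original);
        System.out.println(indexedTerms(reloaded).contains("a-b")); // term query no longer matches
    }
}
```

In this model the exact term "a-b" is present before the round trip and absent afterwards,
which is the symptom the test case reports.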
