Date: Tue, 6 Mar 2012 15:09:57 +0000 (UTC)
From: "Andrzej Bialecki (Commented) (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <694375883.27591.1331046597837.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <303482760.27537.1331045577014.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (LUCENE-3854) Non-tokenized fields become tokenized when a document is deleted and added back
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ 
https://issues.apache.org/jira/browse/LUCENE-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223309#comment-13223309 ] 

Andrzej Bialecki commented on LUCENE-3854:
-------------------------------------------

I suspect the problem lies in DocumentStoredFieldVisitor.stringField(...). It uses FieldInfo to populate the FieldType of the retrieved field, and FieldInfo carries no information about tokenization, so the field is assumed to be tokenized by default. AFAIK the tokenization setting is lost once the document is indexed, so it cannot be retrieved afterwards, hence the use of a default value.

> Non-tokenized fields become tokenized when a document is deleted and added back
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-3854
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3854
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Benson Margulies
>
> https://github.com/bimargulies/lucene-4-update-case is a JUnit test case that seems to show a problem with the current trunk. It creates a document with a Field typed as StringField.TYPE_STORED and a value with a "-" in it. A TermQuery can find the value, initially, since the field is not tokenized.
> Then, the case reads the Document back out through a reader. In the copy of the Document that gets read out, the Field now has the tokenized bit turned on.
> Next, the case deletes and adds the Document. The 'tokenized' bit is respected, so now the field gets tokenized, and the result is that the query on the term with the "-" in it no longer works.
> So I think that the defect here is in the code that reconstructs the Document when read from the index, and which turns on the tokenized bit.
> I have an ICLA on file so you can take this code from github, but if you prefer I can also attach it here.
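To make the mechanism described above concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class TokenizedRoundTrip, its nested Field class, and the methods indexTerms and readBack are hypothetical stand-ins invented for illustration, and the "tokenizer" is just a split on non-alphanumeric characters. It only models the suspected behaviour: the stored value survives the round trip, the tokenized flag does not, and the read-back copy falls back to tokenized.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizedRoundTrip {

    // Hypothetical stand-in for a field: a name, a value, and a tokenized flag.
    public static final class Field {
        public final String name;
        public final String value;
        public final boolean tokenized;

        public Field(String name, String value, boolean tokenized) {
            this.name = name;
            this.value = value;
            this.tokenized = tokenized;
        }
    }

    // Models indexing: a tokenized field is split into several terms,
    // a non-tokenized field is indexed as one term.
    public static List<String> indexTerms(Field f) {
        List<String> terms = new ArrayList<>();
        if (f.tokenized) {
            for (String t : f.value.split("[^A-Za-z0-9]+")) {
                if (!t.isEmpty()) {
                    terms.add(t);
                }
            }
        } else {
            terms.add(f.value);
        }
        return terms;
    }

    // Models reading a stored field back: only the name and value are
    // available, so the tokenized flag is filled in with a default of true.
    public static Field readBack(Field original) {
        return new Field(original.name, original.value, true);
    }

    public static void main(String[] args) {
        Field original = new Field("id", "abc-123", false);
        System.out.println(indexTerms(original)); // prints [abc-123]

        // Delete and re-add using the copy that was read back.
        Field loaded = readBack(original);
        System.out.println(indexTerms(loaded));   // prints [abc, 123]
    }
}
```

In this model, a lookup for the exact term "abc-123" succeeds against the first set of terms and fails against the second, which mirrors the TermQuery behaviour reported in the issue.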