Return-Path: Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm Delivered-To: mailing list lucene-user@jakarta.apache.org Received: (qmail 37960 invoked from network); 12 Sep 2003 10:28:14 -0000 Received: from unknown (HELO mx0.gmx.net) (213.165.64.100) by daedalus.apache.org with SMTP; 12 Sep 2003 10:28:14 -0000 Received: (qmail 29986 invoked by uid 0); 12 Sep 2003 10:28:28 -0000 Received: from 193.62.199.77 by www28.gmx.net with HTTP; Fri, 12 Sep 2003 12:28:28 +0200 (MEST) Date: Fri, 12 Sep 2003 12:28:28 +0200 (MEST) From: pifpafpuf@gmx.de To: lucene-user@jakarta.apache.org MIME-Version: 1.0 Subject: QueryParser or Query seem to tokenize keyword fields X-Priority: 3 (Normal) X-Authenticated: #1850795 Message-ID: <13813.1063362508@www28.gmx.net> X-Mailer: WWW-Mail 1.6 (Global Message Exchange) X-Flags: 0001 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N My own TokenStream drops all numbers. The document IDs I store with a document are numbers. I enter the ID field as a keyword field. Consquently it is not run through tokenization and ends up nicely in the index. However, when I want to query according to the ID field, the query is run through the tokenizer. The tokenizer drops the number producing no output and the query produces no result. DISCUSSION: The immediate solution is to instruct my Analyzer to generate a different TokenStream for the ID field, namely one which does not drop numbers. However, the Lucene documents, tutorials and FAQ state often enought that document tokenization and query tokenization must be identical to get consistent results. Therefore I would expect that the information whether a field needs tokenization or not is taken from the index when querying. This, however, is not possible because it seems a field name does not need consistently refer to a tokenized or non-tokenized field. I wonder if it would make sense to requrie that fields names are registered with the index before they can be used in Documents and to then enforce consistent use. Harald. -- COMPUTERBILD 15/03: Premium-e-mail-Dienste im Test -------------------------------------------------- 1. GMX TopMail - Platz 1 und Testsieger! 2. GMX ProMail - Platz 2 und Preis-Qualit�tssieger! 3. Arcor - 4. web.de - 5. T-Online - 6. freenet.de - 7. daybyday - 8. e-Post