lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From pifpaf...@gmx.de
Subject QueryParser or Query seem to tokenize keyword fields
Date Fri, 12 Sep 2003 10:28:28 GMT
My own TokenStream drops all numbers. The document IDs I store with a
document 
are numbers. I enter the ID field as a keyword field. Consquently it is not
run through
tokenization and ends up nicely in the index.

However, when I want to query according to the ID field, the query is run
through
the tokenizer. The tokenizer drops the number producing no output and
the query produces no result.

DISCUSSION:
The immediate solution is to instruct my Analyzer to generate a different
TokenStream
for the ID field, namely one which does not drop numbers.

However, the Lucene documents, tutorials and FAQ state often enought that
document tokenization and query tokenization must be identical to get
consistent results. Therefore I would expect that the information whether
a field needs tokenization or not is taken from the index when querying.
This,
however, is not possible because it seems a field name does not need
consistently
refer to a tokenized or non-tokenized field.

I wonder if it would make sense to requrie that fields names are registered
with the index before they can be used in Documents and to then
enforce consistent use.

  Harald.


-- 
COMPUTERBILD 15/03: Premium-e-mail-Dienste im Test
--------------------------------------------------
1. GMX TopMail - Platz 1 und Testsieger!
2. GMX ProMail - Platz 2 und Preis-Qualit├Ątssieger!
3. Arcor - 4. web.de - 5. T-Online - 6. freenet.de - 7. daybyday - 8. e-Post


Mime
View raw message