lucene-java-user mailing list archives

From "Allen Atamer" <>
Subject RE: literal search in quotes on non-tokenized field
Date Tue, 30 Nov 2004 23:01:37 GMT

> -----Original Message-----
> > Here's a log of the parsed query before going to the searcher:
> >
> > Parsed query: (Build:"origi") for the first search
> > Parsed query: (Build:origi) for the second search
> What do you mean by "parsed", since below you say you're not using
> QueryParser/Analyzer.

Sorry, that's residual log text. The relevant lines of code are:

BooleanQuery totalQuery = new BooleanQuery();

.. logic to build totalQuery ...

log.debug("Parsed query: " + totalQuery.toString());
dbSearchHits =;

> > Right now we're not using a query parser / analyzer system to build the
> > query. We're building the query up.
> > The query mentioned above is a TermQuery object
> Let me hopefully clarify what you've said.... you've indexed (I'm not
> using quotes on purpose) origi, but you're doing a TermQuery on "origi"
> (with the quotes) and expecting it to match?
> It doesn't work that way.  A TermQuery must match *exactly* what was
> indexed (either directly as a Keyword, or as tokens emitted from the
> analyzer).  Since you're building the query up yourself from, I'm
> assuming, user input, you may need to pre-process what the user entered
> to get the right term to query on.  Only the term origi would match.

Yeah, but it doesn't. The exact text in the database is ORIGI. Keyword
doesn't work if you supply more than one word. In fact, we're doing it wrong:
fields with a small number of terms should not be indexed as Keyword, but
tokenized. I'm going to change the indexing strategy to use Keyword only
when there's one and only one keyword in the data itself. Fields with two to
three words will be tokenized with the NoTokenizingTokenizer that was posted
earlier, and fields with four or more words will be tokenized with

All we need to do for searching keyword fields is remove the double quotes
to be consistent with searching in a tokenized field. Then use QueryParser
to parse the tokenized fields with the appropriate parser for the field.
This should solve the problem.
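
For what it's worth, here is a minimal sketch of the quote-stripping step
described above. It is plain Java with no Lucene dependency; the class and
method names (KeywordTermNormalizer, stripQuotes) are made up for
illustration and are not part of our code base or of Lucene. It only removes
one pair of surrounding double quotes and leaves case alone.

```java
// Hypothetical helper (name is an assumption, not a Lucene class):
// normalizes user input before it is used as the text of a term
// for a field indexed as a Keyword.
public final class KeywordTermNormalizer {

    private KeywordTermNormalizer() {
        // static helper only, no instances
    }

    /**
     * Trims surrounding whitespace, then removes one pair of enclosing
     * double quotes if both the opening and closing quote are present.
     * Case is left untouched.
     */
    public static String stripQuotes(String input) {
        String s = input.trim();
        boolean quoted = s.length() >= 2
                && s.charAt(0) == '"'
                && s.charAt(s.length() - 1) == '"';
        return quoted ? s.substring(1, s.length() - 1) : s;
    }
}
```

The result would then be passed along the lines of
new TermQuery(new Term("Build", KeywordTermNormalizer.stripQuotes(userInput))),
so that the term text no longer carries the quotes. Note that this does not
change case, so ORIGI and origi would still be different terms in a Keyword
field.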

