lucene-dev mailing list archives

From "Robert Muir (JIRA)"
Subject [jira] Created: (LUCENE-2605) queryparser parses on whitespace
Date Tue, 17 Aug 2010 03:31:17 GMT
queryparser parses on whitespace

                 Key: LUCENE-2605
             Project: Lucene - Java
          Issue Type: Bug
            Reporter: Robert Muir
             Fix For: 3.1, 4.0

The queryparser splits its input on whitespace and sends each whitespace-separated term to
its own independent token stream.

This breaks the following at query time, because they can't see across whitespace boundaries:
* n-gram analysis
* shingles
* synonyms (especially multi-word synonyms for whitespace-separated languages)
* languages where a 'word' can contain whitespace (e.g. Vietnamese)
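
To make the shingle case concrete, here is a minimal sketch in plain Python. It does not
use any Lucene API; the function names (shingles, analyze, index_time,
query_time_split_first) are hypothetical and the "analyzer" is a toy that lowercases,
splits on whitespace, and emits two-word shingles. It only illustrates the effect
described above, under those assumptions.

```python
def shingles(tokens, size=2):
    """Return word n-grams (shingles) of the given size from a token list."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def analyze(text):
    """Toy analyzer (an assumption, not Lucene): lowercase, split, shingle."""
    return shingles(text.lower().split())


def index_time(text):
    # At index time the analyzer sees the whole field value at once.
    return analyze(text)


def query_time_split_first(query):
    # Mimics the reported behavior: split the query on whitespace first,
    # then run each piece through its own independent analysis.
    out = []
    for piece in query.split():
        out.extend(analyze(piece))
    return out


if __name__ == "__main__":
    text = "please divide this sentence"
    print(index_time(text))              # three two-word shingles
    print(query_time_split_first(text))  # empty: no piece has two words
```

Because each piece handed to the toy analyzer contains a single word, no two-word shingle
can ever be produced on the query side, so nothing matches the shingles that were indexed.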

It's also rather unexpected, as users think their charfilters/tokenizers/tokenfilters will
do the same thing at index time and query time, but in many cases they can't. Instead,
preferably the queryparser would parse around only real

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
