lucene-dev mailing list archives

From "John Berryman (JIRA)" <>
Subject [jira] [Commented] (LUCENE-2605) queryparser parses on whitespace
Date Mon, 11 Jun 2012 02:45:42 GMT


John Berryman commented on LUCENE-2605:

Subscribed. A current client has an index full of clothing; a search for "dress shoes" returns
results containing women's dresses and running shoes.
> queryparser parses on whitespace
> --------------------------------
>                 Key: LUCENE-2605
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/queryparser
>            Reporter: Robert Muir
>             Fix For: 4.1
> The queryparser parses input on whitespace, and sends each whitespace-separated term to its own independent token stream.
> This breaks the following at query-time, because they can't see across whitespace boundaries:
> * n-gram analysis
> * shingles 
> * synonyms (especially multi-word for whitespace-separated languages)
> * languages where a 'word' can contain whitespace (e.g. vietnamese)
> It's also rather unexpected, as users think their charfilters/tokenizers/tokenfilters will do the same thing at index and query time, but in many cases they can't.
> Instead, preferably the queryparser would parse around only real 'operators'.
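
The effect described in the issue can be sketched in a few lines of plain Python. This is only a toy model, not Lucene code: `analyze` and `whitespace_first_parse` are hypothetical names, and the "analysis chain" here is assumed to be nothing more than lowercasing plus two-word shingles.

```python
def analyze(text):
    """Toy analysis chain (an assumption for illustration):
    lowercase, tokenize on whitespace, then add two-word shingles."""
    words = text.lower().split()
    shingles = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + shingles


def whitespace_first_parse(query):
    """Mimics the reported behavior: split the query on whitespace first,
    then run each piece through its own independent analysis."""
    terms = []
    for chunk in query.split():
        # Each chunk is a single word, so no shingle can ever be formed.
        terms.extend(analyze(chunk))
    return terms


# Index time: the analyzer sees the whole text and can emit the shingle.
index_terms = analyze("Dress shoes")

# Query time: the analyzer only ever sees one word at a time.
query_terms = whitespace_first_parse("dress shoes")

print(index_terms)
print(query_terms)
```

In this sketch the indexed terms include the shingle "dress shoes", while the query side only produces "dress" and "shoes", so the query can match dresses and shoes independently but never the phrase-level term.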
