lucene-dev mailing list archives

From "Dean Gurvitz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9185) Solr's edismax and "Lucene"/standard query parsers should optionally not split on whitespace before sending terms to analysis
Date Tue, 27 Mar 2018 18:32:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416051#comment-16416051 ]

Dean Gurvitz commented on SOLR-9185:
------------------------------------

I missed that comment. Anyway, I just think that we should be more careful with such changes
in minor versions, and at least explicitly mention them in the CHANGES.txt file for those
who wish to upgrade their version.

> Solr's edismax and "Lucene"/standard query parsers should optionally not split on whitespace
> before sending terms to analysis
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9185
>                 URL: https://issues.apache.org/jira/browse/SOLR-9185
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>            Priority: Major
>             Fix For: 6.5, 7.0
>
>         Attachments: SOLR-9185.patch, SOLR-9185.patch, SOLR-9185.patch, SOLR-9185.patch
>
>
> Copied from LUCENE-2605:
> The queryparser parses input on whitespace, and sends each whitespace-separated term
> to its own independent token stream.
> This breaks the following at query time, because they can't see across whitespace boundaries:
> n-gram analysis
> shingles
> synonyms (especially multi-word for whitespace-separated languages)
> languages where a 'word' can contain whitespace (e.g. Vietnamese)
> It's also rather unexpected, as users think their charfilters/tokenizers/tokenfilters
> will do the same thing at index and query time, but
> in many cases they can't. Instead, preferably the queryparser would parse around only
> real 'operators'.
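
The effect described in the issue can be illustrated with a toy model. The sketch below is not Lucene or Solr code: the analyzer, the synonym rule, and all function names are hypothetical, chosen only to show why a multi-word synonym cannot match when each whitespace-separated term is analyzed on its own.

```python
# Hypothetical multi-word synonym rule (an assumption for illustration,
# not taken from any real Solr configuration).
SYNONYMS = {("new", "york"): ["ny"]}


def analyze(text):
    """Toy analyzer: lowercase, tokenize on whitespace, then replace any
    token sequence that matches a synonym rule."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        for phrase, replacement in SYNONYMS.items():
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                out.extend(replacement)
                i += len(phrase)
                break
        else:
            # No rule matched at this position; keep the token as-is.
            out.append(tokens[i])
            i += 1
    return out


def parse_split_on_whitespace(query):
    """Model of the behavior the issue describes: the parser splits first,
    so the analyzer only ever sees one term at a time."""
    out = []
    for term in query.split():
        out.extend(analyze(term))
    return out


def parse_without_splitting(query):
    """Model of the requested behavior: the whole run of text reaches the
    analyzer, so rules can see across whitespace."""
    return analyze(query)


if __name__ == "__main__":
    q = "New York pizza"
    print(parse_split_on_whitespace(q))
    print(parse_without_splitting(q))
```

In this model the two-token rule never fires in the split-first path, because no single analysis call receives both "new" and "york".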



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


