lucene-dev mailing list archives

From "Vitaliy Zhovtyuk (JIRA)" <>
Subject [jira] [Updated] (SOLR-6009) edismax mis-parsing RegexpQuery
Date Sun, 21 Sep 2014 20:32:34 GMT


Vitaliy Zhovtyuk updated SOLR-6009:
    Attachment: SOLR-6009.patch

Actually there are two linked issues:
1. edismax did not support regex queries.
2. Since regex queries were not supported, the RegexpQuery was created by org.apache.solr.parser.SolrQueryParserBase#getRegexpQuery
without taking into account aliasing and

The attached patch adds support for regex queries and fixes the issue with the leaking impossible field
name. It also adds tests covering the cases of a defined field and an undefined field (one matched only
by the '*' dynamic field), as well as the DebugQuery output.
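
For anyone who wants to re-run the reproduction query from the quoted issue while checking the patch, here is a minimal Python sketch that builds the request URL. The host, core name and query string come from the issue description; the function name `build_repro_url` and its parameters are made up for illustration, and the percent-encoding is stricter than the hand-written URL in the issue (it should decode to the same query).

```python
from urllib.parse import urlencode, quote

# Base select URL as given in the issue description (Solr 4.7.2 example setup).
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_repro_url(regex=".*elec.*", qf="text"):
    """Build the edismax regex query URL used in the reproduction steps.

    Hypothetical helper: only the resulting query string mirrors the issue.
    """
    # Local-params prefix followed by a /regex/ clause, as in the issue.
    q = "{!edismax qf='%s'} /%s/" % (qf, regex)
    params = {"q": q, "debugQuery": "true"}
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return SOLR_SELECT + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_repro_url())
```

The parsedquery entry in the debug section of the response is the part to compare before and after applying the patch.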

> edismax mis-parsing RegexpQuery
> -------------------------------
>                 Key: SOLR-6009
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 4.7.2
>            Reporter: Evan Sayer
>         Attachments: SOLR-6009.patch
> edismax appears to be leaking its IMPOSSIBLE_FIELD_NAME into queries involving a RegexpQuery.
> Steps to reproduce on 4.7.2:
> 1) remove the explicit <field /> definition for 'text'
> 2) add a catch-all '*' dynamic field of type text_general
> {code}
> <dynamicField name="*" type="text_general" multiValued="true" indexed="true" stored="true" />
> {code}
> 3) index the exampledocs/ data
> 4) run a query like the following:
> {code}
> http://localhost:8983/solr/collection1/select?q={!edismax%20qf=%27text%27}%20/.*elec.*/&debugQuery=true
> {code}
> The debugQuery output will look like this:
> {code}
> <lst name="debug">
> <str name="rawquerystring">{!edismax qf='text'} /.*elec.*/</str>
> <str name="querystring">{!edismax qf='text'} /.*elec.*/</str>
> <str name="parsedquery">(+RegexpQuery(:/.*elec.*/))/no_coord</str>
> <str name="parsedquery_toString">+:/.*elec.*/</str>
> {code}
> If you copy/paste the parsed query into a text editor or something, you can see that
> the field name isn't actually blank.  The IMPOSSIBLE_FIELD_NAME ends up in there.
> I haven't been able to reproduce this behavior on 4.7.2 without getting rid of the explicit
> field definition for 'text' and using a dynamicField, which is how things are set up on the
> machine where this issue was discovered.  The query isn't quite right with the explicit field
> definition in place either, though:
> {code}
> <lst name="debug">
> <str name="rawquerystring">{!edismax qf='text'} /.*elec.*/</str>
> <str name="querystring">{!edismax qf='text'} /.*elec.*/</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((text:elec)))/no_coord</str>
> <str name="parsedquery_toString">+(text:elec)</str>
> {code}
> numFound=0 for both of these.  This site is useful for looking at the characters in the
> first variant:
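
The hidden characters in a pasted parsedquery string can also be inspected locally. The following is a minimal Python sketch, not part of the issue or the patch: the helper name `describe_hidden_chars` is hypothetical, and the sample string is an assumption, since this message does not state which characters IMPOSSIBLE_FIELD_NAME actually contains.

```python
import unicodedata


def describe_hidden_chars(text):
    """Return (index, code point, Unicode name) for each non-ASCII or
    non-printable character in text."""
    found = []
    for i, ch in enumerate(text):
        if ord(ch) > 126 or not ch.isprintable():
            name = unicodedata.name(ch, "<unnamed>")
            found.append((i, "U+%04X" % ord(ch), name))
    return found


# Assumed sample: three U+FFFC characters stand in for the invisible field
# name; paste the real parsedquery_toString value here instead.
sample = "+\ufffc\ufffc\ufffc:/.*elec.*/"

if __name__ == "__main__":
    for entry in describe_hidden_chars(sample):
        print(entry)
```

An empty result means the string contains only printable ASCII, so any entries listed are characters that a browser or terminal may render as blank.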
