lucene-dev mailing list archives

From "Vitaliy Zhovtyuk (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-6009) edismax mis-parsing RegexpQuery
Date Sun, 21 Sep 2014 20:32:34 GMT

     [ https://issues.apache.org/jira/browse/SOLR-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitaliy Zhovtyuk updated SOLR-6009:
-----------------------------------
    Attachment: SOLR-6009.patch

Actually there are two linked issues:
1. edismax did not support regex queries.
2. Because regex queries were not supported, the RegexpQuery was created by
org.apache.solr.parser.SolrQueryParserBase#getRegexpQuery without taking into account
aliasing and org.apache.solr.search.ExtendedDismaxQParser#IMPOSSIBLE_FIELD_NAME.

The attached patch adds support for regex queries in edismax and fixes the leak of the
impossible field name. It also adds tests covering a defined field, an undefined field
(one matched only by the '*' dynamic field), and the debugQuery output.
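
To illustrate the kind of check a debugQuery test can make, here is a minimal Python sketch. It is not the JUnit test from the patch. It parses a debug fragment shaped like the ones in the issue description and reports whether the parsed query contains the impossible field name. Two things are my assumptions: that IMPOSSIBLE_FIELD_NAME is made up of U+FFFC (OBJECT REPLACEMENT CHARACTER) characters, and the helper names parsed_query and leaks_impossible_field, which are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: IMPOSSIBLE_FIELD_NAME is built from U+FFFC characters.
IMPOSSIBLE_CHAR = "\ufffc"


def parsed_query(debug_xml):
    """Return the text of <str name="parsedquery"> from a debug fragment."""
    root = ET.fromstring(debug_xml)
    for node in root.iter("str"):
        if node.get("name") == "parsedquery":
            return node.text or ""
    raise ValueError("no parsedquery element found")


def leaks_impossible_field(debug_xml):
    """True if the parsed query contains the (assumed) impossible field name."""
    return IMPOSSIBLE_CHAR in parsed_query(debug_xml)


# Sample with the leaked field name written out explicitly (assumed characters).
leaking = (
    '<lst name="debug">'
    '<str name="parsedquery">'
    "(+RegexpQuery(\ufffc\ufffc\ufffc:/.*elec.*/))/no_coord"
    "</str>"
    "</lst>"
)

# Sample taken from the explicit-field case in the issue description.
not_leaking = (
    '<lst name="debug">'
    '<str name="parsedquery">(+DisjunctionMaxQuery((text:elec)))/no_coord</str>'
    "</lst>"
)

print(leaks_impossible_field(leaking))
print(leaks_impossible_field(not_leaking))
```

A check like this only detects the leak. It does not verify that the regex itself is applied to the aliased field.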

> edismax mis-parsing RegexpQuery
> -------------------------------
>
>                 Key: SOLR-6009
>                 URL: https://issues.apache.org/jira/browse/SOLR-6009
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 4.7.2
>            Reporter: Evan Sayer
>         Attachments: SOLR-6009.patch
>
>
> edismax appears to be leaking its IMPOSSIBLE_FIELD_NAME into queries involving a RegexpQuery.
> Steps to reproduce on 4.7.2:
> 1) remove the explicit <field /> definition for 'text'
> 2) add a catch-all '*' dynamic field of type text_general
> {code}
> <dynamicField name="*" type="text_general" multiValued="true" indexed="true" stored="true" />
> {code}
> 3) index the exampledocs/ data
> 4) run a query like the following:
> {code}
> http://localhost:8983/solr/collection1/select?q={!edismax%20qf=%27text%27}%20/.*elec.*/&debugQuery=true
> {code}
> The debugQuery output will look like this:
> {code}
> <lst name="debug">
> <str name="rawquerystring">{!edismax qf='text'} /.*elec.*/</str>
> <str name="querystring">{!edismax qf='text'} /.*elec.*/</str>
> <str name="parsedquery">(+RegexpQuery(:/.*elec.*/))/no_coord</str>
> <str name="parsedquery_toString">+:/.*elec.*/</str>
> {code}
> If you copy/paste the parsed query into a text editor, you can see that the field name
> isn't actually blank. The IMPOSSIBLE_FIELD_NAME ends up in there.
> I haven't been able to reproduce this behavior on 4.7.2 without removing the explicit
> field definition for 'text' and using a dynamicField, which is how things are set up on
> the machine where this issue was discovered. The query isn't quite right with the explicit
> field definition in place either, though:
> {code}
> <lst name="debug">
> <str name="rawquerystring">{!edismax qf='text'} /.*elec.*/</str>
> <str name="querystring">{!edismax qf='text'} /.*elec.*/</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((text:elec)))/no_coord</str>
> <str name="parsedquery_toString">+(text:elec)</str>
> {code}
> numFound=0 for both of these. This site is useful for looking at the characters in the
> first variant:
> http://rishida.net/tools/conversion/
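
As an offline alternative to a conversion site, the characters in a pasted parsed query can be listed with a few lines of Python. This is only a sketch: the function name describe_chars is hypothetical, and the sample string assumes the leaked field name consists of U+FFFC characters. Paste the real debugQuery output in its place.

```python
import unicodedata


def describe_chars(text):
    """Return one 'U+XXXX NAME' entry per character in text."""
    entries = []
    for ch in text:
        name = unicodedata.name(ch, "<unnamed>")
        entries.append("U+%04X %s" % (ord(ch), name))
    return entries


# Assumed sample: replace with the parsedquery_toString value you copied.
sample = "+\ufffc\ufffc\ufffc:/.*elec.*/"

for entry in describe_chars(sample):
    print(entry)
```

Any entry before the colon that is not an ordinary letter or digit points at a field name that only looks blank.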



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

