lucene-solr-user mailing list archives

From Andreas Hubold <andreas.hub...@coremedia.com>
Subject dismax query does not match with additional field in qf
Date Tue, 30 Sep 2014 15:14:29 GMT
Hi,

I ran into a problem with the Solr dismax query parser. We're using Solr 
4.10.0 and the field types mentioned below are taken from the example 
schema.xml.

In a test we have a document with rather strange content in a field 
named "name_tokenized" of type "text_general":

abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed="name"' width=0 height=0>

(It's a test for XSS bug detection, but that doesn't matter here.)

I can find the document when I use the following dismax query with qf 
set to field "name_tokenized" only:

http://localhost:44080/solr/studio/editor?deftype=dismax&q=abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27javascript%3Adocument.XSSed%3D%22name%22%27&debug=true&echoParams=all&qf=name_tokenized^2

If I submit exactly the same query but add another field "feederstate" 
to the qf parameter, I don't get any results anymore. The field is of 
type "string".

http://localhost:44080/solr/studio/editor?deftype=dismax&q=abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27javascript%3Adocument.XSSed%3D%22name%22%27&debug=true&echoParams=all&qf=name_tokenized^2%20feederstate
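
The q parameter is URL-encoded in both requests. As a small sketch outside of Solr (Python standard library only; the names RAW_Q and decode_q are just illustrative), this decodes it and prints its last character:

```python
from urllib.parse import unquote_plus

# Raw value of the q parameter, copied from the request URLs above.
RAW_Q = (
    "abc_%3Ciframe+src%3D%27loadLocale.js%27"
    "+onload%3D%27javascript%3Adocument.XSSed%3D%22name%22%27"
)


def decode_q(raw: str) -> str:
    """Decode a URL-encoded query value ('+' becomes a space, %XX a character)."""
    return unquote_plus(raw)


decoded = decode_q(RAW_Q)
print(decoded)            # abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed="name"'
print(repr(decoded[-1]))  # "'"
```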

The decoded value of q is:

abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed="name"'

It seems the trailing single quote causes the problem: I can find the
document when I remove that last character.

The parsed query for the latter case is

(
   +((
     DisjunctionMaxQuery((feederstate:abc_<iframe | ((name_tokenized:abc_ name_tokenized:iframe)^2.0))~0.1)
     DisjunctionMaxQuery((feederstate:src='loadLocale.js' | ((name_tokenized:src name_tokenized:loadlocale.js)^2.0))~0.1)
     DisjunctionMaxQuery((feederstate:onload='javascript:document.XSSed= | ((name_tokenized:onload name_tokenized:javascript:document.xssed)^2.0))~0.1)
     DisjunctionMaxQuery((feederstate:name | name_tokenized:name^2.0)~0.1)
     DisjunctionMaxQuery((feederstate:')~0.1)
   )~5)

   DisjunctionMaxQuery((textbody:"abc_ iframe src loadlocale.js onload javascript:document.xssed name" | name_tokenized:"abc_ iframe src loadlocale.js onload javascript:document.xssed name"^2.0)~0.1)
)/no_coord


I've configured the handler with <str name="mm">100%</str>, so all 5 of
the dismax clauses at the top must match. But this one, which contains
only a feederstate term, does not match:

DisjunctionMaxQuery((feederstate:')~0.1)
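
To illustrate the arithmetic, here is a simplified Python model of the parsed query. It is not Solr code, and it rests on several assumptions: the document's name_tokenized terms are taken to mirror the query-side analysis shown above, the feederstate value SOME_STATE is a hypothetical placeholder, a clause is treated as matching if any of its alternatives matches, and a clause that ends up with no alternatives is assumed to be dropped (which would be consistent with the single-field query finding the document).

```python
# Simplified model of the parsed query above; this is NOT Solr code.
# Each top-level clause lists (field, term) alternatives copied from the debug output.
PARSED = [
    [("feederstate", "abc_<iframe"),
     ("name_tokenized", "abc_"), ("name_tokenized", "iframe")],
    [("feederstate", "src='loadLocale.js'"),
     ("name_tokenized", "src"), ("name_tokenized", "loadlocale.js")],
    [("feederstate", "onload='javascript:document.XSSed="),
     ("name_tokenized", "onload"), ("name_tokenized", "javascript:document.xssed")],
    [("feederstate", "name"), ("name_tokenized", "name")],
    [("feederstate", "'")],  # the lone quote yields no name_tokenized term
]

# Assumed indexed terms of the test document (see the assumptions above).
DOC = {
    "name_tokenized": {"abc_", "iframe", "src", "loadlocale.js", "onload",
                       "javascript:document.xssed", "name"},
    "feederstate": {"SOME_STATE"},  # hypothetical placeholder value
}


def clauses_for(qf):
    """Keep only alternatives for fields in qf; drop clauses left empty (assumption)."""
    kept = ([alt for alt in clause if alt[0] in qf] for clause in PARSED)
    return [clause for clause in kept if clause]


def matches(qf, mm_percent=100):
    """Return (document matches?, clauses matched, clauses in total)."""
    clauses = clauses_for(qf)
    required = len(clauses) * mm_percent // 100  # percentage rounded down
    matched = sum(
        any(term in DOC[field] for field, term in clause) for clause in clauses
    )
    return matched >= required, matched, len(clauses)


print(matches({"name_tokenized"}))                 # (True, 4, 4)
print(matches({"name_tokenized", "feederstate"}))  # (False, 4, 5)
```

In this model, adding feederstate to qf introduces a fifth clause that only the string field can satisfy, so with mm=100% the document no longer matches.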


I'd expect that an additional field in the qf parameter would not lead 
to fewer matches.
Admittedly, the example above is a rather crude test, but I'd like to
understand it. Is this a bug in Solr?

I've also found https://issues.apache.org/jira/browse/SOLR-3047 which 
sounds somewhat similar.

Regards,
Andreas
