lucene-dev mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: strange behavior of solr query parser
Date Mon, 02 Mar 2020 15:09:25 GMT
The *_str variant produced by the _default configset is docValues-only, and is thus intended
primarily for faceting and sorting.
Try changing this line in your schema

<dynamicField name="*_str" type="strings" docValues="true" indexed="false" stored="false"
useDocValuesAsStored="false"/>

to

<dynamicField name="*_str" type="strings" docValues="true" indexed="true" stored="false"
useDocValuesAsStored="false"/>

…and it will both work and be more performant.
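
As a rough illustration of why the indexed variant is faster, here is a toy model in Python. It is not Solr's or Lucene's actual data structure; the data and the function names (search_doc_values, search_inverted) are made up for the example. A docValues-style forward map has to be scanned document by document, while an inverted index looks the value up directly.

```python
from collections import defaultdict

# Made-up toy data: doc id -> field value (a docValues-style forward map).
doc_values = {0: "Foundation", 1: "Jhereg", 2: "Foundation", 3: "Dune"}


def search_doc_values(value):
    """Answer 'which docs contain value?' from the forward map.

    Every document has to be inspected, so cost grows with index size.
    """
    return [doc_id for doc_id, v in doc_values.items() if v == value]


# Inverted structure: value -> doc ids, built once at index time.
inverted = defaultdict(list)
for doc_id, v in doc_values.items():
    inverted[v].append(doc_id)


def search_inverted(value):
    """Answer the same question with a single dictionary lookup."""
    return inverted.get(value, [])


print(search_doc_values("Foundation"))
print(search_inverted("Foundation"))
```

Both functions return the same documents; only the amount of work per query differs.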

But also file a JIRA since it is obviously a bug - matching a string from DocValues should
still work even if slow.
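
For the JIRA, a small script along these lines may help reproduce the report. It is only a sketch: the host, collection and clauses are taken from the original message, while the helper names (debug_url, decode_term) are invented here. It builds the two debug URLs and decodes the hex terms shown in the parsedquery output.

```python
from urllib.parse import urlencode

# Host and collection as used in the original report.
BASE = "http://localhost:8983/solr/books/select"


def debug_url(clauses):
    """Build a /select URL with debug=query for OR-joined clauses."""
    q = "+(" + " OR ".join(clauses) + ")"
    return BASE + "?" + urlencode({"q": q, "debug": "query"})


def decode_term(hex_bytes):
    """Turn a hex dump such as '4a 68 65 72 65 67' back into text."""
    return bytes.fromhex(hex_bytes).decode("utf-8")


# ABAB order: the report says name_str:Foundation is lost here.
abab = debug_url(["name_str:Foundation", "cat:book", "name_str:Jhereg", "cat:cd"])
# AABB order: the report says all four clauses survive.
aabb = debug_url(["name_str:Foundation", "name_str:Jhereg", "cat:book", "cat:cd"])

print(abab)
print(aabb)
print(decode_term("4a 68 65 72 65 67"))
print(decode_term("46 6f 75 6e 64 61 74 69 6f 6e"))
```

Fetching both URLs and comparing the "parsedquery" values should show whether a clause was dropped.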

Jan

> On 2 Mar 2020, at 15:35, Erick Erickson <erickerickson@gmail.com> wrote:
> 
> Hongtai Xue:
> 
> First, many thanks for reporting this in such detail, it really helps and it’s obvious
> you’ve dug into the problem rather than just thrown it over the wall.
> 
> Please do raise a JIRA; no matter what, the behavior should be the same.
> 
> One caution: Searching on a docValues="true" indexed="false" field will not be performant
> as the index grows, last I knew (think “table scan”). DocValues is specifically designed
> to answer the question “for doc y, what is the value of field x” and this form is asking
> “for value x, what docs contain it”. At least check with a reasonably large data set before
> allowing that in your app. Personally, I’d like to see the ability to search on a dv-only
> field restricted, but that’s another story...
> 
> That is not to say the behavior you’re reporting is OK, it’s not. Just a caution
> for you going forward.
> 
> Best,
> Erick
> 
>> On Mar 2, 2020, at 03:45, Hongtai Xue <hxue@yahoo-corp.jp> wrote:
>> 
>> 
>> Hi,
>>  
>> Our team found a strange behavior in the Solr query parser.
>> In some specific cases, some conditional clauses on an unindexed field are ignored.
>>  
>> For a query like q=A:1 OR B:1 OR A:2 OR B:2,
>> if field B is not indexed (but docValues="true"), "B:1" will be lost.
>>  
>> But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,
>> it works perfectly.
>>  
>> The only difference between the two queries is the order in which the clauses are written:
>> one is ABAB, the other is AABB.
>>  
>> ■reproduce steps and example explanation
>> you can easily reproduce this problem on a solr collection with _default configset
>> and exampledocs/books.csv data.
>>  
>> 1. create a _default collection
>> bin/solr create -c books -s 2 -rf 2
>>  
>> 2. post books.csv.
>> bin/post -c books example/exampledocs/books.csv
>>  
>> 3. run the following query.
>> http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+cat%3Abook+OR+name_str%3AJhereg+OR+cat%3Acd%29&debug=query
>>  
>>  
>> I printed query parsing debug information.
>> you can tell "name_str:Foundation" is lost.
>>  
>> query: "name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd"
>> (please note "Jhereg" is "4a 68 65 72 65 67" and "Foundation" is "46 6f 75 6e 64 61 74 69 6f 6e")
>> --------
>>   "debug":{
>>     "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
>>     "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
>>     "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
>>     "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])",
>>     "QParser":"LuceneQParser"}}
>> --------
>>  
>> but for query: "name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd",
>> everything is OK. "name_str:Foundation" is not lost.
>> --------
>>   "debug":{
>>     "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
>>     "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
>>     "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])))",
>>     "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
>>     "QParser":"LuceneQParser"}}
>> --------
>> http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+name_str%3AJhereg+OR+cat%3Abook+OR+cat%3Acd%29&debug=query
>>  
>> We did a little research, and we wonder whether it is a bug in SolrQueryParser.
>> More specifically, we think the if statement here might be wrong.
>> https://github.com/apache/lucene-solr/blob/branch_8_4/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L711
>>  
>> Could you please tell us whether this is a bug or just an incorrect query?
>>  
>> Thanks,
>> Hongtai Xue

