lucene-solr-user mailing list archives

From Emir Arnautović <emir.arnauto...@sematext.com>
Subject Re: Split on whitespace parameter doubt
Date Thu, 30 Aug 2018 18:13:20 GMT
Hi David,
Your observations seem correct. If all fields produce the same tokens, Solr builds a “term centric” query, but if different fields produce different tokens, it builds a field centric query. Here is a blog post that explains it from the multi-word synonyms perspective: https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/
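The rule above can be modelled in a few lines. This is a simplified Python sketch, not Solr's actual eDisMax implementation: the helper names (`term_centric`, `field_centric`, `choose_query`) are hypothetical, the notation uses `||` for alternatives between fields and a space between clauses, and every space-separated clause is assumed to be required (as with mm=100%).

```python
from typing import Dict, List


def fmt(field: str, token: str) -> str:
    # Quote multi-word tokens (e.g. a whole string-field value) for readability.
    return f'{field}:"{token}"' if " " in token else f"{field}:{token}"


def term_centric(field_tokens: Dict[str, List[str]]) -> str:
    # One clause per term; inside each clause the fields are alternatives.
    terms = next(iter(field_tokens.values()))
    return " ".join(
        "(" + " || ".join(fmt(field, term) for field in field_tokens) + ")"
        for term in terms
    )


def field_centric(field_tokens: Dict[str, List[str]]) -> str:
    # One clause per field; all of that field's tokens must match inside it.
    return " || ".join(
        "(" + " ".join(fmt(field, tok) for tok in tokens) + ")"
        for field, tokens in field_tokens.items()
    )


def choose_query(field_tokens: Dict[str, List[str]]) -> str:
    # Simplified model: term centric only when every field yields identical tokens.
    token_lists = list(field_tokens.values())
    if all(tokens == token_lists[0] for tokens in token_lists):
        return term_centric(field_tokens)
    return field_centric(field_tokens)


if __name__ == "__main__":
    print(choose_query({"title": ["a", "b"], "body": ["a", "b"]}))
    print(choose_query({"title": ["a", "b"], "title_s": ["a b"]}))
```

With two analysed fields that tokenize alike, the model produces `(title:a || body:a) (title:b || body:b)`, so the terms may be spread across fields, which is what David observed. With an analysed field next to a string field, it falls back to `(title:a title:b) || (title_s:"a b")`, where all terms must match within one field.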

IMO the issue is that it is not clear what a term centric query would look like when fields produce different tokens. Imagine that your query is “a b” and you are searching two fields, title (analysed) and title_s (string), so you end up with the tokens ‘a’, ‘b’ and ‘a b’. The term centric query would then be (title:a || title_s:a) (title:b || title_s:b) (title:"a b" || title_s:"a b"). If that is not already weird, let's assume you allow one token to be missed…
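To make the mixed-token case concrete, here is a hypothetical Python sketch (all function names are made up, and documents are modelled as plain sets of indexed terms per field) that builds the naive term centric clauses for title/title_s and counts how many of them a document satisfies. It illustrates one reading of the “one token missed” problem: the minimum-match count is taken over three clauses instead of the two words the user typed.

```python
from typing import Dict, List, Set


def fmt(field: str, token: str) -> str:
    # Quote multi-word tokens (e.g. a whole string-field value) for readability.
    return f'{field}:"{token}"' if " " in token else f"{field}:{token}"


def naive_term_centric_clauses(field_tokens: Dict[str, List[str]]) -> List[str]:
    # Union of every field's tokens, in first-seen order; each one becomes a clause.
    terms: List[str] = []
    for tokens in field_tokens.values():
        for token in tokens:
            if token not in terms:
                terms.append(token)
    return terms


def render(terms: List[str], fields: List[str]) -> str:
    # Each clause offers the term in every queried field as an alternative.
    return " ".join("(" + " || ".join(fmt(f, t) for f in fields) + ")" for t in terms)


def matched_clauses(terms: List[str], fields: List[str], doc: Dict[str, Set[str]]) -> int:
    # A clause is satisfied if any queried field of the document holds the term.
    return sum(any(t in doc.get(f, set()) for f in fields) for t in terms)


if __name__ == "__main__":
    fields = ["title", "title_s"]
    terms = naive_term_centric_clauses({"title": ["a", "b"], "title_s": ["a b"]})
    print(render(terms, fields))
    # Hypothetical document: title "a c" (analysed) and title_s "a c" (string).
    doc = {"title": {"a", "c"}, "title_s": {"a c"}}
    required = len(terms) - 1  # "allow one token to be missed": 2 of 3 clauses
    print(matched_clauses(terms, fields, doc), "of", len(terms), "matched; need", required)
```

In this model the document matches only 1 of the 3 clauses and is rejected, although for a two-word query with one word allowed to be missing, a document containing ‘a’ would be expected to match.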

I am not sure why the field centric query is not used all the time, or at least why there is no parameter to force it.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 30 Aug 2018, at 15:02, David Argüello Sánchez <arguellosanchezdavid@gmail.com> wrote:
> 
> Hi everyone,
> 
> I am doing some tests to understand how the split on whitespace
> parameter works with the eDisMax query parser. I understand the behaviour,
> but I have a question about why it works that way.
> 
> When sow=true, it works as it did in previous Solr versions.
> When sow=false, the behaviour changes and all the terms have to be
> present in the same field. However, if the query structure is the same
> for all queried fields, it works as if sow=true. This is the part
> that I don’t fully understand.
> By specifying sow=false I might want to match only those documents
> containing all the terms in the same field, but because all queried
> fields have the same query structure, I get back documents
> containing the terms spread across any of the fields.
> 
> Does anyone know the reasoning behind this decision?
> Thank you in advance.
> 
> Regards,
> David
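To check which form eDisMax picked for a given request, one option is to send the same query with sow=true and sow=false and compare the parsed query returned when debugQuery=true. The Python sketch below only builds the request URLs and sends nothing; the base URL, the collection name `mycollection` and the helper `edismax_url` are hypothetical, while `defType`, `q`, `qf`, `sow`, `mm` and `debugQuery` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL and collection name; adjust to your setup.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def edismax_url(q: str, qf: str, sow: bool, mm: str = "100%") -> str:
    # Build an eDisMax request; debugQuery asks Solr to include the parsed query.
    params = {
        "defType": "edismax",
        "q": q,
        "qf": qf,
        "sow": "true" if sow else "false",
        "mm": mm,
        "debugQuery": "true",
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    for sow in (True, False):
        print(edismax_url("a b", "title title_s", sow=sow))
```

Comparing the two responses for fields that tokenize alike and for fields that do not should show whether the clauses were grouped by term or by field.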

