lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Different queries for same meaning searches
Date Tue, 21 Aug 2012 14:21:08 GMT
Solr doesn't actually "know" any natural language, so it has no way of 
assessing whether two token streams "have the same meaning." In your case, 
the surface forms/syntax are subtly different - two separate terms vs. a 
single source term with embedded punctuation.

It appears that you are using the edismax query parser and probably 
have "mm" set to "100%" or "q.op" set to "AND" (the "~2" indicates a 
BooleanQuery with a minMatch of 2 terms). An "mm" of "100%" is equivalent to 
the "AND" operator some/most of the time.
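
For illustration, your first parsed query would be consistent with edismax defaults along these lines in solrconfig.xml. This is only a guess, not your actual configuration: the handler name is an assumption, and the qf/pf values are inferred from the field names and the ^3.0 boost in your debug output.

```xml
<!-- Assumed request handler; the name and the qf/pf values are guesses. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">search_definitions search_title</str>
    <str name="pf">search_definitions search_title^3.0</str>
    <!-- Every clause must match; with two clauses this shows up as "~2". -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```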

For the second query you have a "split-term" which is treated as a single 
term/token until the fieldType analyzer splits it into two terms and then 
does an "OR" of the sub-terms. Unfortunately, "mm" and "q.op" are not passed 
down to the analyzer, so you have no way of changing that "OR" to an "AND" - 
this is why you get different results. But what you can do is set 
autoGeneratePhraseQueries="true" on your field type(s) to cause the query 
parser to generate a phrase query for "d osona" rather than the "OR". 
That's not the same as "AND", but depending on your application it may be 
sufficient or even preferable.
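
A minimal sketch of that setting in schema.xml, assuming a field type named "text_search" (a hypothetical name, as is the positionIncrementGap value) and reusing the analyzer chain from your message:

```xml
<!-- Hypothetical field type; only autoGeneratePhraseQueries is the point here. -->
<fieldType name="text_search" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="’|'" replacement=" "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

With this set, a single source term that the analyzer splits into "d" and "osona" should come out as the phrase "d osona" in the parsed query instead of the two optional sub-terms.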

-- Jack Krupansky

-----Original Message----- 
From: Dalius Sidlauskas
Sent: Tuesday, August 21, 2012 9:35 AM
To: solr-user@lucene.apache.org
Subject: Different queries for same meaning searches

Hello, here is my index and index analyzer configuration:

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="’|'"
replacement=" "/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ICUFoldingFilterFactory"/>

Search for "d Osona" and "d’Osona" creates "d" and "osona" tokens. But
ParsedQuery is different:

#1 "d Osona"

+((
DisjunctionMaxQuery((search_definitions:d | search_title:d))
DisjunctionMaxQuery((search_definitions:osona | search_title:osona))
)~2)
DisjunctionMaxQuery((search_definitions:"d osona" | search_title:"d osona"^3.0))

#2 "d’Osona"

+DisjunctionMaxQuery((
(search_definitions:d search_definitions:osona) |
(search_title:d search_title:osona)
))
DisjunctionMaxQuery((search_definitions:"d osona" | search_title:"d osona"^3.0))


And the results are different as well. Where can I find an explanation for
this?

-- 
Regards!
Dalius Sidlauskas 

