lucene-solr-user mailing list archives

From Shamik Bandopadhyay <sham...@gmail.com>
Subject Include stopwords in phrase search
Date Thu, 05 Feb 2015 02:47:49 GMT
Hi,

I'm having an issue running phrase queries with stopwords. It looks like Solr
is ignoring the stopword during search. Here's my search term:

"cannot open device"

When I execute title:"cannot open device", it brings back titles such as
"Find Open Devices". Here's my field definition for title:

<field name="title" type="adsktext" indexed="true" stored="true"
       multiValued="true"/>

<fieldType name="adsktext" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
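To make the query-side chain easier to reason about, here is a minimal Python sketch that models it as (term, position) pairs. It is not Solr's code: it assumes stopwords.txt contains "cannot", treats synonyms.txt and the word-delimiter step as no-ops for these words, and swaps PorterStemFilterFactory for a hypothetical toy_stem helper that only strips a trailing "s" and then a trailing "e".

```python
# Simplified model of the "adsktext" query analyzer; not Solr's implementation.
STOPWORDS = {"cannot"}  # assumption: stopwords.txt contains at least "cannot"


def toy_stem(term):
    # Hypothetical stand-in for PorterStemFilterFactory: strips a trailing "s",
    # then a trailing "e". Enough to map "device"/"devices" to "devic".
    if term.endswith("s"):
        term = term[:-1]
    if term.endswith("e"):
        term = term[:-1]
    return term


def analyze(text, stopwords=STOPWORDS, enable_position_increments=True):
    """Return (term, position) pairs: whitespace tokenizing, case-insensitive
    stopword removal, lowercasing, then toy stemming. Synonym expansion and
    word-delimiter splitting are omitted (assumed no-ops for these words)."""
    tokens = []
    position = -1
    for word in text.split():
        if word.lower() in stopwords:  # mirrors ignoreCase="true"
            if enable_position_increments:
                position += 1  # leave a hole where the stopword was
            continue
        position += 1
        tokens.append((toy_stem(word.lower()), position))
    return tokens


print(analyze("cannot open device"))
print(analyze("Find Open Devices"))
```

With position increments enabled, the dropped stopword leaves a hole at position 0, so in this model the query reduces to "open" at 1 and "devic" at 2.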

Sample documents:

<doc>
  <field name="id">111!SOLR1000</field>
  <field name="name">Solr, the Enterprise Search Server</field>
  <field name="title">Find Open Devices</field>
</doc>
<doc>
  <field name="id">333!SOLR1002</field>
  <field name="name">ElasticSearch Server</field>
  <field name="title">Cannot open device</field>
</doc>

I have "cannot" in my stopword list.

The weird part is that when I analyze the phrase in the Solr admin Analysis
screen, it gets indexed as the following three tokens:

cannot open devic

I'm on Solr 4.7, so I'm not sure whether enablePositionIncrements="true"
makes any difference.
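A second sketch, again a simplified model rather than Lucene's PhraseQuery implementation, illustrates why both titles can match once "cannot" is dropped from the query. An exact phrase (no slop) only requires the query terms to keep their relative offsets, so a hole before the first remaining term constrains nothing. The token lists are hand-written assumptions: the query without "cannot" (with and without the position hole), the query with "cannot" kept, and the two titles as (term, position) pairs, with "Cannot open device" tokenized as the Analysis screen shows.

```python
def phrase_matches(query_tokens, doc_tokens):
    """Return True if every query term occurs in the document at the same
    offset relative to the other query terms (exact phrase, no slop).
    Simplified model of position-aware phrase matching."""
    if not query_tokens:
        return False
    doc_positions = {}
    for term, pos in doc_tokens:
        doc_positions.setdefault(term, set()).add(pos)
    first_term, first_qpos = query_tokens[0]
    for doc_pos in doc_positions.get(first_term, ()):
        base = doc_pos - first_qpos  # where query position 0 would land
        if all(base + qpos in doc_positions.get(term, ()) for term, qpos in query_tokens):
            return True
    return False


# Assumed analyzer output as (term, position) pairs.
query_with_hole = [("open", 1), ("devic", 2)]     # "cannot" dropped, hole kept
query_without_hole = [("open", 0), ("devic", 1)]  # "cannot" dropped, no hole
query_full = [("cannot", 0), ("open", 1), ("devic", 2)]  # stopword kept

title_find = [("find", 0), ("open", 1), ("devic", 2)]      # "Find Open Devices"
title_cannot = [("cannot", 0), ("open", 1), ("devic", 2)]  # "Cannot open device"

for name, doc in [("Find Open Devices", title_find), ("Cannot open device", title_cannot)]:
    print(name,
          phrase_matches(query_with_hole, doc),
          phrase_matches(query_without_hole, doc),
          phrase_matches(query_full, doc))
```

In this model, the position hole does not change the outcome for a leading stopword; only a query that still contains "cannot" excludes "Find Open Devices".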

Any feedback will be appreciated.

Thanks,
Shamik
