lucene-solr-user mailing list archives

From Shamik Bandopadhyay <sham...@gmail.com>
Subject Fwd: Numeric value ignored by EdgeNGramFilterFactory
Date Thu, 04 Jul 2019 08:38:49 GMT
Hi,

I'm using EdgeNGramFilterFactory to support partial search. Here's my
field definition.

<fieldType name="adsktext" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
            maxGramSize="30"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.SynonymGraphFilterFactory"
            synonyms="synonyms/synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I run into an issue when a search includes a numeric term. For example, if
I search for "72 hours", EdgeNGramFilterFactory ignores 72 and stores only
hou and hour in the index. Since I'm using the AND operator, the query fails
to match 72 hours. I could add EdgeNGramFilterFactory to the query chain, but
I thought that would be unnecessary overhead. Is there a reason why 72
is ignored, and what would be the best way to address this scenario?
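
The observation above can be reproduced outside Solr with a rough Python
model of edge n-gram generation. This is only a sketch, not Solr's
implementation: edge_ngrams is a hypothetical helper, and it assumes the
filter emits only leading prefixes whose length lies between minGramSize
and maxGramSize, so a token shorter than minGramSize yields nothing.

```python
def edge_ngrams(token, min_gram=3, max_gram=30):
    """Return the leading-edge n-grams of `token`, shortest first.

    Hypothetical model of an edge n-gram filter: only prefixes with
    min_gram <= length <= max_gram are produced. A token shorter than
    min_gram therefore produces an empty list (assumed behavior).
    """
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


# "hours" is assumed to have been stemmed to "hour" earlier in the chain.
for token in ["72", "hour"]:
    print(token, "->", edge_ngrams(token))
```

Under that assumption, "hour" yields hou and hour, while the two-character
token 72 yields nothing with minGramSize="3". If the model holds, lowering
minGramSize to 2 would be one thing to test; whether the filter can also be
told to keep the original short token depends on the Solr version and is
not confirmed here.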

Any pointers will be appreciated.

Thanks,
Shamik
