lucene-solr-user mailing list archives

From Jae Joo <jaejo...@gmail.com>
Subject WordDelimiterFilterFactory and PatternReplaceCharFilterFactory
Date Wed, 05 Nov 2014 22:32:18 GMT
Hi,

Once I apply PatternReplaceCharFilterFactory to the input string, the
token positions change.
Here is an example:
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(&lt;/?ce:italic[^>]*>)" replacement=""/>
<filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                splitOnCaseChange="0"
                splitOnNumerics="0"
                catenateWords="1"
                catenateNumbers="0"
                catenateAll="0"
                preserveOriginal="1"
                />

On the analysis page,
<ce:italic>p</ce:italic>-xylene and p-xylene (without XML tags) produce
different token positions.
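
To show that both inputs should reach the tokenizer as the same text, here is a minimal sketch of the replacement done outside Solr. It assumes Python's re module handles this pattern the same way Java's regex engine does. It covers only the character replacement, not Solr's offset correction or anything WordDelimiterFilterFactory does afterwards. The helper name strip_italic_tags is made up for illustration.

```python
import re

# Same pattern as the charFilter above, with the XML escape &lt; written as "<".
CE_ITALIC_TAG = re.compile(r"(</?ce:italic[^>]*>)")


def strip_italic_tags(text: str) -> str:
    """Remove opening and closing ce:italic tags, mirroring replacement=""."""
    return CE_ITALIC_TAG.sub("", text)


tagged = "<ce:italic>p</ce:italic>-xylene"
plain = "p-xylene"

# Both inputs reduce to the same character sequence after the replacement.
print(strip_italic_tags(tagged))
print(strip_italic_tags(tagged) == strip_italic_tags(plain))
```

Since the filtered text is identical, the difference in positions presumably does not come from the characters the tokenizer sees.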

For <ce:italic>p</ce:italic>-xylene,
p-xylene --> 1
xylene --> 2
p --> 2
pxylene -->

However, for the term (without tags) p-xylene,
p-xylene --> 1
p --> 1
xylene --> 2
pxylene --> 3

The only difference I can see is in the start and end offsets, which shift
because of the XML tags.

Does anyone know why?

Thanks,

Jae Joo
