lucene-solr-user mailing list archives

From Michael Lackhoff <mich...@lackhoff.de>
Subject EnglishPorterFilterFactory and PatternReplaceFilterFactory
Date Thu, 02 Jul 2009 13:27:57 GMT
In Germany we have a strange habit of treating umlaut letters as
equivalent to a two-letter representation. For example, 'ä' and 'ae'
are expected to give the same search results. To achieve this I
added this filter to the "text" fieldtype definition:
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="ä" replacement="ae" replace="all"
        />
to both index and query analyzers (and more for the other umlauts).

This works well when I search for a name (a word that is not stemmed),
but not with a word such as "Wärme":
search for 'wärme' works
search for 'waerme' does not work
search for 'waerm' works if I move the EnglishPorterFilterFactory after
the PatternReplaceFilterFactory.
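
A minimal sketch of that reordered chain, with the replacement running
before the stemmer in both analyzers. The tokenizer and the lowercase
filter are assumptions added only to make the fragment complete; just
the two filters named above come from the actual setup, and the other
umlaut replacements are left out:

        <fieldType name="text" class="solr.TextField">
          <analyzer type="index">
            <!-- assumed tokenizer and lowercase filter -->
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <!-- replace the umlaut first ... -->
            <filter class="solr.PatternReplaceFilterFactory"
                    pattern="ä" replacement="ae" replace="all"
            />
            <!-- ... so the stemmer sees 'waerme' and not 'wärme' -->
            <filter class="solr.EnglishPorterFilterFactory"/>
          </analyzer>
          <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.PatternReplaceFilterFactory"
                    pattern="ä" replacement="ae" replace="all"
            />
            <filter class="solr.EnglishPorterFilterFactory"/>
          </analyzer>
        </fieldType>

Note that a change to the index analyzer only affects documents that
are indexed after the change.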

debugQuery for "waerme" gives a parsedquery of FS:waerm.
What I don't understand is why the (existing) records are not found. If
I understand it correctly, there should be 'waerm' in the index as well.

By the way, the reason why I keep the EnglishPorterFilterFactory is that
the records are in many languages and the English stemming gives good
results in many cases and I don't want (yet) to multiply my fields to
have language specific versions.
But even if the stemming is not right because the language is not
English, I think records should be found as long as the analyzers are
the same for index and query.

This is with Solr 1.3.

Can someone shed some light on what is going on and how I can achieve my
goal?

-Michael
