lucene-dev mailing list archives

From Patrick ALLAERT <>
Subject Re: Handling wildcard search containing special characters (unicode)
Date Thu, 31 Mar 2011 16:20:44 GMT
2011/3/31 Robert Muir <>:
> On Thu, Mar 31, 2011 at 9:51 AM, Patrick ALLAERT
> <> wrote:
>> Hello,
>> Facing a Solr issue, I have been told that queries with a term like:
>> Kiinteistösih*
>> will not match the Finnish word "Kiinteistösihteeri" and that it's a
>> known limitation of Lucene.
>> Using the full word without a wildcard, however, works.
>> Can you confirm this is a known limitation/bug?
>> If so, is there a registered issue for it?
> This isn't the case; there's no Unicode limitation here.
> More likely, your analyzer is configured to lowercase text, so in the
> index Kiinteistösihteeri is really kiinteistösihteeri.
> In other words, try kiinteistösih* and see how that works.
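To illustrate the point above, here is a minimal Java sketch that does not use Lucene itself. The hypothetical analyze() stands in for an index-time analyzer that only lowercases, and prefixQuery() stands in for a trailing-wildcard query whose prefix is compared verbatim with the stored terms. The class and method names are illustrative assumptions, not Lucene or Solr APIs, and the o-umlaut is written as the escape \u00f6 so the source stays ASCII-only.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Illustrative sketch, not Lucene code: terms are stored after a
// lowercase-only "analyzer", while a trailing-wildcard query compares its
// prefix with the stored terms exactly as typed.
class LowercaseIndexDemo {

    // Hypothetical stand-in for an index-time analyzer that only lowercases.
    static String analyze(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Hypothetical stand-in for a "prefix*" query: the prefix is NOT analyzed.
    static List<String> prefixQuery(List<String> indexedTerms, String prefix) {
        return indexedTerms.stream()
                .filter(term -> term.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // U+00F6 (o-umlaut) is written as an escape so this file stays ASCII.
        List<String> index = List.of(analyze("Kiinteist\u00f6sihteeri"));

        // Capitalized prefix: no hit, because the stored term starts with 'k'.
        System.out.println(prefixQuery(index, "Kiinteist\u00f6sih").isEmpty()); // true
        // Lowercased prefix: one hit.
        System.out.println(prefixQuery(index, "kiinteist\u00f6sih").size());     // 1
    }
}
```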

Following your suggestion, I tested with:

kiinteistösih*

but it doesn't return the expected result.

I have found out why: it is because of the ISOLatin1AccentFilterFactory
filter, which is present in both the "index" and "query" analyzers.
Searching with:

kiinteistosih*

did the trick.
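The effect of the accent filter can be sketched the same way. This is an approximation under stated assumptions, not Solr code: the hypothetical analyze() lowercases and then strips accents using Unicode NFD decomposition plus removal of combining marks. That turns ö into o as ISOLatin1AccentFilterFactory does, but it does not reproduce the filter's full mapping table (ligatures, for example). The sketch shows why a lowercase-only kiinteistösih* still misses once the accent filter is in the chain, while kiinteistosih* matches.

```java
import java.text.Normalizer;
import java.util.Locale;

// Sketch only: approximates an analyzer chain of LowerCaseFilter followed by
// an accent-stripping filter. NFD + dropping combining marks matches the
// accent filter for letters such as U+00F6 (o-umlaut) -> 'o', but is NOT
// identical to ISOLatin1AccentFilterFactory for every character.
class AccentFoldingDemo {

    // Hypothetical analyzer: lowercase, decompose, drop combining accents.
    static String analyze(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // A "prefix*" query compares the literal, unanalyzed prefix with the term.
    static boolean prefixMatches(String indexedTerm, String prefix) {
        return indexedTerm.startsWith(prefix);
    }

    public static void main(String[] args) {
        String indexed = analyze("Kiinteist\u00f6sihteeri");
        System.out.println(indexed);                                      // kiinteistosihteeri
        System.out.println(prefixMatches(indexed, "kiinteist\u00f6sih")); // false
        System.out.println(prefixMatches(indexed, "kiinteistosih"));      // true
    }
}
```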

One question remains: why should I have to lowercase terms containing a
wildcard and do the ISO Latin-1 accent conversion myself, when the
corresponding fieldType already has:

<analyzer type="query">
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
</analyzer>

I would have expected it to do this for me.
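In the behaviour described in this thread, the wildcard term does not go through the field's query analyzer, so the normalization has to happen before the query is built. Below is a hypothetical client-side sketch of that workaround: fold() applies lowercasing plus the same approximate accent stripping (NFD + removal of combining marks, not the exact ISOLatin1AccentFilterFactory table) to the whole user term, which is safe because * and ? are unaffected, and wildcardMatches() is a simple stand-in for wildcard matching against an indexed term. None of these names come from Solr or Lucene.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical client-side workaround, not a Solr/Lucene API: normalize the
// wildcard term the way the field's analyzers would (lowercase, then strip
// accents) before building the query, since it is not analyzed for you here.
class WildcardNormalizer {

    // Lowercase, then NFD-decompose and drop combining marks. This only
    // approximates ISOLatin1AccentFilterFactory (e.g. U+00F6 -> 'o').
    // '*' and '?' pass through unchanged, so whole wildcard terms can be folded.
    static String fold(String s) {
        String lower = s.toLowerCase(Locale.ROOT);
        return Normalizer.normalize(lower, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Simple wildcard matcher: '*' matches any run of characters, '?' exactly one.
    static boolean wildcardMatches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    public static void main(String[] args) {
        String indexed = fold("Kiinteist\u00f6sihteeri"); // term as stored in the index
        String typed = "Kiinteist\u00f6sih*";             // term as the user typed it

        System.out.println(wildcardMatches(typed, indexed));       // false
        System.out.println(wildcardMatches(fold(typed), indexed)); // true
    }
}
```

Folding on the client only stays correct while its normalization mirrors the fieldType's analyzer chain; if the schema's filters change, this code has to change with them.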

Your reply helped me a lot in understanding what's going on.
Thank you very much for your participation!

