lucene-dev mailing list archives

From Patrick ALLAERT <patrick.alla...@gmail.com>
Subject Re: Handling wildcard search containing special characters (unicode)
Date Thu, 31 Mar 2011 16:20:44 GMT
2011/3/31 Robert Muir <rcmuir@gmail.com>:
> On Thu, Mar 31, 2011 at 9:51 AM, Patrick ALLAERT
> <patrick.allaert@gmail.com> wrote:
>> Hello,
>>
>> Facing a Solr issue, I have been told that queries with a term like:
>> Kiinteistösih*
>> will not match the Finnish word "Kiinteistösihteeri" and that it's a
>> known limitation of Lucene.
>> Instead, using the word directly, without wildcard, works.
>>
>> Can you confirm this is a known limitation/bug?
>> If so, do you have a registered issue for it?
>
> this isn't the case, there's no unicode limitation here.
>
> more likely, your analyzer is configured to lowercase text, so in the
> index Kiinteistösihteeri is really kiinteistösihteeri
> in other words, try kiinteistösih* and see how that works.

Following your suggestion, I tested with:
kiinteistösih*

but it doesn't return the expected result.

I have found the reason: the ISOLatin1AccentFilterFactory filter is
present in both the "index" and "query" analyzers, so the term is
stored in the index without its accents.
Searching with:
kiinteistosih*
did the trick.
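
For illustration, here is a rough sketch of that manual normalization
(in Python; the helper name normalize_wildcard_term is hypothetical and
not part of Solr or Lucene). It lowercases the term and strips combining
accents before the query is sent. It only approximates what
LowerCaseFilterFactory and ISOLatin1AccentFilterFactory do and may map
some characters differently:

```python
import unicodedata


def normalize_wildcard_term(term: str) -> str:
    """Lowercase a term and strip accents, leaving wildcards untouched.

    Hypothetical helper: an approximation of the lowercase and accent
    filters, not the actual Solr/Lucene implementation.
    """
    lowered = term.lower()
    # NFD splits "ö" into "o" followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", lowered)
    # Drop the combining marks; "*" and "?" are not combining characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(normalize_wildcard_term("Kiinteistösih*"))
```

The wildcard characters pass through unchanged, so the result can be
used directly as the query term.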

One question remains: why do I have to lowercase terms containing a
wildcard and do the ISO Latin1 accent conversion myself, when I
have:
<analyzer type="query">
...
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
...
for the corresponding fieldType?
I would have guessed it would do that for me.

Your reply helped me a lot in understanding what's going on.
Thank you very much for your participation!

Patrick
