lucene-java-user mailing list archives

From Chris Hostetter <>
Subject Re: search with accent not match
Date Wed, 15 Oct 2008 00:50:40 GMT


that post appears to be specifically about a PHP function to convert UTF-8 
characters to their HTML equivalents ... which doesn't really seem relevant 
to the poster's question ...

: > I use FrenchAnalyzer for indexing.
: > In my documents I have the words "lumiere" and "lumière".
: > 
: > When I search for lumière, only documents containing lumière match;
: > "lumiere" is not returned.
: > 
: > If I search for "lumiere", the results are lumiere, lumieres, lumiére,
: > lumiéres, but not lumière.

1) you should take a look at the Luke tool to help make sense of exactly 
what is getting indexed and how your query is getting parsed -- or just 
write a simple Java program to look at the tokens produced by your 
analyzer.

2) the FrenchAnalyzer doesn't do any accent normalization by default (so 
i'm not sure why your search for lumiere is even matching lumiére) ... but 
you may want to make your own Analyzer wrapping the FrenchAnalyzer that 
also uses the ISOLatin1AccentFilter to deal with this.
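
to illustrate what accent normalization buys you, here is a minimal sketch 
that uses only the JDK. it is not Lucene code and not the 
ISOLatin1AccentFilter itself: it assumes java.text.Normalizer (Java 6+) as a 
stand-in, and the class name AccentFoldDemo and the method fold are 
hypothetical names made up for this example. the idea is the same though: 
if the same folding is applied at index time and at query time, all the 
accented variants collapse to one term.

```java
import java.text.Normalizer;

// Hypothetical demo class; a JDK-only stand-in for an accent-folding filter.
class AccentFoldDemo {

    // Decompose each character into base letter + combining marks (NFD),
    // then strip the combining marks (Unicode category M).
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "lumiere", "lumière", "lumiére" (escapes avoid source-encoding issues)
        String[] terms = {"lumiere", "lumi\u00e8re", "lumi\u00e9re"};
        for (String t : terms) {
            System.out.println(t + " -> " + fold(t));
        }
    }
}
```

with folding like this on both sides, a query for lumière and a query for 
lumiere look up the same term, so the mismatch described above goes away.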

