lucene-java-user mailing list archives

From lekamm <camille.blard...@gmail.com>
Subject Re: search with accent not match
Date Wed, 15 Oct 2008 05:26:23 GMT

http://www.blardone.org/2008/10/12/lucene-query-accented-character/

The post is specific to PHP, but the same approach can easily be tried to
solve the problem in Java.

I had the same problem as "Christophe from paris", and changing the query to
its HTML-encoded equivalent made my search queries work.

So perhaps Chris could try HTML-encoding his queries that contain accents
and see whether more results are returned.

And sorry if it is a PHP-only solution.
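
For Java, a rough sketch of the same idea might look like the following. It is
only an illustration: the class and method names are made up, and it assumes
the index itself contains HTML character references (for example because the
documents were indexed from HTML source). It emits numeric references only, so
it will not help if the index holds named entities such as &egrave; instead.

```java
// Hypothetical helper, not part of Lucene: rewrites a query string so that
// every non-ASCII character becomes a numeric HTML character reference.
final class QueryEntityEncoder {

    static String encode(String query) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < query.length(); ) {
            int cp = query.codePointAt(i);
            if (cp < 128) {
                // Plain ASCII is copied through unchanged.
                out.append((char) cp);
            } else {
                // Anything else becomes "&#<decimal code point>;".
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "lumi\u00e8re" is "lumiere" with a grave accent on the first e.
        System.out.println(encode("lumi\u00e8re"));
    }
}
```

The encoded string would then be passed to the query parser in place of the
raw query.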



hossman wrote:
> 
> 
> : http://www.blardone.org/2008/10/12/lucene-query-accented-character/
> 
> that post appears to be specifically about a PHP function to convert UTF-8 
> characters to their HTML equivalents ... which doesn't really seem relevant 
> to the poster's question ...
> 
> : > I'm use FrenchAnalyzer for index 
> 	...
> : > in my document I have the words "lumiere" and "lumière"
> : > 
> : > when I search for lumière, only documents with lumière match, but
> : > "lumiere" is not returned
> : > 
> : > and if I search for "lumiere" the result is lumiere, lumieres,
> : > lumiére, lumiéres but not lumière
> 
> 1) you should take a look at the Luke tool to help make sense of exactly 
> what is getting indexed and how your query is getting parsed -- or just 
> write a simple java program to look at the tokens produced by your 
> analyzer.
> 
> 2) the FrenchAnalyzer doesn't by default do any accent normalization (so 
> i'm not sure why your search for lumiere is even matching lumiére) ... but 
> you may want to make your own Analyzer wrapping the FrenchAnalyzer that 
> also uses the ISOLatin1AccentFilter to deal with this.
> 
> -Hoss
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
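
To make Hoss's second point concrete, here is a minimal sketch of what accent
normalization does to the words from the original question. It does not use
Lucene at all: it relies on java.text.Normalizer from the JDK as a stand-in,
and the class and method names are invented for the example. The
ISOLatin1AccentFilter is a different implementation, so its output may differ
for some characters.

```java
import java.text.Normalizer;

// Hypothetical stand-in for an accent-folding filter, using only the JDK.
final class AccentFold {

    // Decompose each accented letter into base letter + combining mark (NFD),
    // then delete the combining marks so only the base letters remain.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u00e8 is e with a grave accent, \u00e9 is e with an acute accent.
        String[] words = {"lumiere", "lumi\u00e8re", "lumi\u00e9re", "lumi\u00e9res"};
        for (String word : words) {
            System.out.println(word + " -> " + fold(word));
        }
    }
}
```

If the same folding is applied both when indexing and when parsing the query,
"lumiere", "lumière" and "lumiére" all end up as the same term, which is why
the filter has to sit inside the Analyzer used on both sides.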

-- 
View this message in context: http://www.nabble.com/search-with-accent-not-match-tp18848522p19986937.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



