lucene-java-user mailing list archives

From "Rajan, Renuka" <renuka.ra...@navteq.com>
Subject Matching accented with non-accented characters
Date Tue, 25 Jul 2006 15:34:06 GMT
Hi All

 

I am trying to match accented characters with non-accented characters in French, Spanish, and
other Western European languages.  The use case is that users may mistakenly type letters without
accents, and we still want to retrieve valid matches.  One idea, albeit naive, is to normalize
both the incoming query and the data in the database (prior to full-text indexing) and then
match on the normalized forms.

 

For instance, if the database contains a word like BE/BE/ (/ stands in for the accent aigu, since
I don't have a French keyboard :-)) and the input is erroneously provided as BE/BE (last aigu
missing), we still want to be able to retrieve BE/BE/ as a candidate match, admittedly with an
error margin.
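
To make the normalization idea concrete, here is a minimal sketch using only the JDK, not
a Lucene API.  The class name AccentFolder and the method fold are hypothetical names chosen
for illustration, and java.text.Normalizer assumes Java 6 or later.  The assumption is that
the same function is applied to the text before indexing and to the user's query, so both
sides reduce to the same unaccented form.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper: folds accented Latin letters to their base letters.
final class AccentFolder {

    // Decompose to NFD (base letter + combining mark), drop the combining
    // marks (Unicode category M), then lowercase for case-insensitive matching.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String indexed = fold("B\u00C9B\u00C9"); // database value, both accents present
        String query = fold("B\u00C9BE");        // user input, last accent missing
        // Both sides fold to the same string, so they compare equal.
        System.out.println(indexed + " / " + query + " -> " + indexed.equals(query));
    }
}
```

One known limit of this sketch: letters with no Unicode decomposition (for example the
o-with-stroke or the German sharp s) pass through unfolded and would need an explicit
mapping table.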

 

Has anyone used Lucene successfully (i.e., with decent performance AND valid results)
to match non-accented characters with accented ones?  Any method?  Does anyone
have suggestions to improve the approach above?

 

Any input will be greatly appreciated! Merci beaucoup :-)

Renuka



