lucene-java-user mailing list archives

From Martin Rode <>
Subject European Languages search problem
Date Thu, 28 Jul 2005 16:40:14 GMT
Hello everybody,

First of all, congrats on that great piece of software!

I am working on a Europe-wide project where we have texts in more than 
one European language, namely French, German, and English. I have tried 
both the GermanAnalyzer and the FrenchAnalyzer, and neither does what I need.

The GermanAnalyzer should do a classic German umlaut conversion:

ä -> ae
ö -> oe
ü -> ue

Instead it does ä -> a, ö -> o, ü -> u. This is not useful. If a word is 
written as "Oeffner" and I search for "Öffner", I don't find it! It does 
get the conversion right for "ß", which it converts to "ss".
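
As a rough illustration of the conversion described above, here is a minimal sketch in plain Java. The class name GermanUmlautFolder and the method fold are made up for this example and are not part of Lucene; in a real analyzer the same mapping would presumably sit in a custom TokenFilter, and it would have to be applied at both index time and query time so that "Öffner" and "Oeffner" end up as the same term.

```java
// Hypothetical helper, not a Lucene class: maps German umlauts to their
// two-letter transcriptions and "ß" to "ss".
final class GermanUmlautFolder {
    private GermanUmlautFolder() {}

    static String fold(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u00e4': out.append("ae"); break; // ä
                case '\u00f6': out.append("oe"); break; // ö
                case '\u00fc': out.append("ue"); break; // ü
                case '\u00c4': out.append("Ae"); break; // Ä
                case '\u00d6': out.append("Oe"); break; // Ö
                case '\u00dc': out.append("Ue"); break; // Ü
                case '\u00df': out.append("ss"); break; // ß
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this mapping, fold("Öffner") and fold("Oeffner") both give "Oeffner", so the two spellings would match after folding.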

For French I tried the FrenchAnalyzer, but it does not work (at least 
not the one in lukeall.jar, which is pretty up-to-date, I guess).

Well, in short, it would be nice to have a simple Analyzer which does 
the great job of the StandardAnalyzer, PLUS a few extras for European 
languages, and that is pretty easy:

For German: see above.
For French: remove the accents (` ´ ^) and the cedilla below the c.
For Swedish, Polish, Czech ...: remove every mark that crosses, slashes, 
or otherwise decorates the ASCII letters a-z.
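
The accent removal listed above could be sketched roughly as follows, using Unicode decomposition from the standard java.text.Normalizer class (available since Java 6). The class name AccentStripper and the method strip are hypothetical names for this example, not Lucene API. Letters such as the Polish ł or the Scandinavian ø have no Unicode decomposition, so the sketch maps those few by hand; the hand-written list is an assumption and is certainly incomplete.

```java
import java.text.Normalizer;

// Hypothetical helper, not a Lucene class: strips diacritics so that only
// the base letters remain.
final class AccentStripper {
    private AccentStripper() {}

    static String strip(String text) {
        // NFD splits "é" into "e" plus a combining acute accent, and so on.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Drop all combining marks (Unicode category M).
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Letters with a stroke do not decompose; map a few of them by hand.
        return stripped
                .replace('\u0142', 'l')  // ł
                .replace('\u0141', 'L')  // Ł
                .replace('\u00f8', 'o')  // ø
                .replace('\u00d8', 'O'); // Ø
    }
}
```

Note that this also turns ä into a, which is exactly what is not wanted for German, so any German-specific mapping would need to run before this step.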

Before I write my own Analyzer for this, I was wondering whether anybody 
has had the same problems and has already found a solution.

Thanks for your help!

Take Care,
