lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: European Languages search problem
Date Thu, 28 Jul 2005 16:46:27 GMT
Hi Martin,

When you write your own tokenizer/analyzer for this, you'll probably
want to emit multiple tokens for words that have umlauts and such - one
version with ä -> ae, the other with ä -> a perhaps.

As for stripping accents from characters, somebody posted
ISOLatinFilter.java (I think that was the class name) a few months
back.

If you can contribute your Analyzers, that would be great, as we
already have a small set of Analyzers in Lucene's contrib area in SVN.

Otis


--- Martin Rode <martin.rode@programmfabrik.de> wrote:

> Hello everybody,
> 
> First of congrats for that great piece of software!
> 
> 
> I am working on a Europe-wide project, where we have texts on more
> than 
> one European language, namely French, German, and English. Having
> tried 
> the German and the FrenchAnalyzer both are not satisfying for what I
> need.
> 
> The GermanAnalyer should do a classic German umlaut conversion:
> 
> ä -> ae
> ö -> oe
> u -> ue
> 
> It does ä->a, ö->o, ü->u. This is not useful. If a word appears like 
> "Oeffner" and i search for "Öffner", i dont find it! It does the 
> conversion right for "ß", which converts to "ss".
> 
> For French I tried the FrenchAnalyzer, but it does not work (at least
> 
> not the one in lukeall.jar, which is pretty up-to-date, I guess).
> 
> Well, in short, it would be nice to have a simple Analyzer which does
> 
> the great job of the StandardAnalyzer, PLUS a few extras for European
> 
> languages, and that is pretty easy:
> 
> For German: See above,
> For French: Remove all the ` ´ ^ and the hook below the c
> For Swedish, Polish, Czech ... remove everything which crosses,
> slashes 
> or whatever the ascci letters a-z
> 
> Before I write my own Analyzer for that I was wondering if anybody
> has 
> had the same problems and found already a solution for that!
> 
> 
> Thanks for your help!
> 
> Take Care,
> Martin
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message