lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Questions about GermanAnalyzer/Stemmer [auf Viren geprueft]
Date Tue, 01 Mar 2005 12:16:04 GMT
I had to moderate both Jonathan's and Jon's messages through to the
list. Please subscribe to the list and post to it from the address you
subscribed with. I cannot guarantee that I'll always catch moderation
messages and send them through promptly.

	Erik

On Mar 1, 2005, at 6:18 AM, Jonathan O'Connor wrote:

> Jon,
> I too found some problems with the German analyzer recently. Here's
> what may help:
> 1. You can try reading Joerg Caumanns' paper "A Fast and Simple
> Stemming Algorithm for German Words". This paper describes the
> algorithm implemented by GermanAnalyzer.
> 2. I guess it is because German nouns are all capitalized. Although
> then you would want to be indexing well-written German, not emails or
> text messages!
> 3. The German stemmer converts umlauts into some funny form (the code
> is a bit tricky, and I didn't spend any time looking at it), so maybe
> that's why you can't find umlauts properly. I think the main reason
> for this umlaut change is that many plurals are formed by umlauting,
> e.g. Haus, Haeuser (that ae is an umlaut).
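To make point 3 concrete, here is a minimal Java sketch of umlaut folding. It is not the GermanStemmer code from Lucene: the class UmlautFoldSketch and its fold method are hypothetical names made up for illustration, and the mapping used (a-umlaut to a, o-umlaut to o, u-umlaut to u, sharp s to ss) is an assumption about the general idea, not the exact substitutions the stemmer performs.

```java
import java.util.Locale;

// Illustrative only: a hypothetical helper, not part of Lucene.
final class UmlautFoldSketch {

    // Lowercases the term and replaces umlauts and sharp s with plain ASCII letters.
    static String fold(String term) {
        StringBuilder out = new StringBuilder();
        for (char c : term.toLowerCase(Locale.ROOT).toCharArray()) {
            switch (c) {
                case '\u00e4': out.append('a'); break;   // a-umlaut
                case '\u00f6': out.append('o'); break;   // o-umlaut
                case '\u00fc': out.append('u'); break;   // u-umlaut
                case '\u00df': out.append("ss"); break;  // sharp s
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Singular and plural now share the prefix "haus", so a later
        // suffix-stripping step has a chance to reduce both to one stem.
        System.out.println(fold("Haus"));          // prints "haus"
        System.out.println(fold("H\u00e4user"));   // prints "hauser"
    }
}
```

If the indexed terms have been folded like this, a query term that still contains the umlaut will not match them unless it goes through the same folding.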
>
> Finally, to really understand what's happening, get your hands on
> Luke. I just got it last week, and it's brilliant. It shows you
> everything about your indexes. You can also feed text to an Analyzer
> and see what it makes of it. This will show you the real reason why
> your umlaut search is failing.
> Ciao,
> Jonathan O'Connor
> XCOM Dublin
>
> "Jon Humble" <jon.humble@tecsphere.com> wrote on 01/03/2005 09:35:
> To: <lucene-user@jakarta.apache.org>
> Subject: Questions about GermanAnalyzer/Stemmer [auf Viren geprueft]
> Please respond to: "Lucene Users List" <lucene-user@jakarta.apache.org>
>
> Hello,
>
> We're using the GermanAnalyzer/Stemmer to index/search our (German)
> website.
> I have a few questions:
>
> (1) Why is the GermanAnalyzer case-sensitive? None of the other
> language analyzers seem to be. What does this feature add?
> (2) With the GermanAnalyzer, wildcard searches containing extended
> German characters do not seem to work. So a* is fine, but anä* or ö*
> always finds zero results.
> (3) In a similar vein to (2), wildcard searches with escaped special
> characters fail to find results. So a search for co\-operative works,
> but a search for co\-op* fails.
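One plausible explanation for (2) and (3), assuming the query parser does not pass wildcard terms through the analyzer: the index holds analyzed terms, while the wildcard prefix is compared against them in its raw form. The Java sketch below imitates that mismatch without using Lucene at all. The class WildcardMismatchSketch, its analyze method (lowercase, fold umlauts, split on non-alphanumeric characters) and its prefixMatches method are hypothetical stand-ins, not the behaviour of GermanAnalyzer or QueryParser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: hypothetical stand-ins, not Lucene classes.
final class WildcardMismatchSketch {

    // Toy analyzer: lowercase, fold umlauts, split on anything that is
    // not a letter or digit (so a hyphen separates two terms).
    static List<String> analyze(String text) {
        String folded = text.toLowerCase(Locale.ROOT)
                .replace('\u00e4', 'a')
                .replace('\u00f6', 'o')
                .replace('\u00fc', 'u');
        List<String> terms = new ArrayList<>();
        for (String t : folded.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Compares an unanalyzed prefix directly against the indexed terms.
    static boolean prefixMatches(List<String> indexedTerms, String rawPrefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(rawPrefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> index = analyze("Co-operative \u00c4nderung");
        System.out.println(index);                             // [co, operative, anderung]
        System.out.println(prefixMatches(index, "co-op"));     // false: no term contains a hyphen
        System.out.println(prefixMatches(index, "\u00e4nd"));  // false: the umlaut was folded away
        System.out.println(prefixMatches(index, "and"));       // true: matches the folded term
    }
}
```

Under the same assumption, the non-wildcard query co\-operative would still work because it is analyzed into the same terms that were indexed.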
>
> I will be grateful for any light that can be shed on these problems.
>
> With Thanks,
>
> Jon.
>
> Jon Humble
> BSc (Hons)
> Software Engineer
> eMail: jon.humble@tecsphere.com
>
> TecSphere Ltd
> Centre for Advanced Industry
> Coble Dene, Royal Quays
> Newcastle upon Tyne NE29 6DE
> United Kingdom
>
> Direct Dial: +44 (191) 270 31 06
> Fax: +44 (191) 270 31 09
> http://www.tecsphere.com
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org



