lucene-java-user mailing list archives

From "Jonathan O'Connor" <jonathan.ocon...@xcom.de>
Subject Re: Questions about GermanAnalyzer/Stemmer [auf Viren geprueft]
Date Tue, 01 Mar 2005 14:07:51 GMT
Apologies Erik,
This must be one of those apostrophe-in-email-address problems I always
get. Recently I removed the apostrophe from the email address I give out.
Our server recognizes both email addresses, but some of these mailing lists
don't like the O'Connor clann!
Ciao,
Jonathan O'Connor
XCOM Dublin



Erik Hatcher <erik@ehatchersolutions.com> 
01/03/2005 12:16
Please respond to
"Lucene Users List" <lucene-user@jakarta.apache.org>


To
"Lucene Users List" <lucene-user@jakarta.apache.org>
cc

Subject
Re: Questions about GermanAnalyzer/Stemmer [auf Viren geprueft]






I had to moderate both Jonathan's and Jon's messages into the list.
Please subscribe to the list and post to it from the address you
subscribed with.  I cannot always guarantee I'll catch moderation messages
and send them through in a timely fashion.

                 Erik

On Mar 1, 2005, at 6:18 AM, Jonathan O'Connor wrote:

> Jon,
> I too found some problems with the German analyzer recently. Here's what
> may help:
> 1. You can try reading Joerg Caumanns' paper "A Fast and Simple Stemming
> Algorithm for German Words". This paper describes the algorithm
> implemented by GermanAnalyzer.
> 2. I guess German nouns are all capitalized, so maybe that's why. Although
> for that to help you would want to be indexing well-written German and not
> emails or text messages!
> 3. The German stemmer converts umlauts into some funny form (the code is a
> bit tricky, and I didn't spend any time looking at it), so maybe that's why
> you can't find umlauts properly. I think the main reason for this umlaut
> change is that many plurals are formed by umlauting: e.g. Haus, Haeuser
> (the ae stands for a-umlaut).
>
> Finally, to really understand what's happening, get your hands on Luke. I
> just got it last week, and it's brilliant. It shows you everything about
> your indexes. You can also feed text to an Analyzer and see what it makes
> of it. This will show you the real reason why your umlaut search is
> failing.
> Ciao,
> Jonathan O'Connor
> XCOM Dublin
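
Point 3 above can be illustrated with a toy sketch. This is not Lucene code: normalize() is a hypothetical stand-in that assumes the stemmer rewrites umlauts to plain vowels at index time, and prefixMatches() assumes that a wildcard prefix is compared against the indexed terms without going through the analyzer. Under those two assumptions a search for ö* finds nothing, while the normalized prefix o* does.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration only, NOT Lucene's GermanStemmer or QueryParser.
 * Assumption 1: index-time analysis rewrites umlauts to plain vowels.
 * Assumption 2: a wildcard prefix is matched against the index as typed.
 */
public class UmlautPrefixDemo {

    /** Hypothetical stand-in for the index-time umlaut substitution. */
    static String normalize(String token) {
        return token
                .replace("\u00e4", "a")   // a-umlaut
                .replace("\u00f6", "o")   // o-umlaut
                .replace("\u00fc", "u")   // u-umlaut
                .replace("\u00df", "ss"); // sharp s
    }

    /** Returns the indexed terms that start with the given prefix, unchanged. */
    static List<String> prefixMatches(List<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Build a tiny "index" from words that went through normalize().
        List<String> indexed = new ArrayList<>();
        for (String word : new String[] {"h\u00e4user", "\u00f6l", "ofen"}) {
            indexed.add(normalize(word));
        }

        // Raw prefix with an umlaut: no indexed term starts with it.
        System.out.println(prefixMatches(indexed, "\u00f6"));

        // Same prefix after normalization: matches the rewritten terms.
        System.out.println(prefixMatches(indexed, normalize("\u00f6")));
    }
}
```

If the real analyzer behaves this way, one workaround would be to apply the same substitution to the wildcard prefix before building the query. Feeding the text to the analyzer in Luke, as suggested above, is the way to confirm what actually ends up in the index.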
>
>
>
> "Jon Humble" <jon.humble@tecsphere.com>
> 01/03/2005 09:35
> Please respond to
> "Lucene Users List" <lucene-user@jakarta.apache.org>
>
>
> To
> <lucene-user@jakarta.apache.org>
> cc
>
> Subject
> Questions about GermanAnalyzer/Stemmer [auf Viren geprueft]
>
>
>
>
>
>
> Hello,
>
> We're using the GermanAnalyzer/Stemmer to index/search our (German)
> Website.
> I have a few questions:
>
> (1)     Why is the GermanAnalyzer case-sensitive? None of the other
> language indexers seem to be. What does this feature add?
> (2)     With the German Analyzer, wildcard searches containing extended
> German characters do not seem to work. So, a* is fine but anä* or ö*
> always find zero results.
> (3)     In a similar vein to (2), wildcard searches with escaped special
> characters fail to find results. So a search for co\-operative works but
> a search for co\-op* fails.
>
> I will be grateful for any light that can be shed on these problems.
>
> With Thanks,
>
> Jon.
>
> Jon Humble
> BSc (hons,)
> Software Engineer
> eMail: jon.humble@tecsphere.com
>
> TecSphere Ltd
> Centre for Advanced Industry
> Coble Dene, Royal Quays
> Newcastle upon Tyne NE29 6DE
> United Kingdom
>
> Direct Dial: +44 (191) 270 31 06
> Fax: +44 (191) 270 31 09
> http://www.tecsphere.com
>
>
>
>
>
>
> *** Current XCOM AG events ***
>
> XCOM invites you to the IBM Workplace Roadshow in Berlin (02.03.2005)
> Registration and information at http://lotus.xcom.de/events
>
> Workshop series "Mobilizing Lotus Notes Applications" in Berlin (05.03.2005)
> Registration and information at http://lotus.xcom.de/events
>
>
> *** XCOM AG Legal Disclaimer ***
>
> This e-mail, including its attachments, is confidential and is intended
> solely for use by the intended recipient. Third parties are prohibited
> from reading, distributing, or forwarding this e-mail. We ask that you
> delete a misdirected e-mail immediately and completely and notify us.
>
> This email may contain material that is confidential and for the sole 
> use of the intended recipient. Any review, distribution by others or 
> forwarding without express permission is strictly prohibited. If you 
> are not the intended recipient, please contact the sender and delete 
> all copies.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org

