lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Clemens Marschner" <c...@lanlab.de>
Subject Re: Help for german queries
Date Wed, 02 Oct 2002 12:36:29 GMT
Hm sorry I don't have the time right now, but I think it took me 10 minutes
to discover the location where I had to do the changes.
I thought ä=ae would already be included.
I included my GermanStemmer version in this post. Sorry I can't do
CVSing/diff'ing at the moment.
The stemmer does ä->a and ae->a and doesn't distinguish between uppercase
and lowercase. I'm not a linguist, so I can't say if it does overstemming. I
commented out the expression below

     // "t" occurs only as suffix of verbs.
     else if ( buffer.charAt( buffer.length() - 1 ) == 't' /*&&
!uppercase*/ ) {
  buffer.deleteCharAt( buffer.length() - 1 );
     }
     else {
  doMore = false;
     }

Hope that helps

Clemens

----- Original Message -----
From: "Marc Guillemot" <marc@amath.net>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Wednesday, October 02, 2002 12:47 PM
Subject: Re: Help for german queries


> The problem/question is not on the first letter case because but only on
the
> equivalence between "ä" and "ae" for instance.
>
> in my tests, searching for:
> - Geschäft -> 13 results
> - geschäft -> 0 result
> - Geschaeft -> 0 result
> - geschaeft -> 0 result
>
> Marc.
>
>
> ----- Original Message -----
> From: "Clemens Marschner" <cmad@lanlab.de>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Tuesday, October 01, 2002 1:16 PM
> Subject: Re: Help for german queries
>
>
> > there's a "feature" in the German stemmer (I would call it a bug) that
> > treats words ending with "t" differently if they start with a capital or
> > non-capital letter. Are you sure you didn't type "geschäft" and
> "Geschaeft"?
> > Cause that's supposedly stemmed differently.
> >
> > --Clemens
> >
> > ----- Original Message -----
> > From: "Marc Guillemot" <marc@amath.net>
> > To: <lucene-user@jakarta.apache.org>
> > Sent: Tuesday, October 01, 2002 9:40 AM
> > Subject: Help for german queries
> >
> >
> > > Hi,
> > >
> > > I've performed some tests with Lucene for german indexation/search but
I
> > > don't get the results I expected:
> > >
> > > - Umlaut:
> > > search for:
> > >     - "Geschäft" -> x results
> > >     - "Geschaeft" -> no result
> > > Is there an option in the standard german classes to make the 2
searches
> > > above equivalent?
> > >
> > > - Composed words:
> > > "betreuung" is not found in a doc containing "Kundenbetreuung"
> > >
> > > Any suggestions?
> > >
> > > Marc.
> > >
> > >
> > > --
> > > To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > > For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> > >
> >
> >
> > --
> > To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> >
> >
>
>
> --
> To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
>

Mime
View raw message