lucene-java-user mailing list archives

From "Marc Guillemot" <m...@amath.net>
Subject Re: Help for german queries
Date Wed, 02 Oct 2002 13:03:28 GMT
Great, your stemmer does what I expected for umlauts. Thanks.
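
For readers following the thread, here is a minimal sketch of the folding described in the quoted message below (ä->a and ae->a, ignoring case). It is not the attached GermanStemmer, which is not reproduced in this archive. The class name UmlautFolding and the method fold are hypothetical, and extending the rule to ö/oe and ü/ue is an assumption beyond what the message states.

```java
import java.util.Locale;

// Hypothetical helper, NOT part of Lucene and NOT the GermanStemmer from this thread.
class UmlautFolding {

    // Lowercases the term, then maps the umlaut and its two-letter
    // transcription to the same base vowel (ä and ae both become a).
    // Handling ö/oe and ü/ue the same way is an assumption.
    static String fold(String term) {
        String s = term.toLowerCase(Locale.GERMAN);
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean nextIsE = i + 1 < s.length() && s.charAt(i + 1) == 'e';
            if (c == '\u00e4') {            // ä
                out.append('a');
            } else if (c == '\u00f6') {     // ö
                out.append('o');
            } else if (c == '\u00fc') {     // ü
                out.append('u');
            } else if ((c == 'a' || c == 'o' || c == 'u') && nextIsE) {
                out.append(c);
                i++;                        // drop the 'e' of ae/oe/ue
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, "Geschäft", "geschäft", "Geschaeft" and "geschaeft" all fold to "geschaft", so they would match each other if the same folding is applied at index time and at query time. The rule is deliberately crude: it also shortens words such as "neue" to "neu", which may or may not be acceptable.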

Does anyone have an idea for compound words? ("betreuung" is not found in a doc
containing "Kundenbetreuung".)
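
One possible direction, sketched here only as an illustration: index the parts of a compound alongside the compound itself, using a word list to decide which substrings count as parts. The class CompoundSplitter, the method subwords, the dictionary contents and the minimum part length are all hypothetical. Nothing in this thread says Lucene offers this out of the box.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper illustrating dictionary-based decompounding.
class CompoundSplitter {

    // Returns the lowercased token followed by every dictionary word
    // (at least minLength characters long) that occurs inside it.
    // The dictionary is assumed to contain lowercase entries.
    static List<String> subwords(String token, Set<String> dictionary, int minLength) {
        String s = token.toLowerCase(Locale.GERMAN);
        List<String> out = new ArrayList<>();
        out.add(s);
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + minLength; end <= s.length(); end++) {
                String part = s.substring(start, end);
                if (!part.equals(s) && dictionary.contains(part)) {
                    out.add(part);
                }
            }
        }
        return out;
    }
}
```

Given a dictionary containing "kunde" and "betreuung", the token "Kundenbetreuung" would yield "kundenbetreuung", "kunde" and "betreuung". If all of these are indexed for the document, a query for "betreuung" can match. The quality depends entirely on the word list, and the nested loop is quadratic in the token length, which is usually tolerable for single words.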

Marc.



----- Original Message -----
From: "Clemens Marschner" <cmad@lanlab.de>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Wednesday, October 02, 2002 2:36 PM
Subject: Re: Help for german queries


> Hm, sorry, I don't have the time right now, but I think it took me 10 minutes
> to discover the location where I had to make the changes.
> I thought ä=ae would already be included.
> I included my GermanStemmer version in this post. Sorry I can't do
> CVSing/diff'ing at the moment.
> The stemmer does ä->a and ae->a and doesn't distinguish between uppercase
> and lowercase. I'm not a linguist, so I can't say if it does overstemming.
> I commented out the uppercase check in the expression below:
>
>      // "t" occurs only as suffix of verbs.
>      else if ( buffer.charAt( buffer.length() - 1 ) == 't' /* && !uppercase */ ) {
>          buffer.deleteCharAt( buffer.length() - 1 );
>      }
>      else {
>          doMore = false;
>      }
>
> Hope that helps
>
> Clemens
>
> ----- Original Message -----
> From: "Marc Guillemot" <marc@amath.net>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Wednesday, October 02, 2002 12:47 PM
> Subject: Re: Help for german queries
>
>
> > The problem/question is not about the case of the first letter, but only
> > about the equivalence between "ä" and "ae", for instance.
> >
> > In my tests, searching for:
> > - Geschäft -> 13 results
> > - geschäft -> 0 results
> > - Geschaeft -> 0 results
> > - geschaeft -> 0 results
> >
> > Marc.
> >
> >
> > ----- Original Message -----
> > From: "Clemens Marschner" <cmad@lanlab.de>
> > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > Sent: Tuesday, October 01, 2002 1:16 PM
> > Subject: Re: Help for german queries
> >
> >
> > > There's a "feature" in the German stemmer (I would call it a bug) that
> > > treats words ending with "t" differently depending on whether they start
> > > with a capital or non-capital letter. Are you sure you didn't type
> > > "geschäft" and "Geschaeft"?
> > > Because those are supposedly stemmed differently.
> > >
> > > --Clemens
> > >
> > > ----- Original Message -----
> > > From: "Marc Guillemot" <marc@amath.net>
> > > To: <lucene-user@jakarta.apache.org>
> > > Sent: Tuesday, October 01, 2002 9:40 AM
> > > Subject: Help for german queries
> > >
> > >
> > > > Hi,
> > > >
> > > > I've performed some tests with Lucene for German indexing/search, but I
> > > > don't get the results I expected:
> > > >
> > > > - Umlaut:
> > > > search for:
> > > >     - "Geschäft" -> x results
> > > >     - "Geschaeft" -> no result
> > > > Is there an option in the standard German classes to make the 2 searches
> > > > above equivalent?
> > > >
> > > > - Compound words:
> > > > "betreuung" is not found in a doc containing "Kundenbetreuung"
> > > >
> > > > Any suggestions?
> > > >
> > > > Marc.

