lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: German*Filter, Analyzer "cutting" off letters from (french) words...
Date Mon, 18 Apr 2011 11:35:39 GMT
You can easily string together your own tokenizer and any number of filters
to create an analyzer that does exactly what you need. Lucene in Action
shows an example of creating your own analyzer by assembling
the standard parts.
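
A minimal sketch of that idea, written in plain Java rather than against
Lucene's own classes so that it runs without the library on the classpath.
The names AnalyzerSketch, tokenize, stopFilter and analyze are hypothetical
and are not part of Lucene; the regular-expression tokenizer is only a
stand-in for a real one. It shows a tokenizer followed by a chain of filters,
with a stop filter that matches case-insensitively but leaves each emitted
term in its original case:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical illustration only: none of these names are Lucene API.
class AnalyzerSketch {

    // Stand-in tokenizer: split on any run of characters that are
    // neither letters nor digits, and drop empty pieces.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Stop filter: looks up a lowercased copy of each token in the stop set
    // (which is assumed to contain lowercase entries), but keeps the token
    // itself unchanged, so no lowercasing leaks into the output.
    static UnaryOperator<List<String>> stopFilter(Set<String> lowercaseStopWords) {
        return tokens -> {
            List<String> kept = new ArrayList<>();
            for (String token : tokens) {
                if (!lowercaseStopWords.contains(token.toLowerCase(Locale.GERMAN))) {
                    kept.add(token);
                }
            }
            return kept;
        };
    }

    // "Analyzer": run the tokenizer, then apply each filter in order.
    @SafeVarargs
    static List<String> analyze(String text, UnaryOperator<List<String>>... filters) {
        List<String> tokens = tokenize(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("der", "die", "das", "ein"));
        // prints [Besteuerung, Ligue, Genevoise]
        System.out.println(analyze("Die Besteuerung der Ligue Genevoise", stopFilter(stopWords)));
    }
}
```

In Lucene itself the corresponding pieces would be a Tokenizer and one or
more TokenFilter subclasses such as StopFilter, chained inside your own
Analyzer; check the javadocs of your Lucene version for the exact
constructors, since the sketch above does not reproduce them.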

Best
Erick

On Mon, Apr 18, 2011 at 3:08 AM, Clemens Wyss <clemensdev@mysign.ch> wrote:
> What is the best way to avoid the lowercasing (while still being able to exclude stop
> words)?
>
>> -----Original Message-----
>> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
>> Sent: Friday, April 15, 2011 08:56
>> To: java-user@lucene.apache.org
>> Subject: Re: German*Filter, Analyzer "cutting" off letters from (french)
>> words...
>>
>> On Fri, Apr 15, 2011 at 8:48 AM, Clemens Wyss <clemensdev@mysign.ch>
>> wrote:
>> > Does the StandardAnalyzer lowercase its terms?
>> yes!
>>
>> simon
>> >
>> >> -----Original Message-----
>> >> From: Clemens Wyss [mailto:clemensdev@mysign.ch]
>> >> Sent: Wednesday, April 13, 2011 13:34
>> >> To: java-user@lucene.apache.org
>> >> Subject: Re: German*Filter, Analyzer "cutting" off letters from
>> >> (french) words...
>> >>
>> >> This is what I was looking for, thanks
>> >>
>> >> > -----Original Message-----
>> >> > From: Robert Muir [mailto:rcmuir@gmail.com]
>> >> > Sent: Wednesday, April 13, 2011 12:11
>> >> > To: java-user@lucene.apache.org
>> >> > Subject: Re: German*Filter, Analyzer "cutting" off letters from
>> >> > (french) words...
>> >> >
>> >> > If you only want to ignore German stop words, you don't need to use
>> >> > the German analyzer with German stemming. You can just use
>> >> > StandardAnalyzer with your own stop word set!
>> >> >
>> >> > On Wed, Apr 13, 2011 at 3:51 AM, Clemens Wyss <clemensdev@mysign.ch> wrote:
>> >> > > What I really want to do is ignore German stop words such as
>> >> > > "der", "die", "das", "ein",...
>> >> > >
>> >> > >> -----Original Message-----
>> >> > >> From: Robert Muir [mailto:rcmuir@gmail.com]
>> >> > >> Sent: Tuesday, April 12, 2011 17:03
>> >> > >> To: java-user@lucene.apache.org
>> >> > >> Subject: Re: German*Filter, Analyzer "cutting" off letters from
>> >> > >> (french) words...
>> >> > >>
>> >> > >> On Tue, Apr 12, 2011 at 8:46 AM, Clemens Wyss <clemensdev@mysign.ch> wrote:
>> >> > >> > Why so? Where have the e's gone?
>> >> > >> >
>> >> > >>
>> >> > >> The e is being stemmed because it's a German suffix... all of the
>> >> > >> German stemming algorithms remove a final -e, as do all of the
>> >> > >> French stemming algorithms.
>> >> > >>
>> >> > >> So I don't understand your problem.
>> >> > >>
>> >> > >> ---------------------------------------------------------------------
>> >> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >> > >
>> >> > >
>> >> >
>> >
>> >
>>
>
>
