lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Brian Brown" <brian.br...@wanadoo.fr>
Subject Fw: Filter and stop-words
Date Mon, 03 Dec 2001 17:05:45 GMT
I have just started work on a French stemmer for Lucene. My approach is
to use the German stemmer contributed by Gerhard Schwarz as a starting
point for the structure of the modules (thank you, Gerhard). I notice that
at
least one other person is working on a French stemmer (Stephane Ruffieux).
Should we consider collaboration?

Brian Brown
----- Original Message -----
From: "Ruffieux Stephane, yellowworld extern" <ruffieuxs@yellowworld.ch>
To: <lucene-user@jakarta.apache.org>
Sent: Monday, December 03, 2001 1:33 PM
Subject: WG: Filter and stop-words


Yes, it is a way. You have to implement
org.apache.lucene.analysis.TokenFilter and to implement the next() method
with the algorithm for Portugese. After that you can create a
PortugeseAnalyser (inherits from Analyser) which use it.

In the actual version of Lucene it's an implementation for german. It can
perhaps help you.

The following link gives algorithms for some european langages:
http://snowball.sourceforge.net/texts/stemmersoverview.html
<http://snowball.sourceforge.net/texts/stemmersoverview.html>

Best regards.

Stéphane

PS: I work on the same problem, but for french and italian. Had somebody
still programming stemmer for this languages?

-----Ursprüngliche Nachricht-----
Von: Bizu de Anúncio [SMTP:atendimento@bizudeanuncio.com]
<mailto:[SMTP:atendimento@bizudeanuncio.com]>
Gesendet am: Montag, 3. Dezember 2001 13:22
An: lucene-user@jakarta.apache.org
<mailto:lucene-user@jakarta.apache.org>
Betreff: Filter and stop-words

I'm new to Lucene. First of all I would like to know if there is a
search
arquive like "sun servlets list".

My first problem is that I want to index a Portuguese database and I
need
to remove the "s" (plural) and acents (à é ...) from the words. Is there a
way of passing a filter class to the Lucene indexer ? And about the
stop-words, where should I configure Lucene to ignore it ?

Any help would be appreciated,

thanks a lot,

jk


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org
<mailto:lucene-user-unsubscribe@jakarta.apache.org> >
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org
<mailto:lucene-user-help@jakarta.apache.org> >

--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>



--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message