lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ruffieux Stephane, yellowworld extern" <ruffie...@yellowworld.ch>
Subject WG: Filter and stop-words
Date Tue, 04 Dec 2001 07:23:16 GMT
I think that to start on the basis of the german stemmer is a good way. 

I'm sorry, Brian, but the priorities in our project were defined in the last
time and the search aspects aren't the first. So I can't help you a lot in
the following months.

Stephane Ruffieux

-----Ursprüngliche Nachricht-----
Von:	Brian Brown [SMTP:brian.brown@wanadoo.fr]
<mailto:[SMTP:brian.brown@wanadoo.fr]> 
Gesendet am:	Montag, 3. Dezember 2001 18:06
An:	lucene-user@jakarta.apache.org
<mailto:lucene-user@jakarta.apache.org> 
Betreff:	Fw: Filter and stop-words

I have just started work on a French stemmer for Lucene. My approach is
to use the German stemmer contributed by Gerhard Schwarz as a starting
point for the structure of the modules (thank you, Gerhard). I notice that
at
least one other person is working on a French stemmer (Stephane Ruffieux).
Should we consider collaboration?

Brian Brown
----- Original Message -----
From: "Ruffieux Stephane, yellowworld extern" <ruffieuxs@yellowworld.ch
<mailto:ruffieuxs@yellowworld.ch> >
To: <lucene-user@jakarta.apache.org <mailto:lucene-user@jakarta.apache.org>
>
Sent: Monday, December 03, 2001 1:33 PM
Subject: WG: Filter and stop-words


Yes, it is a way. You have to implement
org.apache.lucene.analysis.TokenFilter and to implement the next() method
with the algorithm for Portugese. After that you can create a
PortugeseAnalyser (inherits from Analyser) which use it.

In the actual version of Lucene it's an implementation for german. It can
perhaps help you.

The following link gives algorithms for some european langages:
http://snowball.sourceforge.net/texts/stemmersoverview.html
<http://snowball.sourceforge.net/texts/stemmersoverview.html> 
<http://snowball.sourceforge.net/texts/stemmersoverview.html
<http://snowball.sourceforge.net/texts/stemmersoverview.html> >

Best regards.

Stéphane

PS: I work on the same problem, but for french and italian. Had somebody
still programming stemmer for this languages?

-----Ursprüngliche Nachricht-----
Von: Bizu de Anúncio [SMTP:atendimento@bizudeanuncio.com]
<mailto:[SMTP:atendimento@bizudeanuncio.com]> 
<mailto:[SMTP:atendimento@bizudeanuncio.com]
<mailto:[SMTP:atendimento@bizudeanuncio.com]> >
Gesendet am: Montag, 3. Dezember 2001 13:22
An: lucene-user@jakarta.apache.org <mailto:lucene-user@jakarta.apache.org> 
<mailto:lucene-user@jakarta.apache.org
<mailto:lucene-user@jakarta.apache.org> >
Betreff: Filter and stop-words

I'm new to Lucene. First of all I would like to know if there is a
search
arquive like "sun servlets list".

My first problem is that I want to index a Portuguese database and I
need
to remove the "s" (plural) and acents (à é ...) from the words. Is there a
way of passing a filter class to the Lucene indexer ? And about the
stop-words, where should I configure Lucene to ignore it ?

Any help would be appreciated,

thanks a lot,

jk


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org
<mailto:lucene-user-unsubscribe@jakarta.apache.org
<mailto:lucene-user-unsubscribe@jakarta.apache.org> > >
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org
<mailto:lucene-user-help@jakarta.apache.org
<mailto:lucene-user-help@jakarta.apache.org> > >

--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org
<mailto:lucene-user-unsubscribe@jakarta.apache.org> >
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org
<mailto:lucene-user-help@jakarta.apache.org> >



--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org
<mailto:lucene-user-unsubscribe@jakarta.apache.org> >
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org
<mailto:lucene-user-help@jakarta.apache.org> >

--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message