lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ruffieux Stephane, yellowworld extern" <ruffie...@yellowworld.ch>
Subject WG: Filter and stop-words
Date Mon, 03 Dec 2001 12:33:04 GMT
Yes, it is a way. You have to implement
org.apache.lucene.analysis.TokenFilter and to implement the next() method
with the algorithm for Portugese. After that you can create a
PortugeseAnalyser (inherits from Analyser) which use it. 

In the actual version of Lucene it's an implementation for german. It can
perhaps help you. 

The following link gives algorithms for some european langages:
http://snowball.sourceforge.net/texts/stemmersoverview.html
<http://snowball.sourceforge.net/texts/stemmersoverview.html> 

Best regards.

Stéphane

PS: I work on the same problem, but for french and italian. Had somebody
still programming stemmer for this languages?

-----Ursprüngliche Nachricht-----
Von:	Bizu de Anúncio [SMTP:atendimento@bizudeanuncio.com]
<mailto:[SMTP:atendimento@bizudeanuncio.com]> 
Gesendet am:	Montag, 3. Dezember 2001 13:22
An:	lucene-user@jakarta.apache.org
<mailto:lucene-user@jakarta.apache.org> 
Betreff:	Filter and stop-words

	I'm new to Lucene. First of all I would like to know if there is a
search
arquive like "sun servlets list".

	My first problem is that I want to index a Portuguese database and I
need
to remove the "s" (plural) and acents (à é ...) from the words. Is there a
way of passing a filter class to the Lucene indexer ? And about the
stop-words, where should I configure Lucene to ignore it ?

	Any help would be appreciated,

	thanks a lot,

		jk


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org
<mailto:lucene-user-unsubscribe@jakarta.apache.org> >
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org
<mailto:lucene-user-help@jakarta.apache.org> >

--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message