lucene-java-user mailing list archives

From "Ernesto De Santis" <ernesto.desan...@colaborativa.net>
Subject Re: spanish stemmer
Date Mon, 23 Aug 2004 20:31:03 GMT
Because SnowballAnalyzer and SpanishStemmer don't have a default
stop word set.

SnowballAnalyzer constructor:

  /** Builds the named analyzer with no stop words. */
  public SnowballAnalyzer(String name) {
    this.name = name;
  }

Note the comment.

Bye,
Ernesto.
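
To make the effect of the constructor quoted above concrete, here is a
minimal sketch in plain Java with no Lucene dependency. The class
StopWordSketch and its filter method are hypothetical names used only for
illustration. They mimic what a stop word filter does to a token list and
are not how SnowballAnalyzer is implemented. With an empty stop set every
token survives, which corresponds to building the analyzer with no stop
words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: not part of Lucene or Snowball.
final class StopWordSketch {

    // Lowercases the text, splits it on whitespace, and drops tokens found in stopWords.
    static List<String> filter(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "El jugador de la seleccion";
        // Empty stop set: every token is kept.
        System.out.println(filter(text, new HashSet<String>()));
        // Small stop set: "el", "de" and "la" are removed.
        Set<String> stop = new HashSet<>(Arrays.asList("el", "la", "de"));
        System.out.println(filter(text, stop));
    }
}
```

The second call is the behaviour you only get by passing your own list,
since no default list is supplied.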

----- Original Message ----- 
From: "Chad Small" <Chad.Small@definityhealth.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Monday, August 23, 2004 4:57 PM
Subject: RE: spanish stemmer


Excellent Ernesto.

Was there a reason you used your own stop word list and not just the default
constructor SnowballAnalyzer("Spanish")?

thanks,
chad.

-----Original Message-----
From: Ernesto De Santis [mailto:ernesto.desantis@colaborativa.net]
Sent: Monday, August 23, 2004 2:03 PM
To: Lucene Users List
Subject: Re: spanish stemmer


Yes, it is very easy.

You need to write a wrapper for the Spanish Snowball initialization.

analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);

The complete code is below.

Bye, Ernesto.


--------------------------------------------------
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;

/** Wraps SnowballAnalyzer("Spanish") and supplies a Spanish stop word list. */
public class SpanishAnalyzer extends Analyzer {

  // Per instance, so analyzers built with different stop words do not overwrite each other.
  private final SnowballAnalyzer analyzer;

  private static final String[] SPANISH_STOP_WORDS = {
    "un", "una", "unas", "unos", "uno", "sobre", "todo", "también", "tras", "otro", "algún", "alguno", "alguna",
    "algunos", "algunas", "ser", "es", "soy", "eres", "somos", "sois", "estoy", "esta", "estamos", "estais",
    "estan", "en", "para", "atras", "porque", "por qué", "estado", "estaba", "ante", "antes", "siendo",
    "ambos", "pero", "por", "poder", "puede", "puedo", "podemos", "podeis", "pueden", "fui", "fue", "fuimos",
    "fueron", "hacer", "hago", "hace", "hacemos", "haceis", "hacen", "cada", "fin", "incluso", "primero",
    "desde", "conseguir", "consigo", "consigue", "consigues", "conseguimos", "consiguen", "ir", "voy", "va",
    "vamos", "vais", "van", "vaya", "bueno", "ha", "tener", "tengo", "tiene", "tenemos", "teneis", "tienen",
    "el", "la", "lo", "las", "los", "su", "aqui", "mio", "tuyo", "ellos", "ellas", "nos", "nosotros", "vosotros",
    "vosotras", "si", "dentro", "solo", "solamente", "saber", "sabes", "sabe", "sabemos", "sabeis", "saben",
    "ultimo", "largo", "bastante", "haces", "muchos", "aquellos", "aquellas", "sus", "entonces", "tiempo",
    "verdad", "verdadero", "verdadera", "cierto", "ciertos", "cierta", "ciertas", "intentar", "intento",
    "intenta", "intentas", "intentamos", "intentais", "intentan", "dos", "bajo", "arriba", "encima", "usar",
    "uso", "usas", "usa", "usamos", "usais", "usan", "emplear", "empleo", "empleas", "emplean", "empleamos",
    "empleais", "valor", "muy", "era", "eras", "eramos", "eran", "modo", "bien", "cual", "cuando", "donde",
    "mientras", "quien", "con", "entre", "sin", "trabajo", "trabajar", "trabajas", "trabaja", "trabajamos",
    "trabajais", "trabajan", "podria", "podrias", "podriamos", "podrian", "podriais", "yo", "aquel", "mi",
    "de", "a", "e", "i", "o", "u"
  };

  /** Builds a Spanish Snowball analyzer with the stop word list above. */
  public SpanishAnalyzer() {
    analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);
  }

  /** Builds a Spanish Snowball analyzer with the caller's stop words. */
  public SpanishAnalyzer(String[] stopWords) {
    analyzer = new SnowballAnalyzer("Spanish", stopWords);
  }

  /** Delegates tokenizing, stop word removal and stemming to the wrapped analyzer. */
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return analyzer.tokenStream(fieldName, reader);
  }

}



----- Original Message ----- 
From: "Chad Small" <Chad.Small@definityhealth.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Monday, August 23, 2004 3:49 PM
Subject: RE: spanish stemmer


Do you mind sharing how you implemented your SpanishAnalyzer using Snowball?

Sorry I can't help with your question.  I am trying to implement Snowball
Spanish or a Spanish Analyzer in Lucene.

thanks,
chad.

-----Original Message-----
From: Ernesto De Santis [mailto:ernesto.desantis@colaborativa.net]
Sent: Monday, August 23, 2004 8:30 AM
To: Lucene Users List
Subject: spanish stemmer


Hello

I use the Snowball jar to implement my SpanishAnalyzer. I found that the
ending 'bol' is not stripped from words.
For example:

In Spanish, to say basketball you can say basquet or basquetbol. But
SpanishStemmer treats them as different words.
The same goes for voley and voleybol.

It is not the same with futbol (football): we do not say fut for futbol, and
'fut' doesn't exist in Spanish.

Do you think I am correct?

Can you change this?

Ernesto.
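
The futbol case above is why a blanket "strip every trailing bol" rule is
risky. One possible workaround, which is an assumption of this note and not
something proposed in the thread or by Snowball, is to normalize only known
variants with an explicit table before the terms reach the stemmer. The
sketch below is plain Java. The class BolVariantSketch, its table, and its
method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: not part of Lucene or Snowball.
final class BolVariantSketch {

    // Explicit variant table; futbol is deliberately absent because "fut" is not a word.
    private static final Map<String, String> VARIANTS = new HashMap<>();
    static {
        VARIANTS.put("basquetbol", "basquet");
        VARIANTS.put("voleybol", "voley");
    }

    // Returns the canonical form for a known variant, or the lowercased term unchanged.
    static String normalize(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        return VARIANTS.getOrDefault(lower, lower);
    }

    // A blanket rule, shown only for contrast: it strips every trailing "bol".
    static String stripBol(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        return lower.endsWith("bol") ? lower.substring(0, lower.length() - 3) : lower;
    }

    public static void main(String[] args) {
        for (String term : new String[] {"basquetbol", "voleybol", "futbol"}) {
            System.out.println(term + " -> table: " + normalize(term)
                + ", blanket rule: " + stripBol(term));
        }
    }
}
```

The table maps basquetbol and voleybol to their short forms and leaves
futbol alone, while the blanket rule would also reduce futbol to fut.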




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

