lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: Snowball and accents filter...?
Date Fri, 27 Apr 2007 23:59:19 GMT

: In order to do this, we tried subclassing the SnowballAnalyzer... it
: doesn't work yet, though. Here is the code of our custom class:

At first glance, what you've got seems fine. Can you elaborate on what you
mean by "it doesn't work"?

Perhaps the issue is that the SnowballStemmer can't handle the accented
characters, and you should strip them first, then stem?

  // stopSet and name are the analyzer's own fields (stop words, stemmer name)
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    if (stopSet != null)
      result = new StopFilter(result, stopSet);
    // strip the accents first...
    result = new ISOLatin1AccentFilter(result);
    // ...then stem the unaccented tokens
    result = new SnowballFilter(result, name);
    return result;
  }
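
If you want to see what "strip them first" does to a token before it reaches
the stemmer, here is a rough standalone sketch. It uses only the JDK
(java.text.Normalizer, Java 6 or later), not Lucene's ISOLatin1AccentFilter,
so treat it as an approximation: the class name AccentFolding and the method
stripAccents are made up for illustration, and the exact set of characters
handled may differ from what the Lucene filter maps.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: approximates accent stripping
// with plain JDK calls so the effect can be checked in isolation.
class AccentFolding {

    // Decompose each accented character into base letter + combining mark
    // (NFD), then delete the combining marks, leaving the base letters.
    static String stripAccents(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        String[] tokens = { "caf\u00e9", "\u00e9l\u00e8ve", "plain" };
        for (String t : tokens) {
            System.out.println(t + " -> " + stripAccents(t));
        }
    }
}
```

Running tokens through something like this before and after your analyzer
chain is a quick way to check whether the accents are really gone by the time
the stemmer sees them.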

