lucene-java-user mailing list archives

From Andrew Green <ndrw_...@yahoo.com.mx>
Subject Snowball and accents filter...?
Date Thu, 26 Apr 2007 19:46:28 GMT
Hi, all,

Another quick request for succinct code examples, or an explanation of
what we're doing wrong here.

We've successfully gotten the Snowball Spanish stemmer working in our
test harness. An example that works perfectly: texts that contain
"civilización" or "civilizaciones" produce hits on searches for either
"civilización" or "civilizaciones". However...

...it's quite likely that in many of our search requests the user will
omit the accents on letters, and it's not impossible that some documents
will contain misspelled words with wrong or missing accents.

So in addition to stemming, we need to remove accents from both the
index and the search queries... I think.

In order to do this, we tried subclassing the SnowballAnalyzer... it
doesn't work yet, though. Here is the code of our custom class:
        
        
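        import java.io.Reader;

        import org.apache.lucene.analysis.ISOLatin1AccentFilter;
        import org.apache.lucene.analysis.TokenStream;
        import org.apache.lucene.analysis.snowball.SnowballAnalyzer;

        public class SnowballAnalyzerWithoutAccents extends SnowballAnalyzer {

                // name is the Snowball stemmer name, e.g. "Spanish"
                public SnowballAnalyzerWithoutAccents(String name, String[] stopWords) {
                        super(name, stopWords);
                }

                public TokenStream tokenStream(String fieldName, Reader reader) {
                        // The parent's stream already includes the Snowball stemmer
                        TokenStream result = super.tokenStream(fieldName, reader);
                        // Accents are stripped from the tokens the stemmer emits
                        result = new ISOLatin1AccentFilter(result);
                        return result;
                }

        }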
        
Basically we create an instance of this class and use it when we create
the IndexWriter and QueryParser objects.
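
One thing that can be read off the class above: super.tokenStream()
already returns the stemmed stream, so ISOLatin1AccentFilter only sees
tokens after the Snowball stemmer has processed them. Whether changing
that order would be enough on its own is untested here. For comparison,
here is a minimal sketch of the accent-stripping step in isolation. It
uses only the JDK's java.text.Normalizer (Java 6 and later), not any
Lucene class, and the class name AccentStripper and its method are
hypothetical, invented for illustration:

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: strips diacritics from a string
// using Unicode decomposition from the JDK.
public class AccentStripper {

    // Decompose each character into base letter + combining marks (NFD),
    // then delete the combining marks. Note this also turns "ñ" into "n".
    public static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // \u00f3 is "ó"; written as an escape to avoid source-encoding issues
        System.out.println(stripAccents("civilizaci\u00f3n"));
        System.out.println(stripAccents("civilizaciones"));
    }
}
```

The point of the sketch is only that the same normalization has to be
applied to the indexed text and to the query text, so that accented and
unaccented spellings end up as the same string before they are compared.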

Any tips/code examples? What should we do differently?

Many thanks in advance,
Andrew Green

