lucene-java-user mailing list archives

From Stephan Spat <stephan.s...@joanneum.at>
Subject Q: Wildcard searching with german umlauts (ä, ö, ß, ...)
Date Mon, 20 Nov 2006 12:46:22 GMT
Hello again!

I use the following Analyzer to analyze my documents:

    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new SnowballFilter(
            new LowerCaseFilter(
                new StandardFilter(
                    new StandardTokenizer(reader))), "German");
    }

It replaces German umlauts, e.g. ä <=> a, ü <=> u, and so on, so there
are no umlauts in the index. For searching I use the same Analyzer. A
simple search for a word with umlauts works without problems. But if I
additionally use wildcards, I suppose the word is not analyzed, and so a
word with umlauts and wildcards (for example: grö*) is not found in the
index. Is this assumption correct?

Is the only way to use wildcards together with umlauts to leave out the
StandardFilter (I suppose the replacement is done there)? Or is there a
"trick" for combining umlauts and wildcards? Or is it necessary to write
a new Filter to replace the StandardFilter?
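
To make the idea of such a "trick" concrete, here is a minimal sketch of one possible workaround: fold the umlauts in the raw wildcard term by hand before it is turned into a query, so that grö* becomes gro* and can match the umlaut-free terms in the index. The class WildcardUmlautFolder and its fold method are hypothetical names and not part of Lucene. The mappings ä => a, ö => o, ü => u follow the replacement described above; ß => ss is an assumption about what the German stemmer does and should be checked against the actual index terms. The sketch does no stemming, so it can only help while the part before the wildcard is no longer than the stemmed index term.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: folds German umlauts in a raw
// wildcard term so it can match index terms whose umlauts were replaced.
final class WildcardUmlautFolder {

    private WildcardUmlautFolder() {}

    // Lowercases the term and replaces umlauts; wildcard characters such as
    // '*' and '?' are passed through unchanged.
    static String fold(String term) {
        String lower = term.toLowerCase(Locale.GERMAN);
        StringBuilder out = new StringBuilder(lower.length() + 2);
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            switch (c) {
                case '\u00e4': out.append('a'); break;   // a-umlaut
                case '\u00f6': out.append('o'); break;   // o-umlaut
                case '\u00fc': out.append('u'); break;   // u-umlaut
                case '\u00df': out.append("ss"); break;  // sharp s (assumed mapping)
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("gr\u00f6*")); // prints gro*
    }
}
```

The folded string would then be handed to the query parser, or used to build a WildcardQuery directly, in place of the term the user typed.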

Thanks a lot

Stephan Spat



