lucene-java-user mailing list archives

From Tatu Saloranta <t...@hypermall.net>
Subject Re: Correction: Wildcard and Fuzzy queries in GermanAnalyzer
Date Mon, 24 Feb 2003 14:28:05 GMT
On Monday 24 February 2003 05:22, Volker Luedeling wrote:
> I made a small mistake in my example. My application converted all
> characters to lowercase while indexing. When I comment this out,
> "Etagenwohnung" remains unchanged after stemming. So, my example is bad.
> However, the basic problem remains (at least for all words that do not
> start with a capital letter). Take a word like "genaugenommen", for
> example. It will be stemmed to "nomm", and no real fuzzy or wildcard
> evaluation is possible.

Yes, this problem has been discussed on the list lately. You may want to read
the mailing list archives to see some of the difficulties in finding a good
general solution. (Brief summary: it's likely that no single solution can work
100% reliably; the outcome depends on the language of the content, on the body
of the wildcard term used, and so on.)

Fortunately, it is fairly easy (especially after the patches) to create your
own query parser by extending the default one. In that parser you can run an
analyzer on wildcard queries too. The only change you have to make to the
default analyzer(s) is to ensure that the wildcards remain in the query term,
i.e. that '*' and '?' are not removed, and that these characters do not confuse
the stemmer (which may not be trivial to do, actually).
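
As a rough illustration of the "keep the wildcards in the term" idea, here is a
minimal sketch in plain Java. It is not Lucene code: the class name
WildcardTermNormalizer and the normalizer argument are hypothetical stand-ins
for whatever per-term processing your analyzer does. The literal runs of the
term go through the normalizer, and '*' and '?' are copied through untouched.
Only lowercasing is shown here; whether stemming the fragments would match the
stems stored in the index is exactly the open question mentioned above.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

public class WildcardTermNormalizer {

    /**
     * Applies the given normalizer to each literal run of the term and
     * copies the wildcard characters '*' and '?' through unchanged.
     * The normalizer is a hypothetical stand-in for an analyzer step.
     */
    public static String normalize(String term, UnaryOperator<String> normalizer) {
        StringBuilder out = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                // Flush the literal text collected so far, then keep the wildcard.
                if (literal.length() > 0) {
                    out.append(normalizer.apply(literal.toString()));
                    literal.setLength(0);
                }
                out.append(c);
            } else {
                literal.append(c);
            }
        }
        // Flush any trailing literal text.
        if (literal.length() > 0) {
            out.append(normalizer.apply(literal.toString()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: the index was built with lowercasing only.
        UnaryOperator<String> lowercase = s -> s.toLowerCase(Locale.ROOT);
        System.out.println(normalize("Etagen*", lowercase));      // etagen*
        System.out.println(normalize("gen?u*Nommen", lowercase)); // gen?u*nommen
    }
}
```

In a custom query parser, something like this would be applied to the wildcard
term before the wildcard query is built, so that the stemmer never sees the
wildcard characters themselves.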

Hope this helps,

-+ Tatu +-




