lucene-dev mailing list archives

From "Martin Sevigny" <sevi...@ajlsm.com>
Subject RE : [PATCH] Refactoring QueryParser.jj, setLowercaseWildcardTerms()
Date Wed, 12 Feb 2003 23:33:38 GMT
Hi,

> > Also, I think we should lowercase prefix and wildcard queries by
> > default.  This would fix one of the most frequently reported problems.
> > Yes, it might also break folks who currently do case-sensitive
> > wildcard queries, but I suspect they are far fewer than those who will
> > continue to complain about the default case-sensitivity of wildcard
> > searches. What do others think?
> 
> For the StandardAnalyzer this might work, but for the GermanAnalyzer
> there is also the problem of umlauts (ä, ö, ü) being turned into plain
> vowels (a, o, u) during indexing. An example: "Häuser" is the plural of
> "Haus". If I index "Häuser", it is stemmed to "hau". If I then search
> for "häus*", nothing is found, because "häus" is not stemmed. If I
> analyzed "häus*", I should get "hau*". The problem is that I would then
> get not only "Häuser" but also "Haus" as results. But I think it is
> better to get more results than no result. This is perhaps a special
> problem with the GermanAnalyzer. Maybe there could be an option to use
> the Analyzer for wildcard queries as well, off by default, so that I
> can turn it on in my case. Hope you understand my problem ;)

I second that. It is true for many languages: a "standard" analyzer
will usually do more than lowercase terms. It will also remove
diacritics, as in the example above, and possibly apply stemming.
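
To make the point concrete, here is a minimal sketch of what normalizing
a wildcard term could look like when the analyzer folds case and strips
diacritics. The class and method names (WildcardTermNormalizer,
normalize) are hypothetical and not part of Lucene, and the sketch uses
the JDK's java.text.Normalizer rather than any analyzer. It does not
stem, so it would not reproduce the "hau*" result from the German
example above; it only shows the case and diacritics part.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper for illustration only; not a Lucene class.
final class WildcardTermNormalizer {

    private WildcardTermNormalizer() {}

    /**
     * Lowercases a wildcard term and removes combining diacritical marks,
     * so that an umlauted vowel such as a-umlaut becomes a plain "a".
     * The wildcard characters '*' and '?' pass through unchanged.
     * No stemming is applied (assumption: the index terms are unstemmed).
     */
    static String normalize(String wildcardTerm) {
        // Fold case first, independent of the JVM's default locale.
        String lower = wildcardTerm.toLowerCase(Locale.ROOT);
        // Decompose accented letters into base letter + combining mark.
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // Drop the combining marks (Unicode category M).
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

With this helper, the query term "Häus*" would be rewritten to "haus*"
before being turned into a wildcard query. Whether that matches anything
still depends on how the index terms themselves were analyzed.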

Lucene is a wonderful tool for building i18n-ready search engines; let's
not forget that ;-)

Martin Sévigny



