lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Volker Luedeling" <v.luedel...@brox.de>
Subject Correction: Wildcard and Fuzzy queries in GermanAnalyzer
Date Mon, 24 Feb 2003 12:22:23 GMT
I made a small mistake in my example. My application converted all
characters to lowercase while indexing. When I comment this out,
"Etagenwohnung" remains unchanged after stemming. So, my example is bad.
However, the basic problem remains (at least for all words that do not
start with a capital letter). Take a word like "genaugenommen", for
example. It will be stemmed to "nomm", and no real fuzzy or wildcard
evaluation is possible.

-----Original Message-----
From: Volker Luedeling 
Sent: Montag, 24. Februar 2003 11:58
To: lucene-user@jakarta.apache.org
Subject: Wildcard and Fuzzy queries in GermanAnalyzer

Hi,

I have noticed that FuzzyQueries and WildcardQueries don't do stemming.
Since all terms in the index are in stemmed forms, this causes some
problems:

"Etagenwohnung" gets stemmed to "nwohnung". So a search for
"Etagenwohnung will find "Etagenwohnung" and "nwohnung".

Fuzzy search for "Etagenwohnung~" finds neither of those, but it will
find "Nebenwohnung".

Wildcard search for "Etagenw?hnung" also finds neither of the two
documents, while "nwoh*" finds "Etagenwohnung", which is also not what a
user would expect.

It seems that stemming has a fundamental problem with these kinds of
tolerant searches. Does anyone know how to resolve these issues? Do you
plan to tackle this problem in a future release of Lucene?

Volker


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message