lucene-java-user mailing list archives

From "Christoph Kiehl" ...@sulu3000.de>
Subject Re: Problem with tokenizing/stemming in GermanAnalyzer
Date Mon, 17 Feb 2003 15:51:06 GMT
Hi Volker,

> I have noticed a strange problem with capitalization. Search for
> "computer" results in the token "compu". Search for "Computer",
> however, results in "comput". The search is supposed to be
> case-insensitive, so this must be a bug, right?

This problem has already been mentioned on the developer list. The analyzer
tries to do some noun recognition based on capitalization, but it does a bad
job ;)

For now you could check out the current Lucene version from CVS and just
comment out the following line:

 uppercase = Character.isUpperCase( term.charAt( 0 ) );

Then just run ant to build the jar. This fixes the problem you described.
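To illustrate why that one line matters, here is a rough sketch of how a capitalization flag can make the same word stem differently. This is a toy, not the actual stemmer source: the class StemSketch, the stem() method, the nounHeuristic switch and both suffix rules are made up for the example. Only the Character.isUpperCase( term.charAt( 0 ) ) check is taken from the line above.

```java
import java.util.Locale;

// Toy illustration only. The suffix rules below are invented for this
// example and are not the real German stemming rules.
class StemSketch {

    // nounHeuristic = true  : the uppercase check is active
    // nounHeuristic = false : simulates commenting the check out,
    //                         so the flag always stays false
    static String stem(String term, boolean nounHeuristic) {
        if (term.isEmpty()) {
            return term;
        }
        boolean uppercase = false;
        if (nounHeuristic) {
            uppercase = Character.isUpperCase(term.charAt(0));
        }
        String s = term.toLowerCase(Locale.ROOT);
        // Hypothetical rule 1: always strip a trailing "er".
        if (s.endsWith("er")) {
            s = s.substring(0, s.length() - 2);
        }
        // Hypothetical rule 2: strip a trailing "t", but only when the
        // term was not flagged as a noun.
        if (!uppercase && s.endsWith("t")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        // With the check active, the two spellings end up as different tokens.
        System.out.println(stem("computer", true));   // compu
        System.out.println(stem("Computer", true));   // comput
        // With the check disabled, both spellings give the same token.
        System.out.println(stem("computer", false));  // compu
        System.out.println(stem("Computer", false));  // compu
    }
}
```

With the flag always false, a rule guarded by it behaves the same for both spellings, which is why commenting out the check makes the search case-insensitive again.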

Regards
Christoph



