lucene-java-user mailing list archives

From Karl Øie <k...@gan.no>
Subject RE: Strange Results with German Analyzer
Date Thu, 20 Dec 2001 11:54:16 GMT
take a look at the end of GermanAnalyzer.java

http://cvs.apache.org/viewcvs/jakarta-lucene/src/java/org/apache/lucene/analysis/de/GermanAnalyzer.java?rev=1.2&content-type=text/vnd.viewcvs-markup


	public final TokenStream tokenStream( String fieldName, Reader reader ) {
		TokenStream result = new StandardTokenizer( reader );
		result = new StandardFilter( result );
		result = new StopFilter( result, stoptable );
		result = new GermanStemFilter( result, excltable );
		// Convert to lowercase after stemming!
		result = new LowerCaseFilter( result );
		return result;
	}

As you can see, the analyzer converts all words to lowercase to save some
space. Note that the LowerCaseFilter runs last, after stop-word removal and
stemming. You can of course remove the LowerCaseFilter to get case-sensitive
search. Why holland gives 1 result and hollAnd returns 22, I cannot say...
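
Since the filter order is the interesting part of that method, here is a small self-contained sketch (plain Java, no Lucene classes) of why it can matter where the lowercasing step sits in the chain. It assumes a stemmer whose rules depend on the case of the input. toyStem is a made-up rule invented for this illustration, not what GermanStemFilter actually does, and whether anything like this explains the holland case would have to be checked against the real stemmer.

```java
import java.util.Locale;

public class FilterOrderDemo {

    // HYPOTHETICAL case-sensitive rule, for illustration only:
    // capitalized tokens are left alone, lowercase tokens lose a trailing "en".
    static String toyStem(String token) {
        if (token.isEmpty() || Character.isUpperCase(token.charAt(0))) {
            return token;
        }
        return token.endsWith("en")
                ? token.substring(0, token.length() - 2)
                : token;
    }

    // Same order as in the analyzer above: stem first, lowercase last.
    static String stemThenLower(String token) {
        return toyStem(token).toLowerCase(Locale.ROOT);
    }

    // Alternative order: lowercase first, then stem.
    static String lowerThenStem(String token) {
        return toyStem(token.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String word : new String[] {"Wolken", "wolken"}) {
            System.out.println(word
                    + " -> stemThenLower=" + stemThenLower(word)
                    + ", lowerThenStem=" + lowerThenStem(word));
        }
    }
}
```

With this toy rule, stemming before lowercasing maps the capitalized and the lowercase spelling to two different terms, while lowercasing first maps both to the same term. Comparing the terms the real analyzer produces for Holland and holland would show whether something similar is going on here.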

Best regards,
Karl Øie



-----Original Message-----
From: Jan Stövesand [mailto:j.stoevesand@finix.de]
Sent: 20 December 2001 12:36
To: Lucene Users List
Subject: Strange Results with German Analyzer


Hi,

I used the GermanAnalyzer for indexing and searching. As far as I know, the
search is case-insensitive. At least I get the same search results for

kapitalanlagen
Kapitalanlagen

But, for some words the Analyzer behaves somewhat funny:

Holland -> 22 results
hollAnd -> 22 results
hollanD -> 22 results
HOLLAND -> 22 results

holland -> 1 result (!) which is NOT in the 22 results mentioned above.

I have no idea why, and my knowledge of searching, stemming, indexing etc. is,
well, small.

Jan


--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>

