lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dora <julien.bar...@gmail.com>
Subject Re: Indexing accented characters, then searching by any form
Date Tue, 25 Nov 2008 15:21:30 GMT



Karl Wettin wrote:
> 
> Try this (dry coded) snippet instead:
> 
> StandardAnalyzer objAnalyzer = new StandardAnalyzer() {
>    public TokenStream tokenStream(String fieldName, Reader reader) {
>      return new ISOLatin1AccentFilter(super.tokenStream(fieldName,  
> reader));
>    }
> }
> 

I tried this, but it does not work as expected.

I am using an utility class with a static method that gives me an analyzer:

public static Analyzer getAnalyzer() 
	{  
		StandardAnalyzer objAnalyzer = new StandardAnalyzer() {
			   public TokenStream tokenStream(String fieldName, Reader reader) {
			     return new ISOLatin1AccentFilter(super.tokenStream(fieldName,
reader));
			   }
			};
			return objAnalyzer;
		}
	}

So when I need the analyzer (for indexing or searching) I perform an
UtilityClass.getAnalyzer() call.

It works for my query parser: The accent are correctly removed when
performing the search.
If my index contains "cafe" searching for "café" will find the documents
containing "cafe"

But when explore my index with Luke I can see that the indexer does not use
the ISOLatin1AccentFilter  (I tested with a breakpoint in the overriden
tokenStream method) and if the document contains "café", the index will
contain "café".

As a consequence, search on word having accent is not possible: the index
contains the accent, while it is removed by the search process.

So my index contains "café", but when I search for "café" the filter changes
it in "cafe" and it gives no hit...

Any clue on why my filter is not used at time of indexation ?




-- 
View this message in context: http://www.nabble.com/Indexing-accented-characters%2C-then-searching-by-any-form-tp15412778p20682548.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message