lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: search with accent not match
Date Wed, 06 Aug 2008 12:27:18 GMT
Check out org.apache.lucene.analysis.ISOLatin1AccentFilter

It strips diacritics. Be sure to use it at both index time and query
time, otherwise the indexed terms and the query terms will not line up.
Note that you will no longer be able to distinguish accented from
unaccented forms when searching (rarely that important in my opinion,
but others certainly disagree).
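
To illustrate why the same folding has to happen on both sides, here is a minimal, self-contained sketch. It does not use Lucene at all: the class AccentFoldingSketch and its methods are hypothetical names made up for this example, and java.text.Normalizer is swapped in as a stand-in for the accent filter. It is not guaranteed to produce the same mappings as ISOLatin1AccentFilter; it only shows the principle.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy index, NOT Lucene code: it only demonstrates the principle
// of folding accents identically at index time and at query time.
class AccentFoldingSketch {

    // Stand-in for an accent filter: decompose characters (NFD) so that
    // "e with grave" becomes "e" + combining mark, then drop the marks.
    public static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Folded term -> original words that were indexed under it.
    private final Map<String, List<String>> index = new HashMap<>();

    // Index time: store the word under its folded form.
    public void add(String word) {
        index.computeIfAbsent(fold(word), k -> new ArrayList<>()).add(word);
    }

    // Query time: fold the query the same way before looking it up.
    public List<String> search(String query) {
        return index.getOrDefault(fold(query), new ArrayList<>());
    }

    public static void main(String[] args) {
        AccentFoldingSketch idx = new AccentFoldingSketch();
        idx.add("lumiere");
        idx.add("lumi\u00e8re"); // "lumiere" with a grave accent on the first e

        // Both queries fold to the same term, so both find both words.
        System.out.println(idx.search("lumi\u00e8re"));
        System.out.println(idx.search("lumiere"));
    }
}
```

If fold() were applied only in add() or only in search(), the accented query and the stored term would no longer agree, which is the mismatch described in the quoted message.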

- Mark

Christophe from paris wrote:
> Hello
>
> I use FrenchAnalyzer for indexing:
>
> IndexWriter writer = new IndexWriter(pathOfIndex, new FrenchAnalyzer(),
> true);
> Document doc = new Document();
> doc.add(new
> Field("TXT_CHARACT_VALUE",word.toLowerCase(),Field.Store.YES,Field.Index.TOKENIZED));
> writer.addDocument(doc);
>
> And for searching:
>
> IndexReader reader = IndexReader.open(pathOfIndex);			
> Searcher searcher = new IndexSearcher(reader);
> Analyzer analyzer = new FrenchAnalyzer();						
> QueryParser parser = new QueryParser(field, analyzer);					
> Query query = parser.parse(motRecherche);
> Hits hits = searcher.search(query);
>
> In my documents I have the words "lumiere" and "lumière".
>
> When I search for "lumière", only documents containing "lumière" match;
> "lumiere" is not returned.
>
> And if I search for "lumiere", the results are lumiere, lumieres, lumiére
> and lumiéres, but not lumière.
>
> To match everything I must search for "lumiere OR lumière",
> but that is not the best solution.
>   



