lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: search with accent not match
Date Wed, 06 Aug 2008 13:30:01 GMT
You certainly can - just create your own Analyzer starting with a copy 
of the French one you are using.

Then you just plug in the filter in the order you want it applied:

result = new ISOLatin1AccentFilter(result);

You have to decide for yourself where it will come - if you put it 
before the stopword step, more stops words might be removed than if it 
was after - that type of thing usually comes down to individual 
requirements/filter limitations. If your stopword list has diacriticals 
and you run the accent filter before applying the stopword list, some 
expected stopwords will never be removed...etc.


Christophe from paris wrote:
> Actualy in my FrenchAnalyser 
>
> i have :
>
>  TokenStream result = new StandardTokenizer(reader);
>     result = new StandardFilter(result);
>     result = new StopFilter(result, stoptable);
>     result = new FrenchStemFilter(result, excltable);
>     result = new LowerCaseFilter(result);
>
>
> I can use ISOLatin1AccentFilter in this Class for indexing ans search ?
> And it is the case where ?
>
>
> markrmiller wrote:
>   
>> Check out org.apache.lucene.analysis.ISOLatin1AccentFilter
>>
>> It will strip diacritics - just be sure to use it at index time and 
>> query time to get what you want. Also, you will no longer be able to 
>> differentiate between the two in your searching (rarely that important 
>> in my opinion, but others certainly disagree).
>>
>> - Mark
>>
>> Christophe from paris wrote:
>>     
>>> Hello
>>>
>>> I'm use FrenchAnalyzer for index 
>>>
>>> IndexWriter writer = new IndexWriter(pathOfIndex, new FrenchAnalyzer(),
>>> true);
>>> Document = new Document();
>>> doc.add(new
>>> Field("TXT_CHARACT_VALUE",word.toLowerCase(),Field.Store.YES,Field.Index.TOKENIZED));
>>> writer.addDocument(doc);
>>>
>>> And search
>>>
>>> IndexReader reader = IndexReader.open(pathOfIndex);			
>>> Searcher searcher = new IndexSearcher(reader);
>>> Analyzer analyzer = new FrenchAnalyzer();						
>>> QueryParser parser = new QueryParser(field, analyzer);					
>>> Query query = parser.parse(motRecherche);
>>> Hits hits = searcher.search(query);
>>>
>>> in my document i have the word "lumiere" and "lumière"
>>>
>>> when i search lumière only document match lumière but "lumiere" is not
>>> return
>>>
>>> and if search "lumiere" the result is lumiere, lumieres ,lumiére,lumiéres
>>> but not lumière
>>>
>>> for a total match i must search "lumiere OR limière"
>>> but is not the best solution 
>>>   
>>>       
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>>     
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message