lucene-java-user mailing list archives

From Christophe from paris <zlink...@yahoo.fr>
Subject Re: search with accent not match
Date Thu, 07 Aug 2008 12:27:03 GMT

Yes, markrmiller, the order is important.
So now I have:

    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new StopFilter(result, stoptable);
    result = new ISOLatin1AccentFilter(result);
    result = new FrenchStemFilter(result, excltable);
    result = new LowerCaseFilter(result);

And finally, with ISOLatin1AccentFilter, the results are good :)
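
For anyone reading this thread later, here is a minimal sketch of why folding accents on both the indexed terms and the query makes "lumière" and "lumiere" match. It is plain Java built on java.text.Normalizer, not Lucene's ISOLatin1AccentFilter implementation; the class name AccentFoldSketch and the fold helper are made up for illustration.

```java
import java.text.Normalizer;

public class AccentFoldSketch {
    // Assumption: accent folding is approximated by decomposing to NFD
    // and dropping the combining marks (Unicode category M).
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "lumiere", then the same word with e-grave (U+00E8) and e-acute (U+00E9).
        String[] indexed = {"lumiere", "lumi\u00e8re", "lumi\u00e9re"};
        String query = "lumi\u00e8re";

        for (String term : indexed) {
            // Raw comparison: only the identical spelling matches.
            boolean rawMatch = term.equals(query);
            // Folded comparison: both sides go through the same folding step.
            boolean foldedMatch = fold(term).equals(fold(query));
            System.out.println(term + " raw=" + rawMatch + " folded=" + foldedMatch);
        }
    }
}
```

With the raw comparison only the e-grave spelling matches the query; once both sides are folded, all three spellings match. That is also why the filter has to be in the analyzer used for indexing and in the one used for searching.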

Thank you.

Now on to the Polish search ^^


markrmiller wrote:
> 
> You certainly can - just create your own Analyzer starting with a copy 
> of the French one you are using.
> 
> Then you just plug in the filter in the order you want it applied:
> 
> result = new ISOLatin1AccentFilter(result);
> 
> You have to decide for yourself where it goes. If you put it before
> the stopword step, more stop words might be removed than if it came
> after. That kind of thing usually comes down to individual
> requirements and filter limitations. For example, if your stopword list
> contains diacritics and you run the accent filter before applying the
> stopword list, some expected stop words will never be removed.
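
A small sketch of the ordering effect described above, in plain Java rather than with Lucene's filter classes. The stop list, the token list, and the names FilterOrderSketch, stopThenFold and foldThenStop are all hypothetical, and the folding step is an approximation based on java.text.Normalizer.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FilterOrderSketch {
    // Hypothetical stop list: a-grave (U+00E0) carries a diacritic, "la" and "ou" do not.
    static final Set<String> STOP =
            new HashSet<>(Arrays.asList("\u00e0", "la", "ou"));

    // Assumption: accent folding = NFD decomposition minus combining marks.
    static String fold(String term) {
        return Normalizer.normalize(term, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Stop words are removed first, then the surviving tokens are folded.
    static List<String> stopThenFold(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP.contains(t)) {
                out.add(fold(t));
            }
        }
        return out;
    }

    // Tokens are folded first, then checked against the (unfolded) stop list.
    static List<String> foldThenStop(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String folded = fold(t);
            if (!STOP.contains(folded)) {
                out.add(folded);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tokens: "ou" with u-grave (U+00F9), a-grave, "la", "lumiere" with e-grave.
        List<String> tokens = Arrays.asList("o\u00f9", "\u00e0", "la", "lumi\u00e8re");
        System.out.println("stop then fold: " + stopThenFold(tokens)); // [ou, lumiere]
        System.out.println("fold then stop: " + foldThenStop(tokens)); // [a, lumiere]
    }
}
```

Folding first removes the accented "ou" because it collapses onto the stop word "ou", but it lets the folded "a" through because the stop list only contains the accented form. Stopping first does the opposite.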
> 
> 
> Christophe from paris wrote:
>> Actually, in my FrenchAnalyzer I have:
>>
>>     TokenStream result = new StandardTokenizer(reader);
>>     result = new StandardFilter(result);
>>     result = new StopFilter(result, stoptable);
>>     result = new FrenchStemFilter(result, excltable);
>>     result = new LowerCaseFilter(result);
>>
>>
>> Can I use ISOLatin1AccentFilter in this class for both indexing and search?
>> And if so, where should it go?
>>
>>
>> markrmiller wrote:
>>   
>>> Check out org.apache.lucene.analysis.ISOLatin1AccentFilter
>>>
>>> It will strip diacritics - just be sure to use it at index time and 
>>> query time to get what you want. Also, you will no longer be able to 
>>> differentiate between the two in your searching (rarely that important 
>>> in my opinion, but others certainly disagree).
>>>
>>> - Mark
>>>
>>> Christophe from paris wrote:
>>>     
>>>> Hello
>>>>
>>>> I use FrenchAnalyzer for indexing:
>>>>
>>>> IndexWriter writer = new IndexWriter(pathOfIndex, new FrenchAnalyzer(), true);
>>>> Document doc = new Document();
>>>> doc.add(new Field("TXT_CHARACT_VALUE", word.toLowerCase(),
>>>>                   Field.Store.YES, Field.Index.TOKENIZED));
>>>> writer.addDocument(doc);
>>>>
>>>> And for search:
>>>>
>>>> IndexReader reader = IndexReader.open(pathOfIndex);
>>>> Searcher searcher = new IndexSearcher(reader);
>>>> Analyzer analyzer = new FrenchAnalyzer();
>>>> QueryParser parser = new QueryParser(field, analyzer);
>>>> Query query = parser.parse(motRecherche);
>>>> Hits hits = searcher.search(query);
>>>>
>>>> My documents contain the words "lumiere" and "lumière".
>>>>
>>>> When I search for "lumière", only documents containing "lumière" match;
>>>> "lumiere" is not returned.
>>>>
>>>> And if I search for "lumiere", the results are lumiere, lumieres,
>>>> lumiére, lumiéres, but not lumière.
>>>>
>>>> For a complete match I have to search for "lumiere OR lumière",
>>>> but that is not the best solution.
>>>>   
>>>>       
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>>     
>>
>>   
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/search-with-accent-not-match-tp18848522p18869247.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



