lucene-java-user mailing list archives

From thomas arni <a...@zhaw.ch>
Subject Re: Searching Diacritics
Date Mon, 27 Aug 2007 14:25:42 GMT
You can extend the StandardAnalyzer.
The only thing you have to do is override the tokenStream method like
this:

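  /** Constructs a {@link StandardTokenizer} filtered by a {@link
  StandardFilter}, a {@link LowerCaseFilter}, a {@link StopFilter} and an
  {@link ISOLatin1AccentFilter}. */
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    // stopSet is the stop-word set; if the superclass field is not visible
    // to your subclass, keep your own copy.
    result = new StopFilter(result, stopSet);
    // Added step: replaces accented ISO Latin 1 characters with unaccented ones.
    result = new ISOLatin1AccentFilter(result);
    return result;
  }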
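
To see what the accent-folding step buys you without setting up an index,
here is a minimal stand-alone sketch. It is an assumption-laden stand-in, not
Lucene code: the class AccentFolding and its fold method are hypothetical
names, and it uses the JDK's java.text.Normalizer instead of
ISOLatin1AccentFilter, so the exact character mappings may differ from what
the filter does.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Illustrative stand-in for accent folding; this is NOT Lucene's ISOLatin1AccentFilter. */
final class AccentFolding {

    private AccentFolding() {}

    /** Returns the input with combining diacritical marks removed. */
    static String fold(String input) {
        // NFD splits an accented letter into its base letter plus combining marks,
        // e.g. "\u00e8" (e with grave) becomes "e" followed by U+0300.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches the combining marks; removing them leaves the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Lowercase first, then fold, mirroring the filter order in tokenStream.
        String indexed = fold("Am\u00e8rique".toLowerCase(Locale.ROOT));
        String queried = fold("Amerique".toLowerCase(Locale.ROOT));
        // Both forms reduce to the same term, so they would match each other.
        System.out.println(indexed.equals(queried));
    }
}
```

Characters that have no canonical decomposition (ß or æ, for example) pass
through this sketch unchanged. In Lucene, the two spellings only match if the
same analyzer is used both when indexing and when parsing queries.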


anorman wrote:
> This looks like exactly what I want.  Would I use this together with
> another analyzer, such as the standard one, or on its own?  Does anyone
> have any code examples of implementing such a thing?
>
> Thanks,
> Albert
>
> karl wettin-3 wrote:
>   
>> On 27 Aug 2007, at 16:03, anorman wrote:
>>
>>     
>>> I have a searchable index of documents which contain French and Spanish
>>> diacritics (è, é, À, etc.).  I would like to make the content searchable
>>> so that when a user searches for a word such as "Amèrique" or "Amerique"
>>> (without the diacritic), it returns the same results.
>>>
>>> Has anyone set up something similar?
>>>       
>> ISOLatin1AccentFilter
>>
>> -- 
>> karl
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org