lucene-java-user mailing list archives

From Matthew Hall <mh...@informatics.jax.org>
Subject Re: Filtering accents
Date Tue, 30 Dec 2008 14:22:20 GMT
If you cannot use the FrenchAnalyzer, you might instead transform the 
text yourself as an additional step at both indexing and search time.

Use something like a regex that looks for é and always replaces it with 
e, both when building the index and when processing the query.  (Expand 
this transformation to cover other accented characters as needed.)
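
As a rough sketch of that transformation step (not code from the 
original poster's setup): instead of one regex per accented character, 
the class below decomposes the text with java.text.Normalizer and strips 
the combining marks. It assumes Java 6 or later; the class and method 
names (AccentFolder, fold) are made up for illustration.

```java
import java.text.Normalizer;

public class AccentFolder {

    // Hypothetical helper: split each accented character into its base
    // letter plus combining marks (NFD), then delete the combining marks.
    public static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // \u00e9 is "é"; escaped so the source file encoding does not matter.
        System.out.println(fold("m\u00e9tal"));
        System.out.println(fold("metal"));
    }
}
```

The same fold() call would be applied to the text before it is handed to 
the analyzer at index time and to the user's query string at search 
time, so both sides agree on the unaccented form.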

You will likely also need to keep the original word somewhere, so I 
would suggest adding a second field that is stored but not indexed and 
holds the original value of the word. When a search matches, you can 
then read the original form of the word from your hits object.
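
To illustrate the idea without depending on Lucene itself, here is a 
small in-memory stand-in: the folded form plays the role of the indexed 
field, and the original word plays the role of the stored, unindexed 
field. All names here (FoldedIndexSketch, add, search) are hypothetical. 
In Lucene 2.3 the second field would presumably be created with 
Field.Store.YES and Field.Index.NO.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FoldedIndexSketch {

    // Folded term (what gets searched) -> original words (what gets returned).
    private final Map<String, List<String>> originalsByFoldedTerm =
            new HashMap<String, List<String>>();

    // Strip accents: decompose to NFD, then drop the combining marks.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // "Index time": key on the folded form, keep the original alongside it.
    public void add(String original) {
        String key = fold(original);
        List<String> originals = originalsByFoldedTerm.get(key);
        if (originals == null) {
            originals = new ArrayList<String>();
            originalsByFoldedTerm.put(key, originals);
        }
        originals.add(original);
    }

    // "Search time": fold the query the same way, return the original forms.
    public List<String> search(String query) {
        List<String> hits = originalsByFoldedTerm.get(fold(query));
        return hits == null ? new ArrayList<String>() : hits;
    }

    public static void main(String[] args) {
        FoldedIndexSketch index = new FoldedIndexSketch();
        index.add("m\u00e9tal"); // "métal"
        // Both the accented and unaccented query find the original word.
        System.out.println(index.search("metal"));
        System.out.println(index.search("m\u00e9tal"));
    }
}
```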

Hope this helps,

Matt

egrand thomas wrote:
> Dear all,
>
> I'd like my Lucene searches to be insensitive to (French) accents. For example, given
> an indexed term "métal", I want to find it when searching for "metal" or "métal". I use lucene-2.3.2,
> and the searches are performed with IndexSearcher.search(query,filter,sorter). Another filter
> is already used together with a "Sort" object. Furthermore, I cannot use the FrenchAnalyzer,
> as my index does not only contain French words.
>
> Can anybody help ?
> Thanks in advance,
> Tom

