lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Filtering accents
Date Tue, 30 Dec 2008 15:35:33 GMT
You might want to take a look at using the ISOLatin1AccentFilter or similar
at both index and query time. It basically folds accented characters into
their unaccented form.
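
To make the folding idea concrete, here is a minimal sketch that uses only the JDK (java.text.Normalizer), not the Lucene accent filter mentioned above. The class name AccentFold and the method fold are made up for illustration, and the filter's actual character mapping may differ, for instance for characters that have no canonical decomposition. The point is only that the same folding is applied to the indexed term and to the query term, so both spellings meet on the same form.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: folds accented characters
// to their unaccented form using Unicode decomposition.
final class AccentFold {

    // Decompose to NFD (base letter + combining marks), then drop the marks.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "m\u00e9tal" is "métal"; the escape avoids source-encoding issues.
        String indexedTerm = fold("m\u00e9tal");

        // Both query spellings fold to the same term as the indexed one.
        System.out.println(indexedTerm.equals(fold("metal")));
        System.out.println(indexedTerm.equals(fold("m\u00e9tal")));
    }
}
```

If folding were applied on only one side (index time or query time), the two forms would no longer be guaranteed to match, which is why the filter belongs in the analyzer used for both.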

Matthew:

You wrote:
<<<so I would suggest adding a second stored, but unindexed field that
stores the original value>>>

I also did this before realizing that the second field is unnecessary.
Storing is orthogonal to indexing. That is, the filters are NOT applied to
the stored data. From the docs for Field.Store.YES (emphasis mine):

<<<Store the original field value in the index. This is useful for short
texts like a document's title which should be displayed with the results.
The value is stored in its original form, *i.e. no analyzer is used before
it is stored*.>>>

I don't think it makes much, if any, real performance difference, but it
does make the code simpler to use just one field.
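
As a rough illustration of why one field is enough, here is a toy model in plain Java. It is not Lucene code: ToyIndex, analyze, add and search are hypothetical names, and the folding is done with java.text.Normalizer rather than a Lucene filter. The lookup key is the analyzed (folded, lowercased) term, while the value handed back is the original text exactly as it was given, which mirrors the indexed/stored split described above.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model, not Lucene: one "field" whose indexed form is analyzed
// and whose stored form is kept untouched.
final class ToyIndex {

    // analyzed term -> original (stored) values
    private final Map<String, List<String>> postings = new HashMap<>();

    // Stand-in for an analyzer chain: strip accents, then lowercase.
    static String analyze(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Index under the analyzed term, but store the value as given.
    void add(String originalValue) {
        postings.computeIfAbsent(analyze(originalValue), k -> new ArrayList<>())
                .add(originalValue);
    }

    // The query goes through the same analysis as the indexed text.
    List<String> search(String query) {
        return postings.getOrDefault(analyze(query), new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("M\u00e9tal"); // "Métal", written with an escape

        // Searching without the accent still returns the original spelling.
        System.out.println(index.search("metal"));
    }
}
```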

Best
Erick


On Tue, Dec 30, 2008 at 8:52 AM, legrand thomas <thomaslegrand14@yahoo.fr> wrote:

> Dear all,
>
> I'd like my Lucene searches to be insensitive to (French) accents. For
> example, given an indexed term "métal", I want to find it when searching
> for "metal" or "métal". I use lucene-2.3.2 and the searches are performed
> with IndexSearcher.search(query, filter, sorter). Another filter is already
> used together with a "Sort" object. Furthermore, I cannot use the
> FrenchAnalyzer as my index does not only contain French words.
>
> Can anybody help?
> Thanks in advance,
> Tom