lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: a question for french analyzer
Date Mon, 30 Jul 2007 18:18:06 GMT
Gosh, I sure hope not, because that would mean that we rolled our
own for no good reason. We wound up just collapsing
the input stream by substituting plain old 'e' for all the accented
variants before indexing and before searching. Be *really* careful
what character set you're using.

Actually, we would have still had to roll our own because the
character mapping was...er...wonky <G>....
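For readers who want to try the folding described above, here is a minimal Java sketch. Erick's team used a hand-rolled character table. This sketch swaps in a different technique, assuming java.text.Normalizer is acceptable: decompose the text to Unicode NFD, then strip the combining marks, which maps é, è, ê and ë to e (and handles other Latin accents too). The class name AccentFolding is hypothetical. Following the charset warning, the accented literals are written as Unicode escapes so the result does not depend on the source-file encoding, and the main method shows what goes wrong when UTF-8 bytes are decoded as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

/** Hypothetical helper: folds accented letters to their unaccented base letters. */
public class AccentFolding {

    /**
     * Canonical decomposition (NFD) splits e.g. e-circumflex into 'e' plus a
     * combining circumflex; removing all combining marks (category M) leaves 'e'.
     * Letters with no canonical decomposition (oe ligature, ae) are left as-is.
     */
    public static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source-file encoding.
        String accented = "fen\u00eatre";                  // "fenetre" spelled with e-circumflex
        System.out.println(fold(accented));                 // fenetre
        System.out.println(fold("\u00c9l\u00e8ve"));        // Eleve

        // Charset pitfall: the same UTF-8 bytes decoded with the wrong charset
        // become different characters, which no longer fold to "fenetre".
        byte[] utf8 = accented.getBytes(StandardCharsets.UTF_8);
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        String right = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(fold(wrong).equals("fenetre")); // false
        System.out.println(fold(right).equals("fenetre")); // true
    }
}
```

Letters with no canonical decomposition, such as œ or æ, pass through NFD unchanged, so a real French setup might still need some explicit mapping of its own.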

You have to store the data raw for display purposes if you want the
accents to show though...
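The split between the raw text kept for display and the folded text used for matching can be illustrated with a toy in-memory inverted index. This is plain Java, not Lucene's API: the class FoldedIndex and its add/search methods are hypothetical, tokenization is naive whitespace splitting, and folding is assumed to be NFD decomposition plus removal of combining marks (not necessarily what Erick's team did). The key point is that the query goes through the same folding as the documents, so "fenêtre" and "fenetre" both match both spellings, while the hits still come back with their original accents.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy inverted index (hypothetical, not Lucene): index folded terms, store raw text. */
public class FoldedIndex {

    private final List<String> storedDocs = new ArrayList<>();          // raw text, shown to users
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // folded term -> doc ids

    /** Assumed folding step: NFD decomposition, then drop combining marks. */
    static String fold(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        return nfd.replaceAll("\\p{M}+", "");
    }

    /** Keeps the raw text and indexes each whitespace-separated token in folded form. */
    public void add(String rawText) {
        int docId = storedDocs.size();
        storedDocs.add(rawText);
        for (String token : rawText.split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(fold(token), k -> new LinkedHashSet<>()).add(docId);
        }
    }

    /** Folds the query with the same function used at index time; returns raw documents. */
    public List<String> search(String word) {
        List<String> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(fold(word), Collections.emptySet())) {
            hits.add(storedDocs.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        FoldedIndex index = new FoldedIndex();
        index.add("ouvrir la fen\u00eatre"); // accented spelling
        index.add("fermer la fenetre");      // unaccented spelling
        index.add("ouvrir la porte");

        // Both spellings find both "window" documents; the raw text keeps its accents.
        System.out.println(index.search("fen\u00eatre"));
        System.out.println(index.search("fenetre"));
    }
}
```

In Lucene terms this roughly corresponds to storing the original field value for display while only the analyzed, folded tokens are searched.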

Best
Erick

On 7/30/07, Chris Lu <chris.lu@gmail.com> wrote:
>
> Hi,
>
> I am not a French speaker, but here are some questions regarding
> French analyzer:
>
> Is there any analyzer that can do this? Map accented letters to the
> corresponding unaccented letters (é, è, ê, ë -> e), so that
>
> searching "fenêtre" (= window) finds all docs with "fenêtre" or "fenetre",
> and
> searching "fenetre" finds the same result: all docs with "fenêtre" or
> "fenetre".
>
> The current analyzers, Snowball-French and FrenchAnalyzer, don't have this
> feature.
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>
