lucene-java-user mailing list archives

From "Chris Lu" <chris...@gmail.com>
Subject Re: a question for french analyzer
Date Mon, 30 Jul 2007 20:36:25 GMT
Hi, Erick,

I added ISOLatin1AccentFilter to FrenchAnalyzer following Samir's tip,
and it works great! I think it's the right way to go. Problems like
"You have to store the data raw for display purposes if you want the
accents to show though" go away, since the Analyzer mechanism already
keeps the original text and the analyzed tokens separate. And it's
pretty easy to do!

However, are there any special cases that you ran into? Not really
knowing French, I only tested one word, "fenêtre", which is analyzed
into "fenetre".
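
For trying more words than "fenêtre", here is a minimal sketch of the
folding idea in plain Java. It is not ISOLatin1AccentFilter's code and
does not use Lucene at all: the class AccentFolding and the method fold
are hypothetical names, and the approach (Unicode NFD decomposition,
then dropping combining marks) is an assumption standing in for the
filter's own character mapping. Ligatures such as "œ" have no canonical
decomposition, so this sketch leaves them untouched.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration only; not part of Lucene.
class AccentFolding {

    // Decompose each accented letter into base letter + combining mark (NFD),
    // then remove the combining marks, leaving the unaccented base letters.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of file encoding.
        String[] samples = {
            "fen\u00EAtre",    // fenêtre
            "\u00E9l\u00E8ve", // élève
            "gar\u00E7on",     // garçon
            "No\u00EBl",       // Noël
            "o\u00F9"          // où
        };
        for (String s : samples) {
            System.out.println(s + " -> " + fold(s));
        }

        // Folding both the indexed term and the query term makes them match,
        // whichever spelling the user types.
        String indexed = fold("fen\u00EAtre");
        String query = fold("fenetre");
        System.out.println("match: " + indexed.equals(query));
    }
}
```

The same folding has to be applied at index time and at query time,
otherwise "fenetre" and "fenêtre" end up as different terms again.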

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes


On 7/30/07, Erick Erickson <erickerickson@gmail.com> wrote:
> Gosh, I sure hope not, because that would mean that we rolled our
> own for no good reason. We wound up just collapsing
> the input stream by substituting plain old 'e' for all the accented
> variants before indexing and before searching. Be *really* careful
> what character set you're using.
>
> Actually, we would have still had to roll our own because the
> character mapping was...er...wonky <G>....
>
> You have to store the data raw for display purposes if you want the
> accents to show though...
>
> Best
> Erick
>
> On 7/30/07, Chris Lu <chris.lu@gmail.com> wrote:
> >
> > Hi,
> >
> > I am not a French speaker, but here are some questions regarding
> > French analyzer:
> >
> > Is there any analyzer that can do this? Map accented letters to
> > their unaccented counterparts (é,è,ê,ë -> e), so that
> >
> > searching for "fenêtre" (=window) finds all docs with "fenêtre" or
> > "fenetre", and
> > searching for "fenetre" finds the same result, all docs with
> > "fenêtre" or "fenetre".
> >
> > The current analyzers, Snowball-French and FrenchAnalyzer, don't have
> > this feature.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>