lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: a question for french analyzer
Date Tue, 31 Jul 2007 00:13:27 GMT
<<<However, is there any special case that you have?>>>

Yes, the character set we use is, as I remember,
MARC-8. Which I don't think is the ISOLatin....,
but since I didn't know about that filter when we had our problem,
I didn't even look. Oh well, smarter/braver/lazier next time <G>...

Which is why I love this list, I find things like this and look
smarter next time something similar comes up <G>.

Thanks
Erick

On 7/30/07, Chris Lu <chris.lu@gmail.com> wrote:
>
> Hi, Erick,
>
> I added ISOLatin1AccentFilter to FrenchAnalyzer following Samir's tip,
> and it works great! And I think it's the right way to go. Problems
> like "You have to store the data raw for display purposes if you want
> the accents to show though" will go away since Analyzer already have
> the original text and analyzed token mechanism built-in. And it's
> pretty easy to do!
>
> However, is there any special case that you have? Not really knowing
> French, I only tested one word, "fenêtre", and it's analyzed into
> "fenetre".
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
>
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>
>
> On 7/30/07, Erick Erickson <erickerickson@gmail.com> wrote:
> > Gosh, I sure hope not, because that would mean that we rolled our
> > own for no good reason. We wound up just collapsing
> > the input stream by substituting plain old 'e' for all the accented
> > variants before indexing and before searching. Be *really* careful
> > what character set you're using.
> >
> > Actually, we would have still had to roll our own because the
> > character mapping was...er...wonky <G>....
> >
> > You have to store the data raw for display purposes if you want the
> > accents to show though...
> >
> > Best
> > Erick
> >
> > On 7/30/07, Chris Lu <chris.lu@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > I am not a French speaker, but here are some questions regarding
> > > French analyzer:
> > >
> > > Is there any analyzer that can do this? Analyze accentuated letters to
> > > non accentuated corresponding letters (é,è,ê,ë -> e), so that
> > >
> > > search "fenêtre" (=window) found all docs with "fenêtre" or "fenetre"
> > > and
> > > search "fenetre" found the same result, all docs with "fenêtre" or
> > > "fenetre"
> > >
> > > Current analyzers, Snowball-French and FrenchAnalyzer don't have this
> > > feature.
> > >
> > > --
> > > Chris Lu
> > > -------------------------
> > > Instant Scalable Full-Text Search On Any Database/Application
> > > site: http://www.dbsight.net
> > > demo: http://search.dbsight.com
> > > Lucene Database Search in 3 minutes:
> > >
> > >
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message