lucenenet-user mailing list archives

From Prescott Nasser <geobmx...@hotmail.com>
Subject RE: Using French Analyzer
Date Wed, 11 Mar 2015 15:41:42 GMT
Shubhanshu -

I think you are OK to use that as is. The "long run" is going to be Lucene.Net 4.8.0, which
we are currently porting. Updating your code from 3.0.3 to that release will likely require
an investment of time.

Prescott

________________________________
From: Shubhanshu Pathak <shubhanshupathak30@gmail.com>
Sent: 3/5/2015 10:40 AM
To: user@lucenenet.apache.org
Subject: Using French Analyzer

Dear Group Members,

I am using Lucene.Net 3.0.3

In one of my projects I have to do language based analysis.

When I tried the analyzer already provided for the French language,
FrenchAnalyzer, I found that internally it uses FrenchStemFilter.
The documentation of this filter effectively says "don't use me":

    This stemmer does not implement the Snowball algorithm correctly,
    especially involving case problems.
    It is recommended that you consider using the "French" stemmer in the
    snowball package instead.
    This stemmer will likely be deprecated in a future release.

This means I should not use this Analyzer.

Then I tried SnowballAnalyzer. It gives me a way to do linguistic
analysis through:

Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_30, "French");
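
To make the line above concrete, here is a minimal sketch of running French text through that analyzer and printing the stemmed terms. It assumes Lucene.Net 3.0.3 with the contrib Snowball assembly referenced; the field name "content" and the sample sentence are placeholders chosen for illustration, not anything from the project in question.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Snowball;
using Lucene.Net.Analysis.Tokenattributes;
// Alias avoids a clash with System.Version.
using Version = Lucene.Net.Util.Version;

class FrenchAnalysisSketch
{
    static void Main()
    {
        // Same construction as in the message above.
        Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_30, "French");

        // "content" is a hypothetical field name; the sentence is sample input.
        using (TokenStream stream = analyzer.TokenStream(
            "content", new StringReader("Les chevaux mangent des pommes")))
        {
            // The term attribute exposes the text of the current token.
            ITermAttribute term = stream.AddAttribute<ITermAttribute>();

            stream.Reset();
            while (stream.IncrementToken())
            {
                Console.WriteLine(term.Term);
            }
            stream.End();
        }
    }
}
```

The sketch only calls the public TokenStream method, so it does not depend on the obsolete back-compat hook discussed below.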

Now when I look at the code of the SnowballAnalyzer -

In its constructor it invokes the method

SetOverridesTokenStreamMethod<SnowballAnalyzer>();

of the base class Analyzer.
This method is already marked as obsolete.

[Obsolete("This is only present to preserve back-compat of classes that subclass a core analyzer and override tokenStream but not reusableTokenStream ")]
protected internal virtual void SetOverridesTokenStreamMethod<TClass>()

This means we cannot rely on SnowballAnalyzer in the long run either.


So kindly let me know how to achieve linguistic analysis in such cases
without building our own Analyzer.

Thanks & Regards,
Shubhanshu
