lucene-java-user mailing list archives

From Adrien Gallou <adriengal...@gmail.com>
Subject Question about the light and minimal French stemmers
Date Tue, 23 Jul 2019 12:53:57 GMT
Hi,

I'm using both light and minimal French stemmers and encountered an issue
when using the minimal stemmer.

The light stemmer removes the last character of a word if the last two
characters are identical.
We can see that here:
https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchLightStemmer.java#L263
In this light stemmer, there is a check to avoid altering the token if the
token is a number.

The minimal stemmer also removes the last character of a word if the last
two characters are identical.
We can see that here:
https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchMinimalStemmer.java#L77

But the minimal stemmer does not check whether that character is a
letter.
As a result, numeric tokens whose last two characters are identical are
altered: they lose their final digit.
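
To make the difference concrete, here is a minimal standalone sketch of the
rule in question. It is not the Lucene code: the class and method names are
hypothetical, and it isolates only the "drop a repeated final character"
step, ignoring everything else the stemmers do. The guarded variant assumes
the check would be done with Character.isLetter.

```java
// Hypothetical illustration only; not taken from FrenchMinimalStemmer
// or FrenchLightStemmer.
class DuplicateFinalCharSketch {

    // Unguarded rule: drop the last character whenever it equals the one
    // before it, whatever kind of character it is.
    static int trimUnchecked(char[] s, int len) {
        if (len >= 2 && s[len - 1] == s[len - 2]) {
            len--;
        }
        return len;
    }

    // Guarded rule: same as above, but only when the repeated character
    // is a letter, so digits are left untouched.
    static int trimLettersOnly(char[] s, int len) {
        if (len >= 2 && s[len - 1] == s[len - 2]
                && Character.isLetter(s[len - 1])) {
            len--;
        }
        return len;
    }

    public static void main(String[] args) {
        String[] tokens = {"1200", "appell"};
        for (String token : tokens) {
            char[] buf = token.toCharArray();
            String unchecked = new String(buf, 0, trimUnchecked(buf, buf.length));
            String guarded = new String(buf, 0, trimLettersOnly(buf, buf.length));
            System.out.println(token + " -> unchecked: " + unchecked
                    + ", letters only: " + guarded);
        }
    }
}
```

With the unguarded rule, both tokens lose their last character; with the
letter check, only the word is shortened and the number is kept intact.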

Is there a reason for this?
Should I file an issue on Jira to add this check?

Thanks,

Adrien Gallou
