lucene-dev mailing list archives

From "Christoph Kaser (JIRA)" <>
Subject [jira] [Commented] (LUCENE-6586) There is a typo in GermanStemmer that can lead to wrong stemming
Date Fri, 26 Jun 2015 11:22:04 GMT


Christoph Kaser commented on LUCENE-6586:

Hi Michael,

I tried to write a small test case and realized that there is no input that leads to a wrong
result. substCount is only used to decide how large the original input was, because some
suffixes are only stripped if the token has a minimum length.

if ( ( buffer.length() + substCount > 5 ) &&
      buffer.substring( buffer.length() - 2, buffer.length() ).equals( "nd" ) )
      buffer.delete( buffer.length() - 2, buffer.length() );

However, every substitution leaves at least one character in the buffer. For the bug to take
effect, there has to be a substitution before the one that sets substCount to 2 (instead of
incrementing it by 2).
So we have
- at least 2 characters that were left by the (at least 2) substitutions
- the suffix "nd" (2 characters)
- substCount, which was set to 2
That sums to at least 6, which is greater than 5.
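
The worst case above can be checked with a minimal sketch. The class name, the method name
and the placeholder token "xynd" are made up for illustration and are not part of
GermanStemmer; only the length guard is taken from the snippet quoted above.

```java
class NdSuffixBound {
    // Same condition as the length guard quoted above (hypothetical wrapper method).
    static boolean stripsNd(StringBuilder buffer, int substCount) {
        return buffer.length() + substCount > 5
            && buffer.substring(buffer.length() - 2, buffer.length()).equals("nd");
    }

    public static void main(String[] args) {
        // Shortest buffer in which the typo matters: two characters left over
        // from two substitutions ("xy" is a placeholder) plus the suffix "nd".
        StringBuilder buffer = new StringBuilder("xynd");

        // With the typo, substCount ends up as exactly 2: 4 + 2 = 6 > 5.
        System.out.println(stripsNd(buffer, 2));

        // Without the typo, substCount would be at least 3 (at least 1 from the
        // earlier substitution, plus 2): 4 + 3 = 7 > 5, so the decision is the same.
        System.out.println(stripsNd(buffer, 3));
    }
}
```

Both calls print true, so the guard decides the same way with and without the typo.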

The other conditions that check on substCount work the same, except they check for greater
than 4.

Therefore, there is no token that triggers any wrong behaviour.

Still, I think the typo should be fixed, because it might be copied to a place where it does
have an effect.
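
For illustration, here is a small standalone sketch of what the two spellings do in Java.
The class and method names are hypothetical; only the two statements come from the issue.
"substCount =+ 2" is parsed as "substCount = +2", i.e. a plain assignment that discards
whatever was counted before.

```java
class SubstCountDemo {
    // The statement as currently written: assigns +2 and ignores the old value.
    static int withTypo(int start) {
        int substCount = start;
        substCount =+ 2;
        return substCount;
    }

    // The intended statement: adds 2 to the old value.
    static int fixed(int start) {
        int substCount = start;
        substCount += 2;
        return substCount;
    }

    public static void main(String[] args) {
        System.out.println(withTypo(1)); // prints 2
        System.out.println(fixed(1));    // prints 3
    }
}
```

The two versions only agree when the counter is 0 beforehand, which is why code copied from
here could misbehave in a context with different thresholds.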

> There is a typo in GermanStemmer that can lead to wrong stemming
> ----------------------------------------------------------------
>                 Key: LUCENE-6586
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 5.2.1
>            Reporter: Christoph Kaser
>            Priority: Minor
> There is a small typo in GermanStemmer that leads to a wrong calculation of the substCount in line 203:
> {code}substCount =+ 2;{code}
> should be
> {code}substCount += 2;{code}
> I created a Pull Request for this some time ago, but it was apparently overlooked:
