lucene-dev mailing list archives

From "Tanguy Moal (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-3463) FrenchLightStemmer performs abusive compression of (arbitrary) repeated characters in long tokens
Date Wed, 16 May 2012 16:09:04 GMT

     [ https://issues.apache.org/jira/browse/SOLR-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tanguy Moal updated SOLR-3463:
------------------------------

    Attachment: SOLR-3463.patch

Updated patch to cover a corner case (the code also deletes the last character
if it equals the second-to-last character).

Also added a very minimal unit test (which exposed the uncovered corner case).
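
For readers without the patch at hand, the corner case can be pictured with a small standalone sketch. This is not the code from SOLR-3463.patch or from Lucene: the class name, the helper `dropTrailingDuplicate`, the `lettersOnly` flag, and the length threshold of four are all assumptions made for illustration. It only shows the idea that a trailing-duplicate check, applied to digits as well as letters, shortens numeric tokens.

```java
public class TrailingDuplicateSketch {

    // Hypothetical helper (not the Lucene API): drops the last character of a
    // long token when it equals the second-to-last character.
    // With lettersOnly = true the deletion is skipped for non-letters; that
    // guard is an assumed fix, not a quote from the patch.
    static String dropTrailingDuplicate(String token, boolean lettersOnly) {
        int len = token.length();
        if (len > 4 && token.charAt(len - 1) == token.charAt(len - 2)) {
            boolean allowed = !lettersOnly || Character.isLetter(token.charAt(len - 1));
            if (allowed) {
                return token.substring(0, len - 1);
            }
        }
        return token;
    }

    public static void main(String[] args) {
        // Unguarded: the numeric token loses its last digit.
        System.out.println(dropTrailingDuplicate("12344", false)); // 1234
        // Guarded: digits are left alone, letters are still trimmed.
        System.out.println(dropTrailingDuplicate("12344", true));  // 12344
        System.out.println(dropTrailingDuplicate("chevall", true)); // cheval
    }
}
```

A unit test along these lines would assert that a numeric token comes back unchanged.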
                
> FrenchLightStemmer performs abusive compression of (arbitrary) repeated characters in long tokens
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-3463
>                 URL: https://issues.apache.org/jira/browse/SOLR-3463
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 3.4
>            Reporter: Tanguy Moal
>            Priority: Minor
>         Attachments: SOLR-3463.patch, SOLR-3463.patch
>
>
> FrenchLightStemmer performs aggressive deletions on repeated character sequences, even on numbers.
> This might be unexpected during full text search.
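
The behavior described above can be reproduced with a minimal standalone sketch. It does not call Lucene: the class and method names are hypothetical, the length threshold of four is an assumption, and the letters-only variant is one possible guard rather than the contents of the attached patch. The point is only that collapsing every run of identical characters turns distinct numbers into much shorter tokens that can collide.

```java
public class RepeatCollapseSketch {

    // Hypothetical helper: collapses every run of identical adjacent
    // characters in tokens longer than four characters, digits included.
    static String collapseRepeats(String token) {
        return collapse(token, false);
    }

    // Assumed guard: only runs of letters are collapsed.
    static String collapseLetterRepeats(String token) {
        return collapse(token, true);
    }

    private static String collapse(String token, boolean lettersOnly) {
        if (token.length() <= 4) {
            return token; // short tokens are left untouched
        }
        StringBuilder out = new StringBuilder();
        char prev = token.charAt(0);
        out.append(prev);
        for (int i = 1; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean repeat = c == prev && (!lettersOnly || Character.isLetter(c));
            if (!repeat) {
                out.append(c);
                prev = c;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeats("22222"));         // 2
        System.out.println(collapseRepeats("1000003"));       // 103
        System.out.println(collapseLetterRepeats("1000003")); // 1000003
        System.out.println(collapseLetterRepeats("appelle")); // apele
    }
}
```

In the unguarded form, "1000003" and "10003" both reduce to "103", which is the kind of surprising match a full-text search user would not expect.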

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

