lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-5818) Fix hunspell zero-string overgeneration
Date Fri, 11 Jul 2014 15:47:05 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-5818:
--------------------------------

    Attachment: LUCENE-5818.patch

Simple patch with some tests. This might be a bug I introduced when cutting over to FST, because
we had no test for it before.

> Fix hunspell zero-string overgeneration
> ---------------------------------------
>
>                 Key: LUCENE-5818
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5818
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-5818.patch
>
>
> Currently, we allow stripping suffixes/prefixes all the way down to the empty string.
> But this should not really be allowed, and it creates overgeneration in some cases (especially
> where endings can be standalone ... typically these are stopwords so it causes a lot of damage).
> Example is Czech 'už', which should just stem to itself, but today also stems to 'úžit'
> because it has a flag compatible with that.
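
To make the zero-string case above concrete, here is a minimal, self-contained sketch. It is not the code in LUCENE-5818.patch and does not use Lucene's hunspell classes: the names ZeroStringGuardSketch, SuffixRule, stem and allowEmptyRemainder are hypothetical, the rule and dictionary are toy English data, and flag compatibility checks are left out. It only shows how an ending that is also a standalone word picks up an extra stem when the remainder is allowed to be empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ZeroStringGuardSketch {

    /** Hypothetical toy suffix rule: inflected form = (stem minus strip) + append. */
    static final class SuffixRule {
        final String strip;
        final String append;

        SuffixRule(String strip, String append) {
            this.strip = strip;
            this.append = append;
        }
    }

    /**
     * Returns candidate stems for a word. When allowEmptyRemainder is false,
     * a rule is skipped if removing its suffix would leave the empty string.
     * Flag compatibility is deliberately not modeled in this sketch.
     */
    static List<String> stem(String word, List<SuffixRule> rules,
                             Set<String> dictionary, boolean allowEmptyRemainder) {
        List<String> stems = new ArrayList<>();
        if (dictionary.contains(word)) {
            stems.add(word); // the word is itself a dictionary entry
        }
        for (SuffixRule rule : rules) {
            if (!word.endsWith(rule.append)) {
                continue;
            }
            String remainder = word.substring(0, word.length() - rule.append.length());
            if (remainder.isEmpty() && !allowEmptyRemainder) {
                continue; // guard: never strip all the way down to the empty string
            }
            String candidate = remainder + rule.strip;
            if (dictionary.contains(candidate) && !stems.contains(candidate)) {
                stems.add(candidate);
            }
        }
        return stems;
    }

    public static void main(String[] args) {
        // Toy data (assumption): "ing" is both a suffix and a standalone entry.
        List<SuffixRule> rules = Arrays.asList(new SuffixRule("e", "ing"));
        Set<String> dictionary = new HashSet<>(Arrays.asList("ing", "e", "make"));

        System.out.println(stem("ing", rules, dictionary, true));     // [ing, e]
        System.out.println(stem("ing", rules, dictionary, false));    // [ing]
        System.out.println(stem("making", rules, dictionary, false)); // [make]
    }
}
```

With the guard enabled, the standalone ending stems only to itself, while ordinary inflected forms such as "making" are unaffected.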



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

