lucene-dev mailing list archives

From "Luca Cavanna (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4019) Parsing Hunspell affix rules without regexp condition
Date Mon, 07 May 2012 10:57:48 GMT


Luca Cavanna commented on LUCENE-4019:

Thank you Robert for the explanation!
In this specific case it's hard to understand the differences between hunspell and Lucene,
since Lucene doesn't even parse the affix file.
I've been in contact with the authors of those Dutch dictionaries, as well as with the hunspell
author. It turned out that those affix rules are wrong and hunspell actually ignores them.
I think it's better to ignore them in Lucene too, rather than throwing an exception, which
makes it impossible to use those dictionaries at all.
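
To make the proposal concrete, here is a minimal sketch of the "ignore instead of throw" behaviour. It is not the actual Lucene code: the class AffixRuleSketch, its nested Rule type and the parseRules method are hypothetical names used only for illustration. It assumes a rule line has the form "SFX flag strip affix condition", and that a strip value of "0" means nothing is stripped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not Lucene's actual Hunspell parsing code. */
final class AffixRuleSketch {

  /** One parsed affix rule line (assumed layout: type flag strip affix condition). */
  static final class Rule {
    final String flag;
    final String strip;
    final String affix;
    final String condition;

    Rule(String flag, String strip, String affix, String condition) {
      this.flag = flag;
      this.strip = strip;
      this.affix = affix;
      this.condition = condition;
    }
  }

  /**
   * Parses the rule lines that follow an affix header line.
   * Lines without a condition (fewer than five fields) are skipped
   * rather than causing an exception.
   */
  static List<Rule> parseRules(List<String> ruleLines) {
    List<Rule> rules = new ArrayList<>();
    for (String line : ruleLines) {
      String[] args = line.trim().split("\\s+");
      if (args.length < 5) {
        // Malformed rule such as "SFX Na 0 ste": no condition, so ignore it.
        continue;
      }
      // "0" in the strip field means that no characters are stripped.
      String strip = args[2].equals("0") ? "" : args[2];
      rules.add(new Rule(args[1], strip, args[3], args[4]));
    }
    return rules;
  }
}
```

With this sketch, a line such as "SFX Na 0 ste" is dropped silently, while "SFX Na 0 ste ." is kept with '.' as its condition, so the rest of the dictionary stays usable.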
> Parsing Hunspell affix rules without regexp condition
> -----------------------------------------------------
>                 Key: LUCENE-4019
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 3.6
>            Reporter: Luca Cavanna
> We found out that some recent Dutch hunspell dictionaries contain suffix or prefix rules
> like the following:
> {code} 
> SFX Na N 1
> SFX Na 0 ste
> {code}
> The rule on the second line doesn't contain the 5th parameter, which should be the condition
> (usually a regexp). The condition is usually '.', which matches any character, so the rule
> always applies. As explained in LUCENE-3976, the readAffix method throws an error. I wonder if
> we should treat the missing value as a kind of default value, like '.'. On the other hand,
> I haven't found any information about this within the spec. Any thoughts?
