lucene-dev mailing list archives

From "Chris Male (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4019) Parsing Hunspell affix rules without regexp condition
Date Mon, 28 May 2012 13:04:23 GMT


Chris Male commented on LUCENE-4019:

Hi Luca,

Sorry for taking so long to get to this.  The patch looks good and seems to fix the problem.
I think we do need some way to force 'strict' parsing of the files.  Do you think you can
add an option for that?  When strict parsing is enabled, lines without the expected number of
elements would cause an error.

We can even have this enabled by default so users have to explicitly say that they know the
dictionary doesn't conform to our standard and are okay with us silently ignoring bad rules.
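
To make the suggestion concrete, here is a rough sketch of what such a switch could look like. It is not the attached patch: the class name AffixRuleParser, the strict flag, and the field count are assumptions for illustration only, and the header line of each rule group (e.g. "SFX Na N 1") is assumed to have been consumed already.

```java
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of strict vs. lenient affix rule parsing. */
public class AffixRuleParser {
    // Assumption: a full rule line is "SFX|PFX flag strip affix condition",
    // possibly followed by extra fields, so at least 5 elements are expected.
    private static final int EXPECTED_FIELDS = 5;

    private final boolean strict;

    public AffixRuleParser(boolean strict) {
        this.strict = strict;
    }

    /** Splits each rule line into fields; short lines fail or are skipped. */
    public List<String[]> parse(List<String> ruleLines) throws ParseException {
        List<String[]> rules = new ArrayList<>();
        for (int i = 0; i < ruleLines.size(); i++) {
            String[] fields = ruleLines.get(i).trim().split("\\s+");
            if (fields.length < EXPECTED_FIELDS) {
                if (strict) {
                    // Strict mode: a malformed line is an error.
                    throw new ParseException(
                        "Affix rule has too few elements: " + ruleLines.get(i), i);
                }
                // Lenient mode: silently ignore the bad rule.
                continue;
            }
            rules.add(fields);
        }
        return rules;
    }
}
```

With new AffixRuleParser(true), the line "SFX Na 0 ste" would raise a ParseException; with new AffixRuleParser(false), it would be dropped and parsing would continue.
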
> Parsing Hunspell affix rules without regexp condition
> -----------------------------------------------------
>                 Key: LUCENE-4019
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 3.6
>            Reporter: Luca Cavanna
>            Assignee: Chris Male
>         Attachments: LUCENE-4019.patch
> We found out that some recent Dutch hunspell dictionaries contain suffix or prefix rules
> like the following:
> {code} 
> SFX Na N 1
> SFX Na 0 ste
> {code}
> The rule on the second line doesn't contain the 5th parameter, which should be the condition
> (usually a regexp). You usually see a '.' as the condition, meaning always (for every character).
> As explained in LUCENE-3976, the readAffix method throws an error. I wonder if we should treat
> the missing value as a kind of default value, like '.'.  On the other hand, I haven't found
> any information about this within the spec. Any thoughts?
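
For illustration, the default-value idea raised in the description could be sketched as below. This is not the readAffix code from Lucene; the class AffixCondition and the method conditionOf are hypothetical names, and treating a missing fifth element as '.' is exactly the assumption under discussion, not behaviour confirmed by the spec.

```java
/** Hypothetical sketch: fall back to '.' when a rule has no condition. */
public class AffixCondition {
    // Assumption: '.' means "always matches", as described in the issue.
    private static final String MATCH_ALWAYS = ".";

    /** Returns the 5th element of the rule line, or '.' if it is absent. */
    public static String conditionOf(String ruleLine) {
        String[] fields = ruleLine.trim().split("\\s+");
        return fields.length > 4 ? fields[4] : MATCH_ALWAYS;
    }

    public static void main(String[] args) {
        System.out.println(conditionOf("SFX Na 0 ste"));     // rule without a condition
        System.out.println(conditionOf("SFX Na 0 e [^e]"));  // rule with an explicit condition
    }
}
```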
