lucene-dev mailing list archives

From "Chris Male (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4019) Parsing Hunspell affix rules without regexp condition
Date Thu, 31 May 2012 10:56:23 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286474#comment-13286474 ]

Chris Male commented on LUCENE-4019:
------------------------------------

Hi Luca,

Thanks for taking a shot at this.

I wonder whether we can improve the ParseException message. At the very least it should
include the line that is causing the problem so people can find it. Even better would be
to include the line number as well. The latter is probably not so urgent, but it would
be handy to have for other parsing errors too.
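
To make the suggestion concrete, here is a minimal sketch of an error that carries both the offending line and its line number. The class and method names (AffixErrorSketch, malformedLine, check) are hypothetical and not taken from the patch, and the "fewer than four fields" test is only a stand-in for whatever validation the real parser does. It relies on java.io.LineNumberReader to track the line number.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.text.ParseException;

// Hypothetical helper for illustration; not the code from the attached patch.
class AffixErrorSketch {

  // Builds an exception whose message names both the line number and the line itself.
  // The errorOffset argument is reused here to carry the line number (an assumption
  // of this sketch, since ParseException has no dedicated line-number field).
  static ParseException malformedLine(String line, int lineNumber) {
    return new ParseException(
        "Could not parse affix rule at line " + lineNumber + ": \"" + line + "\"",
        lineNumber);
  }

  // Reads lines and fails on the first one with fewer than four
  // whitespace-separated fields (placeholder validation only).
  static void check(LineNumberReader reader) throws IOException, ParseException {
    String line;
    while ((line = reader.readLine()) != null) {
      if (line.trim().isEmpty() || line.startsWith("#")) {
        continue; // skip blank lines and comments
      }
      if (line.trim().split("\\s+").length < 4) {
        // getLineNumber() already counts the line that was just read
        throw malformedLine(line, reader.getLineNumber());
      }
    }
  }
}
```

Wrapping the existing reader in a LineNumberReader would probably be the least invasive way to get the line number to other parsing errors as well.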

Also, I think the changes to the Factory are wrong, since the strictAffixParsing value
ends up being assigned to ignoreCase:

{code}
+      if(strictAffixParsing.equalsIgnoreCase(TRUE)) ignoreCase = true;
+      else if(strictAffixParsing.equalsIgnoreCase(FALSE)) ignoreCase = false;
{code}
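
For comparison, a rough sketch of what the parameter handling presumably intends: the value read for strictAffixParsing decides the strict flag and leaves ignoreCase alone. The class name, the args map, the parameter key, and the default of true are all assumptions made for this example, not the Factory's actual code.

```java
import java.util.Map;

// Hypothetical stand-in for the Factory's argument handling.
class StrictAffixParamSketch {

  // Returns the strictAffixParsing setting from the factory arguments.
  // Assumption: strict parsing is the default when the parameter is absent.
  static boolean strictAffixParsing(Map<String, String> args) {
    String value = args.get("strictAffixParsing");
    if (value == null) {
      return true;
    }
    if (value.equalsIgnoreCase("true")) {
      return true;
    }
    if (value.equalsIgnoreCase("false")) {
      return false;
    }
    throw new IllegalArgumentException(
        "Unknown value for strictAffixParsing: " + value + ". Must be true or false");
  }
}
```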


                
> Parsing Hunspell affix rules without regexp condition
> -----------------------------------------------------
>
>                 Key: LUCENE-4019
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4019
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 3.6
>            Reporter: Luca Cavanna
>            Assignee: Chris Male
>         Attachments: LUCENE-4019.patch, LUCENE-4019.patch
>
>
> We found out that some recent Dutch hunspell dictionaries contain suffix or prefix rules
> like the following:
> {code} 
> SFX Na N 1
> SFX Na 0 ste
> {code}
> The rule on the second line doesn't contain the 5th parameter, which should be the condition
> (usually a regexp). The condition is often '.', meaning always (it matches every character).
> As explained in LUCENE-3976, the readAffix method throws an error in this case. I wonder if we
> should treat the missing value as a kind of default value, like '.'. On the other hand, I haven't
> found any information about this within the spec. Any thoughts?
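
The fallback proposed in the issue description could be sketched roughly as follows. This is an illustration only: AffixConditionSketch and conditionOf are hypothetical names, the field layout (type, flag, strip, affix, optional condition) is taken from the example rule above, and whether '.' is the right default is exactly the open question.

```java
// Hypothetical illustration of defaulting a missing affix condition to '.'.
class AffixConditionSketch {

  // '.' matches any character, so the rule applies unconditionally.
  static final String ALWAYS = ".";

  // Expects a rule line such as "SFX Na 0 ste" (header lines are not handled here).
  // Returns the 5th whitespace-separated field, or '.' when it is absent.
  static String conditionOf(String ruleLine) {
    String[] parts = ruleLine.trim().split("\\s+");
    return parts.length > 4 ? parts[4] : ALWAYS;
  }
}
```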

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

