uima-dev mailing list archives

From Peter Klügl (JIRA) <...@uima.apache.org>
Subject [jira] [Updated] (UIMA-2757) TextMarker: Add wildcard rule element
Date Wed, 20 Mar 2013 09:49:15 GMT

     [ https://issues.apache.org/jira/browse/UIMA-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Klügl updated UIMA-2757:
------------------------------

    Issue Type: New Feature  (was: Bug)
    
> TextMarker: Add wildcard rule element
> -------------------------------------
>
>                 Key: UIMA-2757
>                 URL: https://issues.apache.org/jira/browse/UIMA-2757
>             Project: UIMA
>          Issue Type: New Feature
>          Components: TextMarker
>            Reporter: Peter Klügl
>            Assignee: Peter Klügl
>
> Right now, something like a wildcard or an I-don't-care rule element can be implemented
with ANY*?. However, such a rule element actually investigates each token until the next rule
element is successfully matched, which makes it slow when many tokens lie in between.
> A real wildcard, which just skips everything, would really be useful (and faster). This
can be implemented by not iterating over the visible inference annotations, but actually finding
a matchable position in the index and then checking whether it is visible. Since the next rule
element can possibly be quite complex, it is maybe better to just match to the next annotation,
and if that one is invisible, then return a failed match. This behavior actually needs some
careful testing in different use cases.
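
The difference between the two strategies can be sketched in Python. This is only an illustration under stated assumptions, not the TextMarker implementation: the `Annotation` class, `build_type_index`, `lazy_scan` and `wildcard_jump` are hypothetical names, and the per-type sorted list merely stands in for a real annotation index.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical stand-in for an annotation with character offsets.
    begin: int
    end: int
    type: str
    visible: bool = True


def lazy_scan(tokens, start, next_type):
    """ANY*? style: inspect token after token until next_type matches.

    Returns (index of the match or None, number of tokens inspected).
    """
    inspected = 0
    for i in range(start, len(tokens)):
        inspected += 1
        if tokens[i].visible and tokens[i].type == next_type:
            return i, inspected
    return None, inspected


def build_type_index(annotations):
    """Assumed index: type name -> (sorted begin offsets, annotations)."""
    index = {}
    for a in sorted(annotations, key=lambda a: a.begin):
        begins, items = index.setdefault(a.type, ([], []))
        begins.append(a.begin)
        items.append(a)
    return index


def wildcard_jump(index, offset, next_type):
    """Wildcard style: jump to the next annotation of next_type at or after
    offset; if that annotation is invisible, report a failed match (None)."""
    begins, items = index.get(next_type, ([], []))
    pos = bisect_left(begins, offset)
    if pos == len(items):
        return None
    candidate = items[pos]
    return candidate if candidate.visible else None


# Tokens of the text "Peter likes very old books."
tokens = [
    Annotation(0, 5, "CW"),
    Annotation(6, 11, "SW"),
    Annotation(12, 16, "SW"),
    Annotation(17, 20, "SW"),
    Annotation(21, 26, "SW"),
    Annotation(26, 27, "PERIOD"),
]
index = build_type_index(tokens)
```

Here `lazy_scan(tokens, 1, "PERIOD")` has to look at every token after the capitalized word, while `wildcard_jump(index, 5, "PERIOD")` goes straight to the period with one binary search. Failing when the first candidate is invisible follows the simpler variant suggested above.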
> First suggestion for the syntax (** for wildcard):
> CW **{-> MARK(Type)} PERIOD; 
> The "**" is maybe not the best solution since it looks quite like a quantifier *?. Introducing
an actual keyword can also be problematic since there might be a type with the same name. Maybe
something like
> CW #{-> MARK(Type)} PERIOD; 
> is better.
> This rule would create an annotation from the end of each capitalized word to the begin
of the next period, including the whitespace. However, that can be removed with the TRIM
action.
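
To make the resulting offsets concrete, here is a small Python sketch of the span such a rule would cover. The function names are hypothetical, and `trim_whitespace` only strips whitespace characters from both ends; that is an assumed simplification of what the TRIM action would do here.

```python
def wildcard_span(cw_end, period_begin):
    """Span matched by the wildcard: from the end of the capitalized word
    to the begin of the next period."""
    return cw_end, period_begin


def trim_whitespace(text, begin, end):
    """Shrink the span so it neither starts nor ends with whitespace."""
    while begin < end and text[begin].isspace():
        begin += 1
    while end > begin and text[end - 1].isspace():
        end -= 1
    return begin, end


text = "Peter likes very old books."
# "Peter" covers offsets 0-5 and "." covers offsets 26-27.
raw_begin, raw_end = wildcard_span(5, 26)
trimmed_begin, trimmed_end = trim_whitespace(text, raw_begin, raw_end)
```

In this example the raw span starts with the space after "Peter", and the trimmed span starts at "likes".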

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
