uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: proposal for a new testing and evaluation component
Date Mon, 08 Sep 2008 19:37:32 GMT
I'm starting a vote on this on uima-dev mailing list.

-Marshall

Marshall Schor wrote:
> Michael Tanenblatt wrote:
>> So, is there any actual interest in accepting this into the sandbox?
>> Discussions died down with no resolution.
>>
>> ...m
> Yes, please submit a Jira issue with an attachment and a checksum
> for it.  Then we'll call an official vote on the uima-dev list.
> -Marshall
>>
>>
>> On May 15, 2008, at 12:24 PM, Igor Sominsky wrote:
>>
>>> My group would like to offer the following UIMA component, Common
>>> Feature Extractor (CFE), as an open source offering into the UIMA
>>> sandbox, assuming there is interest from the community:
>>>
>>>
>>>
>>> CFE enables configuration-driven extraction of feature values from
>>> UIMA annotations contained in a CAS. The extracted information can
>>> be used for statistical analysis, performance metrics evaluation,
>>> regression testing and machine learning related processing.
>>>
>>>
>>>
>>> CFE provides a flexible yet powerful language, FESL (Feature
>>> Extraction Specification Language), for working with the UIMA CAS
>>> to enable the collection and classification of resultant data. FESL
>>> is a declarative XML-based language that expresses semantic rules
>>> for the feature extraction. While the rules guide the feature
>>> extraction in a completely generalized way, CFE provides methods for
>>> subsequent processing to format the output of the extraction as
>>> needed for downstream use. The destination for the output (CAS,
>>> external file, database, etc.) is defined by the particular
>>> application where CFE is used. CFE could be implemented as either a
>>> TAE or a CAS Consumer, depending on the needs of a particular
>>> application.
>>>
>>>
>>>
>>> FESL rules allow a flexible and powerful way of defining
>>> multi-parameter criteria for specific information to be extracted
>>> from the CAS. Such criteria can be customized by:
>>>
>>>  1. the type of the UIMA annotation object that contains the feature
>>> of interest
>>>  2. a surrounding (enclosing) annotation type and the relative
>>> location of the object within the enclosure, which limits the
>>> extraction to the boundaries of a certain UIMA type
>>>  3. the "path" to the feature from the annotation object
>>>  4. the type and value of the feature itself
>>>  5. values of any public Java get-style methods (methods that
>>> accept no parameters and return a value) implemented by the
>>> underlying class of the feature
>>>  6. the location of the object or the feature on a specific path (in
>>> cases when it is required to select/bypass annotations if they are
>>> features of other UIMA annotation types)
>>>
>>>
>>> The feature values can be evaluated by conditional expressions
>>> stated in FESL. In particular, the feature values can be tested for
>>> whether they:
>>>
>>>  1. are of a certain type
>>>  2. belong to a specific set of values (vocabulary)
>>>  3. belong to a range of numeric values (inclusively or
>>> non-inclusively)
>>>  4. match certain bits of a bit mask (integer values only)
>>>  5. match a Java regular expression pattern
>>>
>>>
>>> These expressions can be specified in disjunctive normal form, which
>>> gives a powerful and flexible way of defining fairly complex
>>> criteria for the extraction of a required annotation and/or its value.
>>>
>>>
>>>
>>> FESL itself is defined in XSD format and integrated with EMF for
>>> syntax validation and automated code generation.
>>>
>>>
>>>
>>> CFE has been successfully used in several internal projects for
>>> evaluation of performance metrics and machine learning.
>>>
>>>
>>>
>>> CFE is described in more detail in the paper "CFE - a system for
>>> testing, evaluation and machine learning of UIMA based
>>> applications", by I. Sominsky, A. Coden, M. Tanenblatt, which will
>>> be presented at the UIMA for NLP workshop as part of the LREC 2008
>>> conference in Marrakech, Morocco.
>>>
>>>
>>>
>>> Igor Sominsky
>>>
>>> sominsky@gmail.com
>>
>>
>>
>
>
>
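
For readers unfamiliar with the idea, the five value conditions in the
quoted proposal, combined in disjunctive normal form, might be sketched
roughly as follows. This is not FESL and not CFE code: it is a plain
Python illustration, and every name in it (is_type, in_vocabulary,
in_range, matches_bits, matches_pattern, dnf) is hypothetical. It also
assumes that "matching bits" means every bit of the mask is set, and it
uses Python's re module in place of Java regular expressions.

```python
import re
from typing import Any, Callable, Iterable, List

# A predicate takes a feature value and says whether it passes the test.
Predicate = Callable[[Any], bool]


def is_type(expected: type) -> Predicate:
    """Condition 1: the value is of a certain type."""
    return lambda value: isinstance(value, expected)


def in_vocabulary(words: Iterable[Any]) -> Predicate:
    """Condition 2: the value belongs to a fixed set of values."""
    vocabulary = frozenset(words)
    return lambda value: value in vocabulary


def in_range(low: float, high: float, inclusive: bool = True) -> Predicate:
    """Condition 3: the value lies in a numeric range."""
    if inclusive:
        return lambda value: low <= value <= high
    return lambda value: low < value < high


def matches_bits(mask: int) -> Predicate:
    """Condition 4 (assumption: all bits of the mask must be set)."""
    return lambda value: isinstance(value, int) and (value & mask) == mask


def matches_pattern(pattern: str) -> Predicate:
    """Condition 5, using Python's re instead of Java regex."""
    compiled = re.compile(pattern)
    return lambda value: (
        isinstance(value, str) and compiled.fullmatch(value) is not None
    )


def dnf(clauses: List[List[Predicate]]) -> Predicate:
    """Disjunctive normal form: an OR over clauses, each an AND of predicates."""
    return lambda value: any(
        all(predicate(value) for predicate in clause) for clause in clauses
    )


# (odd integer between 1 and 10) OR (capitalized word)
accept = dnf([
    [is_type(int), in_range(1, 10), matches_bits(0b1)],
    [is_type(str), matches_pattern(r"[A-Z][a-z]+")],
])
```

Putting the type check first in each clause matters in this sketch:
all() stops at the first failing predicate, so a string never reaches
the numeric range comparison.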
