uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: Is there a way to tell UIMA component to only extract some kind of entities when run opennlp.pear?
Date Thu, 15 May 2014 01:43:52 GMT
UIMA's descriptors include a capabilities section (the XML capabilities element)
where the descriptor may specify inputs and outputs.  These feed into the
ResultSpecification that is provided to the annotator.  The annotator code can
query the ResultSpecification to see what it ought to produce.
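As a rough illustration of that query pattern, here is a self-contained Java sketch. It does not use the UIMA jar: ResultSpec is a hypothetical, simplified stand-in for org.apache.uima.analysis_engine.ResultSpecification that models only a containsType(String) check, and the type names (org.example.Person, org.example.Location) are made up for the example.

```java
import java.util.Set;

public class ResultSpecSketch {
    // Hypothetical stand-in for UIMA's ResultSpecification; only the
    // "is this output type requested?" query is modeled here.
    interface ResultSpec {
        boolean containsType(String typeName);
    }

    // Builds a spec from the output type names that a descriptor's
    // capabilities section might declare (illustrative names only).
    static ResultSpec fromOutputs(Set<String> outputTypes) {
        return outputTypes::contains;
    }

    public static void main(String[] args) {
        ResultSpec spec = fromOutputs(Set.of("org.example.Person"));
        System.out.println(spec.containsType("org.example.Person"));   // true
        System.out.println(spec.containsType("org.example.Location")); // false
    }
}
```

In a real annotator the spec comes from the framework rather than being built by hand; the point is only that the annotator asks before it produces a type.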

This is used, for example, by the sample annotators in the examples project:
   TutorialDateTime
   RegExAnnotator
   PersonTitleAnnotator

to control what they produce.

This behavior is optional on the part of annotators: an annotator might be
written to ignore the ResultSpecification.

So the key may be to update the annotators to take account of the
ResultSpecification.
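A minimal sketch of what "taking account of the ResultSpecification" could look like for a regex-style annotator, using the email/phone case from the question below. It is plain Java, not UIMA code: the EXTRACTORS map, the type names, and the patterns are hypothetical, and the Set of requested types stands in for asking the ResultSpecification (in UIMA, as far as I know, via containsType) before running each extractor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GatedRegexSketch {
    // Hypothetical mapping from output type name to the pattern that produces it.
    static final Map<String, Pattern> EXTRACTORS = new LinkedHashMap<>();
    static {
        EXTRACTORS.put("org.example.Email",
                Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+"));
        EXTRACTORS.put("org.example.UsPhone",
                Pattern.compile("\\(?\\d{3}\\)?[ -]?\\d{3}-\\d{4}"));
    }

    // Runs only the extractors whose output type was requested;
    // unrequested patterns are never applied to the text.
    static List<String> extract(String text, Set<String> requestedTypes) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, Pattern> e : EXTRACTORS.entrySet()) {
            if (!requestedTypes.contains(e.getKey())) {
                continue; // skip work nobody asked for
            }
            Matcher m = e.getValue().matcher(text);
            while (m.find()) {
                found.add(e.getKey() + ": " + m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Mail jeff@example.com or call (555) 123-4567.";
        // prints [org.example.Email: jeff@example.com]
        System.out.println(extract(text, Set.of("org.example.Email")));
    }
}
```

The saving comes from the early continue: the cost of an extractor is only paid when its output type is asked for, which is the behavior the question is after.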

For more background, see
http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aae.result_specification_setting

which discusses the ResultSpecification further.

-Marshall
On 5/14/2014 2:21 PM, Jeffery wrote:
> For example, the user dynamically specifies which kinds of entities they are
> interested in. The user may be interested only in person entities, so we run
> opennlp.pear, but it extracts all entities, such as:
> Person, Organization, Location, Date, Time, Money, Percentage, Parse, Chunk, Token.
>
> This makes extraction unnecessarily slow.
>
> The same problem happens with RegExAnnotator.pear: it can extract ISBNs,
> email addresses, etc., and we may add our own regexes to extract US phone
> numbers and so on. But at any one time, we may want to extract only email
> addresses or only phone numbers.

