uima-user mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: How to process structured input with UIMA?
Date Wed, 02 Mar 2011 10:46:06 GMT
On 3/2/11 11:14 AM, Andreas Kahl wrote:
> Mainly I am concerned with the latter:
> Those metadata-records would come in as XML with dozens of fields containing relatively
> short texts (most less than 255 chars). We need to perform NLP (tokenization, stemming ...)
> and some simpler manipulations like reading 3 fields and constructing a 4th from that.
> It would be very desirable to use one Framework for both tasks (in fact we would use
> the pipeline to enrich the Metadata-Records with the long texts).
>

You could parse the XML and then construct a short text which contains
the content of the fields, together with annotations that mark the
existing structure. This new text and its annotations are placed in a
new view.
Afterwards you can perform your processing within these annotation bounds.
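
As a rough sketch of that idea in plain Java (no UIMA classes involved):
the field values are concatenated into one text, and the character offsets
of each field are recorded. The names RecordFlattener, Flattened and
FieldSpan are made up for illustration; in a real pipeline the text would
become the document text of the new view and each span an annotation over
it. The sketch assumes a flat record, where every child element of the
root is one field.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: flattens a flat XML record into text plus field offsets. */
final class RecordFlattener {

    /** Stand-in for an annotation: field name and [begin, end) offsets in the text. */
    static final class FieldSpan {
        final String field;
        final int begin;
        final int end;

        FieldSpan(String field, int begin, int end) {
            this.field = field;
            this.begin = begin;
            this.end = end;
        }
    }

    /** The constructed text together with the spans that mark the original fields. */
    static final class Flattened {
        final String text;
        final List<FieldSpan> spans;

        Flattened(String text, List<FieldSpan> spans) {
            this.text = text;
            this.spans = spans;
        }
    }

    /** Assumes every element directly below the root is one field. */
    static Flattened flatten(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse record", e);
        }

        StringBuilder text = new StringBuilder();
        List<FieldSpan> spans = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments between fields
            }
            int begin = text.length();
            text.append(node.getTextContent().trim());
            spans.add(new FieldSpan(node.getNodeName(), begin, text.length()));
            text.append('\n'); // separator so fields do not run into each other
        }
        return new Flattened(text.toString(), spans);
    }
}
```

Calling RecordFlattener.flatten(...) on a record gives you the text and one
span per field; the tokenizer, stemmer and so on can then be restricted to
text.substring(span.begin, span.end) for the fields you care about.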

Not sure how you construct the 4th field, but if you can do that
directly after the XML parsing, it could be part of the constructed text.

With UIMA-AS you should be able to nicely scale the analysis to a few 
machines.

Hope that helps,
Jörn

