uima-user mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: How to process structured input with UIMA?
Date Wed, 02 Mar 2011 15:09:25 GMT
On 3/2/11 3:25 PM, Andreas Kahl wrote:
> Anuj and Jan,
> Thank you very much for your tips. I think I will try the annotation way:
> Use a CollectionProcessingEngine to iterate over all the docs in my input XML.
> Instantiate a CAS with the input XML as text.
> Then run an Annotator converting all XML tags into Annotations (I think I am going to
> set annotation.setBegin() and .setEnd() to something generic like 0).
> Based on that I'm going to build up my Pipeline.
> I'll keep you posted as soon as I have some results.
The idea of an annotation is that it is bound to a span of text. If you
do not want that, just use a type derived directly from the base feature
structure type (uima.cas.TOP) instead of from the annotation type.

Most text processing components assume that there are annotations which
mark a piece of text: they retrieve the covered text, process it, and
output new annotations.

Let's say you want to use a tokenizer: it needs an annotation (e.g. a
sentence) as input and might output token annotations within the span of
that input annotation.
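
To make the span idea concrete, here is a minimal sketch in plain Java. It
deliberately does not use the UIMA API: the Span class, the tokenize method
and the type names "Sentence" and "Token" are hypothetical stand-ins for
annotations with begin/end offsets, and splitting on whitespace is only a
placeholder for a real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanTokenizerSketch {

    /** Hypothetical stand-in for an annotation: a typed span [begin, end) over the document text. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        /** Returns the part of the document that this span covers. */
        String coveredText(String document) {
            return document.substring(begin, end);
        }
    }

    /**
     * Splits the text covered by the given sentence span on whitespace and
     * returns one "Token" span per piece, using document-level offsets.
     */
    static List<Span> tokenize(String document, Span sentence) {
        List<Span> tokens = new ArrayList<>();
        int tokenStart = -1; // -1 means "not currently inside a token"
        for (int i = sentence.begin; i < sentence.end; i++) {
            boolean isSpace = Character.isWhitespace(document.charAt(i));
            if (!isSpace && tokenStart < 0) {
                tokenStart = i; // a new token begins here
            } else if (isSpace && tokenStart >= 0) {
                tokens.add(new Span("Token", tokenStart, i));
                tokenStart = -1;
            }
        }
        if (tokenStart >= 0) {
            // The last token runs up to the end of the sentence span.
            tokens.add(new Span("Token", tokenStart, sentence.end));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String document = "UIMA marks text. Tokens follow.";
        Span sentence = new Span("Sentence", 0, 16); // covers "UIMA marks text."
        for (Span token : tokenize(document, sentence)) {
            System.out.println(token.begin + "-" + token.end + " " + token.coveredText(document));
        }
    }
}
```

The point of the sketch is that every token stays inside the sentence span it
was derived from, and its offsets refer to the document text. If all
annotations were set to 0/0, a component like this would have no text to
retrieve.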

