uima-user mailing list archives

From "Andreas Kahl" <Andreas_K...@gmx.net>
Subject Re: How to process structured input with UIMA?
Date Wed, 02 Mar 2011 16:31:47 GMT
Jörn, 

Thanks for your hint. I will consider that, too. If those types derived directly from
Feature Structure work well, they sound to me like a good alternative to 'misusing' the
span-bound Annotations. I really have to practice a bit.

Best Regards
Andreas  



-------- Original Message --------
> Date: Wed, 02 Mar 2011 16:09:25 +0100
> From: "Jörn Kottmann" <kottmann@gmail.com>
> To: user@uima.apache.org
> Subject: Re: How to process structured input with UIMA?

> On 3/2/11 3:25 PM, Andreas Kahl wrote:
> > Anuj and Jan,
> >
> > Thank you very much for your tips. I think I will try the annotation way:
> > Use a CollectionProcessingEngine to iterate over all the docs in my input XML.
> > Instantiate a CAS with the input XML as text.
> > Then run an Annotator converting all XML tags into Annotations (I think
> > I am going to set annotation.setBegin() and .setEnd() to something generic
> > like 0).
> > Based on that I'm going to build up my pipeline.
> > I'll keep you posted as soon as I have some results.
> >
> The idea of an annotation is really that it is bound to a span of text.
> If you do not want that, then just use a type which is directly derived
> from Feature Structure.
>
> Most text processing assumes that you have annotations which mark a piece
> of text, then retrieve the text, process it and output annotations.
>
> Let's say you want to use a tokenizer: it needs an annotation (e.g. a
> sentence) as input and might output token annotations within the input
> annotation span.
> 
> Jörn
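
The span contract described above can be sketched in plain Java. This is only an illustration and does not use the UIMA API: the Span class, the tokenize method and the sample text are hypothetical stand-ins for an annotation type, a tokenizer annotator and a document. It shows why begin and end offsets matter: the tokenizer reads only the text covered by the input span and emits token spans that lie inside it. If every XML tag were set to begin = end = 0, there would be no covered text for such a component to work on.

```java
import java.util.ArrayList;
import java.util.List;

class SpanSketch {
    /** Hypothetical stand-in for an annotation: a half-open [begin, end) span over the document text. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        /** Returns the piece of document text that this span marks. */
        String coveredText(String doc) {
            return doc.substring(begin, end);
        }
    }

    /** Whitespace tokenizer that looks only inside the input span and returns token spans with document offsets. */
    static List<Span> tokenize(String doc, Span sentence) {
        List<Span> tokens = new ArrayList<>();
        int start = -1; // start offset of the token being read, or -1 if between tokens
        for (int i = sentence.begin; i <= sentence.end; i++) {
            boolean boundary = i == sentence.end || Character.isWhitespace(doc.charAt(i));
            if (!boundary && start < 0) {
                start = i;
            } else if (boundary && start >= 0) {
                tokens.add(new Span("Token", start, i));
                start = -1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String doc = "UIMA is modular. Annotations mark spans.";
        Span sentence = new Span("Sentence", 17, 40); // second sentence only
        for (Span token : tokenize(doc, sentence)) {
            System.out.println(token.begin + "-" + token.end + " " + token.coveredText(doc));
        }
    }
}
```

Data that has no meaningful position in the text, such as a record ID taken from an XML attribute, would not need the begin and end fields at all, which is the case the suggestion of a type derived directly from Feature Structure addresses.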
