uima-user mailing list archives

From "Adam Lally" <ala...@alum.rpi.edu>
Subject Re: Help on UIMA Analysis Engine Agreggation
Date Wed, 28 Feb 2007 15:50:42 GMT
On 2/28/07, LASRI YASSINE <lasri.yassine@gmail.com> wrote:
> Hello,
>
>  I have created an annotator that extracts all strings beginning with a capital
> letter (Accccc), and I want to use this annotator (in an aggregate) to extract
> all sentences containing two strings that both begin with a capital letter
> (Xaaaaa Ybbbbb).
>

Hi,

You will need to create a second annotator, which will take the
results of your first annotator and do further processing on them.
This approach is shown in the MeetingAnnotator example that is
exercise 4 of the tutorial (see the Annotator & Analysis Engine
Developer's Guide chapter in the documentation).

Say your first annotator outputs FeatureStructures of the type
CapitalizedWord.  Your second annotator would get an iterator over
CapitalizedWords, for example:

FSIterator it = jcas.getJFSIndexRepository().getAnnotationIndex(CapitalizedWord.type).iterator();

Then you iterate over the CapitalizedWord annotations, and for each
consecutive pair of annotations you can check whether they are adjacent
in the document by seeing if the document text between them is all
whitespace.  If you find an adjacent pair of CapitalizedWords, you can
then create a new annotation of some other type that spans both
CapitalizedWords.
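
As a rough illustration of that adjacency check, here is a minimal plain-Java sketch that runs outside UIMA. It assumes each CapitalizedWord is reduced to its begin/end character offsets (the hypothetical Span class stands in for the real annotation type) and that the words arrive sorted by begin offset, which is the order in which the annotation index returns annotations. AdjacentPairFinder and its methods are illustrative names, not part of the UIMA API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the second annotator's core logic. Outside UIMA, a
// CapitalizedWord annotation is modelled by a hypothetical Span holding the
// same begin/end character offsets an Annotation would carry.
class AdjacentPairFinder {
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // True if the text between the two spans is non-empty and all whitespace.
    static boolean separatedOnlyByWhitespace(String text, Span left, Span right) {
        if (right.begin <= left.end) {
            return false; // overlapping or touching spans are not treated as a pair
        }
        for (int i = left.end; i < right.begin; i++) {
            if (!Character.isWhitespace(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Assumes words are sorted by begin offset. Each adjacent pair yields one
    // new span covering both words (the "annotation of some other type").
    static List<Span> findAdjacentPairs(String text, List<Span> words) {
        List<Span> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            Span left = words.get(i);
            Span right = words.get(i + 1);
            if (separatedOnlyByWhitespace(text, left, right)) {
                pairs.add(new Span(left.begin, right.end));
            }
        }
        return pairs;
    }
}
```

For example, with the text "We met Adam Lally at IBM today." and CapitalizedWord spans for We, Adam, Lally and IBM, only Adam and Lally are separated by whitespace alone, so a single span covering "Adam Lally" is returned. Inside a real annotator, each such span would instead become a new annotation that is added to the indexes (e.g. with addToIndexes()).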

You then create an Aggregate Analysis Engine that contains both of your
annotators.  The tutorial shows how to do this as well.

It wasn't clear to me from your question whether you also need to
detect sentence boundaries in your document.  If so, you can use the
example SimpleTokenAndSentenceAnnotator that comes with the SDK.
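
If the goal is to return whole sentences, a last step could keep only the sentences that contain at least one such pair. The minimal sketch below again uses a hypothetical plain-Java Span for both the sentence annotations (for example, those produced by a sentence annotator such as SimpleTokenAndSentenceAnnotator) and the pair annotations; SentenceFilter and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical final step: keep only the sentences that fully contain at
// least one adjacent-capitalized-pair span. All spans are plain begin/end
// character offsets standing in for UIMA annotations.
class SentenceFilter {
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // True if `inner` lies entirely within `outer`.
    static boolean covers(Span outer, Span inner) {
        return outer.begin <= inner.begin && inner.end <= outer.end;
    }

    static List<Span> sentencesWithPairs(List<Span> sentences, List<Span> pairs) {
        List<Span> result = new ArrayList<>();
        for (Span sentence : sentences) {
            for (Span pair : pairs) {
                if (covers(sentence, pair)) {
                    result.add(sentence);
                    break; // one matching pair is enough to keep the sentence
                }
            }
        }
        return result;
    }
}
```

The nested loop is fine for short documents. Inside UIMA, the subiterator(...) method of AnnotationIndex, which iterates over the annotations covered by a given annotation, may be a more natural way to express the same containment test.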

Hope that helps,

-Adam
