uima-user mailing list archives

From "LASRI YASSINE" <lasri.yass...@gmail.com>
Subject Re: Help on UIMA Analysis Engine Agreggation
Date Wed, 28 Feb 2007 21:21:34 GMT
Thank you for your response. My problem is this:
I have an external file that contains a list of person names, for example:

... etc
and I need to extract all person names from other sources (text documents),
for example:
"Lary Page is the creator of google and Adam Smith is an economist"
The annotator should extract <Adam Smith> and <Lary Page> as person names. So
what can I do?

- Yassine

2007/2/28, Adam Lally <alally@alum.rpi.edu>:
> On 2/28/07, LASRI YASSINE <lasri.yassine@gmail.com> wrote:
> > Hello,
> >
> > I have created an annotator that extracts all strings beginning with a
> > capital letter (Accccc), and I want to use this annotator (in an
> > aggregate) to extract all sentences containing two strings that both
> > begin with a capital letter (Xaaaaa Ybbbbb).
> >
> Hi,
> You will need to create a second annotator, which will take the
> results of your first annotator and do further processing on them.
> This approach is shown in the MeetingAnnotator example that is
> exercise 4 of the tutorial (see the Annotator & Analysis Engine
> Developer's Guide chapter in the documentation).
> Say your first annotator outputs FeatureStructures of the type
> CapitalizedWord.  Your second annotator would get an iterator over
> CapitalizedWords, for example:
> jcas.getJFSIndexRepository().getAnnotationIndex(CapitalizedWord.type).iterator()
> Then you iterate over the CapitalizedWord annotations and for each
> pair of annotations you can check whether they are adjacent in the document
> by seeing if the document text between them is all whitespace.  If you
> find an adjacent pair of CapitalizedWords you can then create a new
> annotation of some other type that spans both CapitalizedWords.
> You then create an Aggregate Analysis Engine that contains both of your
> annotators.  The way to do this is shown in the tutorial as well.
> It wasn't clear to me from your question whether you also need to
> detect sentence boundaries in your document.  If so you can use the
> example SimpleTokenAndSentenceAnnotator that comes with the SDK.
> Hope that helps,
> -Adam
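
The adjacency check described in the quoted reply can be sketched in plain Java, without the UIMA SDK. This is only an illustration under stated assumptions, not code from the thread or the tutorial: the class name AdjacentPairs, its method names, and the regular expression standing in for the first annotator are all hypothetical, and an int[] {begin, end} pair stands in for an annotation's span. In a real second annotator the spans would come from the CapitalizedWord annotation index instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch (no UIMA dependency). All names here are hypothetical.
class AdjacentPairs {

    // Stand-in for the first annotator: one uppercase letter followed by
    // one or more lowercase letters (an assumption about "Accccc").
    private static final Pattern CAPITALIZED =
            Pattern.compile("\\b\\p{Lu}\\p{Ll}+\\b");

    // Returns {begin, end} offsets of capitalized words in document order.
    static List<int[]> capitalizedWords(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = CAPITALIZED.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    // True if the text in [from, to) is non-empty and entirely whitespace.
    static boolean onlyWhitespaceBetween(String text, int from, int to) {
        if (from >= to) {
            return false;
        }
        for (int i = from; i < to; i++) {
            if (!Character.isWhitespace(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for the second annotator: for each pair of consecutive
    // capitalized words separated only by whitespace, return the text
    // covered by a span that runs from the first word to the second.
    static List<String> adjacentPairs(String text) {
        List<int[]> words = capitalizedWords(text);
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            int[] first = words.get(i);
            int[] second = words.get(i + 1);
            if (onlyWhitespaceBetween(text, first[1], second[0])) {
                pairs.add(text.substring(first[0], second[1]));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text =
            "Lary Page is the creator of google and Adam Smith is an economist";
        System.out.println(adjacentPairs(text)); // prints [Lary Page, Adam Smith]
    }
}
```

Only consecutive words need to be compared because the spans are in document order, so a run of three capitalized words yields two overlapping pairs.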
