uima-user mailing list archives

From "LASRI YASSINE" <lasri.yass...@gmail.com>
Subject Re: Help on UIMA Analysis Engine Agreggation
Date Wed, 28 Feb 2007 21:21:34 GMT
Thank you for your response. My problem is this:
I have an external file that contains a list of person names, for example:

adam
smith
lary
page
... etc
and I need to extract all person names from other sources (text documents),
for example:
"Lary Page is the creator of google and Adam Smith is an economist"
The annotator should extract <Adam Smith> and <Lary Page> as person names. So
what can I do?

Bests
- Yassine
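
One way to picture the name-list lookup described above is the following plain-Java sketch. It is not UIMA code and not a recommendation from the thread. It assumes the name file has already been read into a lower-cased set, and it treats any run of consecutive known names separated only by whitespace as one person name. The class name NameListMatcher and the method findNames are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: matches words against a known-name list.
public class NameListMatcher {
    // A "word" here is any run of letters (an assumption of this sketch).
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    // Returns the text of each run of consecutive known names.
    // knownNames must contain lower-cased entries.
    public static List<String> findNames(String text, Set<String> knownNames) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        int runStart = -1; // -1 means no run is currently open
        int runEnd = -1;
        while (m.find()) {
            boolean known = knownNames.contains(m.group().toLowerCase(Locale.ROOT));
            // The word extends the open run only if just whitespace separates them.
            boolean continuesRun = runStart >= 0
                    && text.substring(runEnd, m.start()).trim().isEmpty();
            if (known && continuesRun) {
                runEnd = m.end();
            } else {
                if (runStart >= 0) {
                    found.add(text.substring(runStart, runEnd));
                }
                if (known) {
                    runStart = m.start();
                    runEnd = m.end();
                } else {
                    runStart = -1;
                }
            }
        }
        if (runStart >= 0) {
            found.add(text.substring(runStart, runEnd));
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>(Arrays.asList("adam", "smith", "lary", "page"));
        String text = "Lary Page is the creator of google and Adam Smith is an economist";
        for (String name : findNames(text, names)) {
            System.out.println(name);
        }
    }
}
```

In an annotator, the begin and end offsets of each run would become the span of a new annotation instead of a string in a list.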



2007/2/28, Adam Lally <alally@alum.rpi.edu>:
>
> On 2/28/07, LASRI YASSINE <lasri.yassine@gmail.com> wrote:
> > Hello,
> >
> > I have created an annotator that extracts all strings beginning with a
> > capital letter (Accccc), and I want to use this annotator (in an
> > aggregate) to extract all sentences containing two strings that both
> > begin with a capital letter (Xaaaaa Ybbbbb).
> >
>
> Hi,
>
> You will need to create a second annotator, which will take the
> results of your first annotator and do further processing on them.
> This approach is shown in the MeetingAnnotator example that is
> exercise 4 of the tutorial (see the Annotator & Analysis Engine
> Developer's Guide chapter in the documentation).
>
> Say your first annotator outputs FeatureStructures of the type
> CapitalizedWord.  Your second annotator would get an iterator over
> CapitalizedWords, for example:
>
> jcas.getJFSIndexRepository().getAnnotationIndex(CapitalizedWord.type).iterator()
>
> Then you iterate over the CapitalizedWord annotations and for each
> pair of annotations you can check whether they are adjacent in the document
> by seeing if the document text between them is all whitespace.  If you
> find an adjacent pair of CapitalizedWords you can then create a new
> annotation of some other type that spans both CapitalizedWords.
>
> You then create an Aggregate Analysis Engine that contains both of your
> annotators.  The way to do this is shown in the tutorial as well.
>
> It wasn't clear to me from your question whether you also need to
> detect sentence boundaries in your document.  If so you can use the
> example SimpleTokenAndSentenceAnnotator that comes with the SDK.
>
> Hope that helps,
>
> -Adam
>
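
The adjacency check described in the quoted reply can be sketched in plain Java, without any UIMA classes, roughly as follows. The Span class stands in for an annotation's begin and end offsets, and the regex-based findCapitalizedWords stands in for the first annotator. Both are hypothetical names for this illustration, as is AdjacentPairs itself. The sketch assumes the spans arrive sorted by begin offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the second annotator's core logic.
public class AdjacentPairs {
    // A character span [begin, end) in the document text.
    public static final class Span {
        public final int begin;
        public final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Stand-in for the first annotator: an uppercase letter followed by lowercase letters.
    public static List<Span> findCapitalizedWords(String text) {
        List<Span> words = new ArrayList<>();
        Matcher m = Pattern.compile("\\b[A-Z][a-z]+\\b").matcher(text);
        while (m.find()) {
            words.add(new Span(m.start(), m.end()));
        }
        return words;
    }

    // For each neighbouring pair of spans, emit a span covering both
    // when the text between them is non-empty and all whitespace.
    public static List<Span> findAdjacentPairs(String text, List<Span> words) {
        List<Span> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            Span first = words.get(i);
            Span second = words.get(i + 1);
            if (second.begin <= first.end) {
                continue; // overlapping or touching spans are not treated as a pair
            }
            String gap = text.substring(first.end, second.begin);
            if (gap.trim().isEmpty()) {
                pairs.add(new Span(first.begin, second.end));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text = "Lary Page is the creator of google and Adam Smith is an economist";
        for (Span pair : findAdjacentPairs(text, findCapitalizedWords(text))) {
            System.out.println(text.substring(pair.begin, pair.end));
        }
    }
}
```

In a real annotator the spans would come from the annotation index, and each detected pair would be added to the CAS as a new annotation rather than collected in a list.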
