Hello. I'm very new to the whole world of data mining and have stumbled
upon UIMA within the last week or so. I'm trying to go through all the
documentation and just create a simple application but am hitting some
roadblocks and was wondering where I can find some newbie help. I
realize this is sort of long, so I appreciate any help anyone can give.
First I have a question: What is the difference between a CAS and a JCas
and why would I want to use one over the other? Is this determined by
the AEs I'm using (i.e. if they are implemented by extending a
JCas_*_impl) or is there some other reason? It seems the CAS is more
developed and has things like CasPools, the ability to make CASes with
multiple AEs, Consumers, etc. Should I just be using the CAS interface
and forget about JCas?
My main issue right now is that I can't figure out how to set inputs for
an AE. I can't find any examples of how to do it. See the description
below of what I'm trying to do:
I'm trying to use some prebundled AEs to parse some text. I basically
want to do Named Entity Extraction on text. So I wrote a simple
application that first does Sentence Boundary detection and prints out
the sentences that it finds. That was easy enough. So now I would like
to take those sentences and feed them into the named entity AE. Both the
Sentence Boundary AE and the NE AE I'm using are from the JULIE lab
(http://www.julielab.de). Reading the documentation for the NE AE, it
says that it requires Sentences as inputs (the output of the Sentence
Boundary AE). I cannot figure out how to set those inputs and am stuck
at this point. Once I figure that out, I think I'll be getting NEs out
of the CAS.
So now all that being said, I'm also not sure I'm coding this process
the way I'm supposed to. I eventually want to build all this into a
distributed architecture with many threads running constantly processing
using a pool of extractors. I want to be able to submit documents to
the named entity extractor, then persist the named entities in a
database. I would like to have multiple entry points into the extractor:
either ad hoc (here is a doc, extract it now) or using a collection
reader to pull multiple docs in at once and parse them all. Right now,
my simple application has 2 CASes and 2 AnalysisEngines (one for Sentence
Detection and one for NE Extraction). It seems like I would just want
to make one AE that does the Sentence Detection and passes it on to the
NE extractor, but I don't get how you do this. Do I need to make a new
AE and define these things in the xml that describes it? Or is this a
CPE?
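As a rough illustration of the flow described above (one document passed
through sentence detection and then entity extraction, with the second
step reading what the first step wrote), here is a plain-Java sketch. It
uses no UIMA classes at all: Doc, Step, SentenceStep, EntityStep and run
are hypothetical stand-ins invented for this example, and the splitting
and "entity" logic is a toy, not what the JULIE lab components do.

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {
    /** Hypothetical stand-in for a shared document container (not a UIMA CAS). */
    static class Doc {
        final String text;
        final List<int[]> sentences = new ArrayList<>(); // {begin, end} offsets, end exclusive
        final List<String> entities = new ArrayList<>();
        Doc(String text) { this.text = text; }
    }

    /** Hypothetical stand-in for one analysis step. */
    interface Step { void process(Doc doc); }

    /** Toy sentence detector: a sentence ends at '.', '!' or '?'. */
    static class SentenceStep implements Step {
        public void process(Doc doc) {
            int start = 0;
            int len = doc.text.length();
            for (int i = 0; i < len; i++) {
                char c = doc.text.charAt(i);
                if (c == '.' || c == '!' || c == '?') {
                    doc.sentences.add(new int[] {start, i + 1});
                    start = i + 1;
                    // Skip whitespace before the next sentence.
                    while (start < len && Character.isWhitespace(doc.text.charAt(start))) {
                        start++;
                    }
                }
            }
            if (start < len) {
                doc.sentences.add(new int[] {start, len}); // trailing text without a terminator
            }
        }
    }

    /** Toy entity step: consumes the sentences recorded by the previous step. */
    static class EntityStep implements Step {
        public void process(Doc doc) {
            for (int[] s : doc.sentences) {
                String[] tokens = doc.text.substring(s[0], s[1]).split("\\s+");
                // Skip the first token, which is capitalized in any case.
                for (int k = 1; k < tokens.length; k++) {
                    String word = tokens[k].replaceAll("[^A-Za-z]", "");
                    if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
                        doc.entities.add(word);
                    }
                }
            }
        }
    }

    /** Runs every step, in order, over the same Doc instance. */
    static Doc run(String text, List<Step> steps) {
        Doc doc = new Doc(text);
        for (Step step : steps) {
            step.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Step> steps = new ArrayList<>();
        steps.add(new SentenceStep());
        steps.add(new EntityStep());
        Doc doc = run("We met Andrew today. He uses UIMA.", steps);
        System.out.println(doc.sentences.size() + " sentences, entities: " + doc.entities);
    }
}
```

The point of the sketch is only that both steps share one document
object, so the second step's "input" is simply whatever the first step
left behind, rather than something handed over between two separate
containers.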
If anyone has a simple NE example application that could point me in the
right direction, that would be great.
Thanks!
Andrew