uima-user mailing list archives

From Ben Morgan <benm.mor...@gmail.com>
Subject SimpleServer configuration with Sofas
Date Tue, 07 Dec 2010 22:18:44 GMT
Hey folks,

I've got a problem with the UIMA SimpleServer[1][5]: it does not 
correctly run my aggregate analysis engine[6]. The aggregate AE does 
work as expected, however, when I test it with the "UIMA CAS Visual 
Debugger", the "UIMA Run AE" and the "UIMA Document Analyzer".

The analysis engine[2] is relatively simple (so far). It is composed 
of the following components:

     AE PDF Text Extractor[3]
         :: gets a URL as the "initial view" and downloads
            the file, extracts the text and puts it in a new
            view by the name of "extractedText".
         -> Input Sofa: urlString
         -> Output Sofa: extractedText
     AE Email Annotator[4]
         :: simple annotator, just annotates email addresses.

When I run the aggregate analysis engine in the SimpleServer, it 
terminates with an error before giving any results (taken from the 
Tomcat log file):

     SEVERE: Exception occurred
     org.apache.uima.analysis_engine.AnalysisEngineProcessException:
         Annotator processing failed.
     ...
     Caused by: org.apache.uima.cas.CASRuntimeException:
         No sofaFS with name plainText found.
     ...

"plainText" is the Sofa in the aggregate analysis engine that is linked 
to "extractedText", the output Sofa of the PDF Text Extractor.
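
To make the linkage concrete, here is a toy model in plain Java of how 
the Sofa names are resolved. It is only a sketch and does not use the 
UIMA API: ToyCas, MappedCas and runAggregate are hypothetical names, and 
the mappings urlString -> _InitialView and (for the Email Annotator) 
_InitialView -> plainText are assumptions about the aggregate 
descriptor. It shows that if the extractor never creates its output 
view, the next component fails with the same message as in the log.

```java
import java.util.HashMap;
import java.util.Map;

public class SofaMappingSketch {

    /** Toy stand-in for a CAS: maps aggregate-level sofa names to text. */
    static class ToyCas {
        private final Map<String, String> sofas = new HashMap<>();

        void createView(String sofaName, String text) {
            sofas.put(sofaName, text);
        }

        String getView(String sofaName) {
            String text = sofas.get(sofaName);
            if (text == null) {
                throw new IllegalStateException(
                        "No sofaFS with name " + sofaName + " found.");
            }
            return text;
        }
    }

    /** What one component sees: its own sofa names, translated by a mapping. */
    static class MappedCas {
        private final ToyCas cas;
        private final Map<String, String> mapping;

        MappedCas(ToyCas cas, Map<String, String> mapping) {
            this.cas = cas;
            this.mapping = mapping;
        }

        private String resolve(String componentSofaName) {
            // Unmapped names are passed through unchanged.
            return mapping.getOrDefault(componentSofaName, componentSofaName);
        }

        void createView(String name, String text) {
            cas.createView(resolve(name), text);
        }

        String getView(String name) {
            return cas.getView(resolve(name));
        }
    }

    /** Runs extractor then annotator; returns the text the annotator sees. */
    static String runAggregate(String url, boolean extractorCreatesView) {
        ToyCas cas = new ToyCas();
        cas.createView("_InitialView", url);

        // Assumed mapping for the PDF Text Extractor.
        MappedCas extractor = new MappedCas(cas, Map.of(
                "urlString", "_InitialView",
                "extractedText", "plainText"));
        String input = extractor.getView("urlString");
        if (extractorCreatesView) {
            extractor.createView("extractedText", "text of " + input);
        }

        // Assumed mapping for the Email Annotator: default view -> plainText.
        MappedCas annotator = new MappedCas(cas, Map.of(
                "_InitialView", "plainText"));
        return annotator.getView("_InitialView");
    }

    public static void main(String[] args) {
        System.out.println(runAggregate("http://example.org/a.pdf", true));
        try {
            runAggregate("http://example.org/a.pdf", false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the error only appears when the extractor's output view is 
never created, which would fit the extractor's process() not running (or 
not reaching its view-creation step) inside the SimpleServer.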

I took the aggregate analysis engine apart piece by piece, and I started 
with the Email Annotator AE. That worked fine with the SimpleServer.

Then I tested the PDF Text Extractor on its own (I changed the input 
view to _InitialView). When I tested a URL, the result came back as 
XML, but only with the initial view and not with the extracted text. In 
fact, when I test the text extractor with the other tools, it takes 
around 3 seconds to download the PDF file, while the SimpleServer sends 
back its results immediately (so what is that all about? Does it not 
even run the code in the process() method?).

That's my problem, and I wonder whether there is something special you 
need to do when there are views or different output Sofas. I cannot for 
the life of me figure out what is wrong and why it does not work.

Thanks for your help,
Ben Morgan

_______________________________________________________________________________

1: SimpleServer user guide: 
http://uima.apache.org/downloads/sandbox/simpleServerUserGuide/simpleServerUserGuide.html

2: Aggregate AE Descriptor: 
https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/desc/referenceAnnotatorDescriptor.xml

3: PDF Extractor descriptor: 
https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/desc/PDFTextExtractorDescriptor.xml
    PDF Extractor java source: 
https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/src/de/uniwue/informatik/bibrefext/pdf/TextExtractor.java

4: Email Annotator descriptor: 
https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/desc/EmailAnnotatorDescriptor.xml

5: SimpleServer web.xml: 
https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceWebService/WebContent/WEB-INF/web.xml

6: Complete WAR file: 
https://github.com/downloads/cassava/bibrefext/bibrefext.war
