uima-user mailing list archives

From Thilo Götz <twgo...@gmx.de>
Subject Re: SimpleServer configuration with Sofas
Date Wed, 08 Dec 2010 10:43:39 GMT
Hi Ben,

the SimpleServer is not Sofa-aware, and neither
am I ;-).  I don't think there should be an
exception, though.  Can you please post the full
stack trace?  Maybe that will help.

To get the PDF extractor working, can't you
change the default view somehow, so that
CAS.getDocumentText() will retrieve the extracted
text?  I thought that was possible, so that
non-Sofa-aware annotators can work with Sofa-aware
ones.  I'm not sure, though.
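
As far as I understand it, the mechanism for this
is a Sofa mapping in the aggregate descriptor.  A
rough, untested sketch follows.  The component keys
"PDFTextExtractor" and "EmailAnnotator" are made-up
placeholders for whatever keys your aggregate uses,
and I am assuming that leaving out
<componentSofaName> maps the component's default
view:

```xml
<!-- Sketch only: goes in the aggregate AE descriptor.
     Component keys below are assumed, not taken from the real descriptor. -->
<sofaMappings>
  <!-- The extractor's output Sofa "extractedText" is the aggregate Sofa "plainText". -->
  <sofaMapping>
    <componentKey>PDFTextExtractor</componentKey>
    <componentSofaName>extractedText</componentSofaName>
    <aggregateSofaName>plainText</aggregateSofaName>
  </sofaMapping>
  <!-- Assumption: with no componentSofaName, the annotator's default view
       is mapped, so CAS.getDocumentText() would read "plainText". -->
  <sofaMapping>
    <componentKey>EmailAnnotator</componentKey>
    <aggregateSofaName>plainText</aggregateSofaName>
  </sofaMapping>
</sofaMappings>
```

This would only help if the SimpleServer runs the
aggregate with its mappings intact, which is
exactly the part I am unsure about.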

--Thilo

On 12/7/2010 23:18, Ben Morgan wrote:
> Hey folks,
> 
> I've got a problem with the UIMA SimpleServer[1][5] not being able to correctly
> run an aggregate analysis engine[6]. The aggregate AE works as expected,
> however, when I test it with the "UIMA CAS Visual Debugger", the "UIMA Run AE"
> and the "UIMA Document Analyzer" tools.
> 
> The analysis engine[2] is relatively simple (so far). It is composed of the
> following components:
> 
>     AE PDF Text Extractor[3]
>         :: gets a URL as the "initial view" and downloads
>            the file, extracts the text and puts it in a new
>            view by the name of "extractedText".
>         -> Input Sofa: urlString
>         -> Output Sofa: extractedText
>     AE Email Annotator[4]
>         :: simple annotator, just annotates email addresses.
> 
> When I run the aggregate analysis engine, it terminates before giving any
> results with an error (taken from the Tomcat log file):
> 
>     SEVERE: Exception occurred
>     org.apache.uima.analysis_engine.AnalysisEngineProcessException:
>         Annotator processing failed.
>     ...
>     Caused by: org.apache.uima.cas.CASRuntimeException:
>         No sofaFS with name plainText found.
>     ...
> 
> "plainText" is the Sofa in the aggregate analysis engine which is linked to the
> output of the PDF Text Extractor "extractedText".
> 
> I took the aggregate analysis engine apart piece by piece, and I started with
> the Email Annotator AE. That worked fine with the SimpleServer.
> 
> Then I tested the PDF Text Extractor (I changed the input view to _InitialView).
> When I tested a URL, the result came back as XML, but only with the initial view
> and not with the extracted text. In fact, when I test the text extractor with the
> other tools, it takes around 3 seconds to download the PDF file, while the
> SimpleServer sent back its results immediately (so what is that all about? Does
> it not even run the code in the process() method?).
> 
> That's my problem, and I wonder if there is something special you need to do
> when there are views or different output Sofas. I cannot for the life of me
> figure out what is wrong or why it does not work.
> 
> Thanks for your help,
> Ben Morgan
> 
> _______________________________________________________________________________
> 
> 1:
> http://uima.apache.org/downloads/sandbox/simpleServerUserGuide/simpleServerUserGuide.html
> 
> 
> 2: Aggregate AE Descriptor:
> https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/desc/referenceAnnotatorDescriptor.xml
> 
> 
> 3: PDF Extractor descriptor:
> https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/desc/PDFTextExtractorDescriptor.xml
> 
>    PDF Extractor java source:
> https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/src/de/uniwue/informatik/bibrefext/pdf/TextExtractor.java
> 
> 
> 4: Email Annotator descriptor:
> https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/desc/EmailAnnotatorDescriptor.xml
> 
> 
> 5: SimpleServer web.xml:
> https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceWebService/WebContent/WEB-INF/web.xml
> 
> 
> 6: Complete WAR file: https://github.com/downloads/cassava/bibrefext/bibrefext.war
