uima-user mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: Problem using Capabilities - OutputSofa
Date Wed, 04 Jun 2008 15:20:49 GMT
And here's some background if you're interested:
http://www.mail-archive.com/uima-dev@incubator.apache.org/msg00945.html

There's a lot of discussion before that message,
and a lot afterwards.

So we were mostly agreed that this was broken, but
couldn't agree on the proper fix and finally gave
up.  If we ever do a UIMA 3, we'll have the same
discussion all over :-)

--Thilo

Eddie Epstein wrote:
> The CAS reference passed to the annotator process method changes when
> Sofa capabilities are declared. See
> http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.mvs.deciding_multi_view
> 
> After declaring an output Sofa, process gets the "base CAS". To get
> the text from the "default" view, try
> 
> String originalText = jcas.getCas().getCurrentView().getDocumentText();
> 
> Eddie
> 
> PS looks like the JCas interface is missing the getCurrentView() method.
> 
> On 6/4/08, Christoph Büscher <christoph.buescher@neofonie.de> wrote:
>> Hi,
>>
>> I have ignored the analysis engine's "capabilities" section so far, but after
>> declaring an "outputSofa" for the first time, I ran into trouble using the
>> analysis engine in a CPE.
>>
>> I have an AE that takes web pages in HTML format as input and removes the
>> HTML tags etc. The result is stored in a new CAS view named "plainTextView".
>> So far I didn't declare any capabilities in the AE's descriptor, but now I
>> tried this:
>>
>> <capabilities>
>>        <capability>
>>          <inputs/>
>>          <outputs/>
>>          <outputSofas>
>>            <sofaName>plainTextView</sofaName>
>>          </outputSofas>
>>          <languagesSupported/>
>>        </capability>
>> </capabilities>
>>
>> The AE's process() method usually accesses the default view of the JCas,
>> does some processing and stores the result in the new view. The code goes
>> something like this:
>>
>> // get the text from the default CAS view
>> String originalText = jcas.getDocumentText();
>> JCas plainTextView = null;
>>
>> // Extract plain text from original document
>> String documentWithoutHTML = someProcessing();
>>
>> // create view for stripped HTML document
>> try {
>>     plainTextView = jcas.createView(DOCUMENT_PLAINTEXT_VIEWNAME);
>>     // store the stripped text, not the view name, in the new view
>>     plainTextView.setDocumentText(documentWithoutHTML);
>> } catch (CASException e) {
>>     logger.warn(e.getMessage());
>>     throw new AnalysisEngineProcessException(e);
>> }
>>
>>
>> Using this AE in a CPE (inside an aggregate AE) was working until I declared
>> the outputSofa as described above. Now, trying to retrieve the original text
>> from the default view with "jcas.getDocumentText()" always returns "null".
>> Some debugging shows that the reason is that this branch in
>> CASImpl.getSofaDataString() is taken:
>>
>> if (this == this.svd.baseCAS) {
>>        // base CAS has no document
>>        return null;
>> }
>>
>>
>> What am I missing when I declare the output sofa capability of the AE? Why
>> does the JCas default view seem to be inaccessible after I declared the
>> outputSofa?
>>
>> Thanks for any hints and information!
>>
>>
>> --
>> --------------------------------
>> Christoph Büscher
>>
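
For readers landing on this thread, here is a minimal sketch of how the annotator from the question might read the original text once an output Sofa is declared. It is only one possible approach, not the fix agreed on in the thread: instead of Eddie's jcas.getCas().getCurrentView(), it asks for the default view by name via JCas.getView(CAS.NAME_DEFAULT_SOFA). The class name HtmlToPlainTextAnnotator and the stripHtml() helper are hypothetical stand-ins (the original someProcessing() is not shown in the thread), and the code needs the UIMA core library on the classpath.

```java
import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.CASException;
import org.apache.uima.jcas.JCas;

// Hypothetical annotator name; not taken from the original thread.
public class HtmlToPlainTextAnnotator extends JCasAnnotator_ImplBase {

  // Same view name as in the <outputSofas> capability declaration.
  private static final String DOCUMENT_PLAINTEXT_VIEWNAME = "plainTextView";

  @Override
  public void process(JCas jcas) throws AnalysisEngineProcessException {
    try {
      // Once Sofa capabilities are declared, the JCas passed in may be the
      // base CAS, which has no document text. Request the default view
      // explicitly rather than calling jcas.getDocumentText() directly.
      JCas defaultView = jcas.getView(CAS.NAME_DEFAULT_SOFA);
      String originalText = defaultView.getDocumentText();

      // Assumption: a placeholder for the poster's someProcessing().
      String documentWithoutHTML = stripHtml(originalText);

      // Create the output view and store the stripped text in it.
      JCas plainTextView = jcas.createView(DOCUMENT_PLAINTEXT_VIEWNAME);
      plainTextView.setDocumentText(documentWithoutHTML);
    } catch (CASException e) {
      throw new AnalysisEngineProcessException(e);
    }
  }

  // Naive illustrative tag stripper; a real HTML parser would be more robust.
  private static String stripHtml(String html) {
    return html == null ? "" : html.replaceAll("<[^>]*>", "");
  }
}
```

If the aggregate maps Sofa names, the name passed to getView() is subject to that mapping, so the view actually returned depends on the surrounding descriptor.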
