From: Thilo Goetz
To: uima-user@incubator.apache.org
Date: Wed, 04 Jun 2008 17:20:49 +0200
Subject: Re: Problem using Capabilities - OutputSofa
Message-ID: <4846B2D1.2040605@gmx.de>
References: <4846825F.90804@neofonie.de>

And here's some background, if you're interested:
http://www.mail-archive.com/uima-dev@incubator.apache.org/msg00945.html

There's a lot of discussion before that message, and a lot
afterwards. We mostly agreed that this was broken, but couldn't
agree on the proper fix and finally gave up. If we ever do
a UIMA 3, we'll have the same discussion all over again :-)

--Thilo

Eddie Epstein wrote:
> The CAS reference passed to the annotator process method changes when
> Sofa capabilities are declared. See
> http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.mvs.deciding_multi_view
>
> After declaring an output Sofa, process gets the "base CAS". To get
> the text from the "default" view, try
>
>   String originalText = jcas.getCas().getCurrentView().getDocumentText();
>
> Eddie
>
> PS looks like the JCas interface is missing the getCurrentView() method.
>
> On 6/4/08, Christoph Büscher wrote:
>> Hi,
>>
>> I have ignored the analysis engine's "capabilities" section so far, but
>> after I tried declaring an "outputSofa" for the first time, I ran into
>> trouble using the analysis engine in a CPE.
>>
>> I have an AE that takes web pages in HTML format as input and removes the
>> HTML tags etc. The result is stored in a new CAS view named
>> "plainTextView". So far I didn't declare any capabilities in the AE's
>> descriptor, but now I tried this:
>>
>>   plainTextView
>>
>> The AE's process() method usually accesses the default view of the JCas,
>> does some processing and stores the result in the new view. The code goes
>> something like this:
>>
>>   // get the text from the default CAS view
>>   String originalText = jcas.getDocumentText();
>>   JCas plainTextView = null;
>>
>>   // extract plain text from the original document
>>   String documentWithoutHTML = someProcessing();
>>
>>   // create a view for the stripped HTML document and store the text in it
>>   try {
>>     plainTextView = jcas.createView(DOCUMENT_PLAINTEXT_VIEWNAME);
>>     plainTextView.setDocumentText(documentWithoutHTML);
>>   } catch (CASException e) {
>>     logger.warn(e.getMessage());
>>     throw new AnalysisEngineProcessException(e);
>>   }
>>
>> Using this AE in a CPE (inside an aggregate AE) was working until I
>> declared the outputSofa as described above. Now, retrieving the original
>> text from the default view with "jcas.getDocumentText()" always returns
>> "null". Some debugging shows that the reason is that
>> CASImpl.getSofaDataString() appears to take this branch:
>>
>>   if (this == this.svd.baseCAS) {
>>     // base CAS has no document
>>     return null;
>>   }
>>
>> What am I missing when I declare the output sofa capability of the AE?
>> Why does the JCas default view seem to be inaccessible after I declared
>> the outputSofa?
>>
>> Thanks for any hints and information!
>>
>> --
>> --------------------------------
>> Christoph Büscher
>>