From: Thilo Goetz
Date: Wed, 24 Jun 2009 15:39:15 +0200
To: uima-user@incubator.apache.org
Subject: Re: question about InterOP between Apache UIMA and Omnifind Annotators (CAS2JDBC)
Message-ID: <4A422C83.9060306@gmx.de>
References: <6ec63aa40906231343u66bb0816uc301a52b3c0fcc50@mail.gmail.com> <4A41D049.3020108@gmx.de> <6ec63aa40906240520l57236ccem82d5c4dc4cce3526@mail.gmail.com>
In-Reply-To: <6ec63aa40906240520l57236ccem82d5c4dc4cce3526@mail.gmail.com>

Chengmin Ding wrote:
> Thanks Thilo! I didn't mean to cross-post to the other list, but I didn't
> see my question posted in my gmail account so I just tried again. Sorry
> about it.
>
> A couple of years ago when we used the IBM UIMA framework, we could run
> CAS2JDBC out of Omnifind by including the Omnifind base annotators in the
> aggregate analysis engine (following the Omnifind handbook and suggestions
> from Sebastian, cf.
> http://www.ibm.com/developerworks/forums/thread.jspa?threadID=157872&tstart=0&messageID=13941628
> )
>
> I guess my question would be better phrased this way: we tried to use the
> IBM UIMA Adapter to wrap the Omnifind base annotator
> (of_tokenization.xml); is this supposed to work? In our pipeline, we
> used the Adapter twice, once for the Omnifind base annotator (at the
> beginning) and once for the CAS2JDBC consumer (at the end).
>
> I appreciate any suggestions/comments on this.

Sorry, I told you everything I could dredge up from the depths
of my memory. Please try the OF forum on developerworks (not
the UIMA forum):
http://www.ibm.com/developerworks/forums/forum.jspa?forumID=757

You may have more luck there.

--Thilo

>
> -Chengmin
>
> On Wed, Jun 24, 2009 at 3:05 AM, Thilo Goetz wrote:
>
>> Hi Chengmin,
>>
>> please don't cross post. Answers below.
>>
>> Chengmin Ding wrote:
>>> Hello,
>>>
>>> We have used the UIMA Adapter for IBM annotators and it worked for some of
>>> our testing annotators. However, when we tried it on cas2jdbc, we got the
>>> following error:
>>>
>>> We have a CPE pipeline and the CAS2JDBC is the only consumer/engine based on
>>> IBM UIMA framework.
>>> We are using Apache UIMA 2.2 for the entire pipeline. We
>>> were thinking this was caused by missing Omnifind specific annotator which
>>> fills out the DocumentAnnotation or the omnifind specific
>>> com.ibm.es.tt.DocumentMetaData feature structure (which contains documentid
>>> etc features). We then added the base annotator from Omnifind
>>> (OF_Tokenization.xml etc) and also wrapped it up with the adapter. But we
>>> still got the same error. Our questions are:
>>>
>>> 1) Is the error indeed caused by missing some Omnifind specific annotator
>>> that fills out the DocumentAnnotation feature structure?
>> Not quite sure from the error message, but very likely yes. I suppose
>> that cas2jdbc was never intended to be run outside the OF UIMA pipeline.
>> OF has an internal document model that is shared between its annotators,
>> and I assume that cas2jdbc relies on that model. Seems reasonable, given
>> that you will later need to identify documents in the DB based on some ID
>> or other.
>>
>>> 2) Is there any way to further isolate the problem via any tools considering
>>> we do not have the source code for cas2jdbc?
>> I can't think of any. A better place to ask would be the IBM OF
>> support forum.
>>
>>> 3) Can the IBM UIMA Adapter be used the same way to wrap regular annotator,
>>> aggregated analysis engine and consumers ?
>> Yes for primitive and aggregate AEs. Consumers I actually don't know,
>> they used to have a special status in IBM UIMA. It doesn't look like
>> that's your problem, though.
>>
>>> 4) Does Apache UIMA have any plan to come up with a CAS2JDBC compatible db
>>> consumer?
>> If there is one, I don't know of it.
>>
>> --Thilo
>>
>>> Thanks a lot!
>>> ================================================
>>> org.apache.uima.analysis_engine.AnalysisEngineProcessException
>>>     at com.ibm.uima.adapter.ibm.IBMAnalysisEngineWrapper.processAndOutputNewCASes(Unknown Source)
>>>     at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:218)
>>>     at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:892)
>>>     at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:577)
>>> Caused by: com.ibm.uima.analysis_engine.AnalysisEngineProcessException: The common analysis structure cannot be processed. See the previous exception for details.
>>>     at com.ibm.uima.reference_impl.analysis_engine.compatibility.CasConsumerAdapter.process(CasConsumerAdapter.java:93)
>>>     at com.ibm.uima.reference_impl.analysis_engine.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:392)
>>>     at com.ibm.uima.reference_impl.analysis_engine.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:297)
>>>     at com.ibm.uima.reference_impl.analysis_engine.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:218)
>>>     ... 4 more
>>> Caused by: com.ibm.uima.resource.ResourceProcessException: The common analysis structure cannot be processed. See the previous exception for details.
>>>     at com.ibm.uima.consumer.cas2jdbc.utils.Cas2JdbcLogger.log_PROCESS_CAS__SEVERE(Unknown Source)
>>>     at com.ibm.uima.consumer.cas2jdbc.Cas2Jdbc.processCas(Unknown Source)
>>>     at com.ibm.uima.reference_impl.analysis_engine.compatibility.CasConsumerAdapter.process(CasConsumerAdapter.java:89)
>>>     ... 7 more
>>> Caused by: com.ibm.uima.resource.ResourceProcessException: The document's ID cannot be parsed. See the previous exception for details.
>>>     at com.ibm.uima.consumer.cas2jdbc.utils.Cas2JdbcLogger.log_BAD_DOCID__SEVERE(Unknown Source)
>>>     at com.ibm.uima.consumer.cas2jdbc.Cas2Jdbc.parseDocID(Unknown Source)
>>>     ... 9 more
>>> Caused by: java.lang.NullPointerException
>>>     ... 10 more
>>>
>>> -Chengmin
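
On question 2 (isolating the problem without the cas2jdbc source): the trace above already narrows things down a little, since its innermost "Caused by" is a java.lang.NullPointerException that the trace associates with Cas2Jdbc.parseDocID. Below is a minimal sketch of walking such a cause chain programmatically, using only standard Java. The class name CauseChain and the stand-in exceptions in main are hypothetical. They are not part of UIMA or cas2jdbc, and the real exception classes are replaced by JDK ones so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of UIMA or cas2jdbc): lists nested causes.
class CauseChain {

    /** Returns t followed by each nested cause, outermost first. */
    static List<Throwable> causeChain(Throwable t) {
        List<Throwable> chain = new ArrayList<>();
        // Identity-based set guards against a (rare) cyclic cause chain.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            chain.add(cur);
        }
        return chain;
    }

    /** Returns the innermost cause, or null if t itself is null. */
    static Throwable rootCause(Throwable t) {
        List<Throwable> chain = causeChain(t);
        return chain.isEmpty() ? null : chain.get(chain.size() - 1);
    }

    public static void main(String[] args) {
        // Stand-ins for the nested exceptions in the trace; the real
        // com.ibm.uima.* classes are assumed to be unavailable here.
        Throwable npe = new NullPointerException();
        Throwable badId = new IllegalStateException(
                "The document's ID cannot be parsed.", npe);
        Throwable outer = new RuntimeException(
                "The common analysis structure cannot be processed.", badId);

        for (Throwable t : causeChain(outer)) {
            System.out.println(t.getClass().getName() + ": " + t.getMessage());
        }
        System.out.println("root cause: " + rootCause(outer).getClass().getName());
    }
}
```

Applied to whatever exception the pipeline reports, rootCause(...) would surface the innermost exception directly. It does not say which feature structure is missing, only where the failure originates.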