From: Bonnie MacKellar
Date: Thu, 23 Jun 2016 09:21:29 -0400
Subject: Re: problems integrating Ruta and uimaFit
To: user@uima.apache.org

Hi,

Thanks. I am not using SourceDocumentInformation in my Ruta script.
There is no dependency there - in the version that is in a regular Ruta
Workbench project, I can remove it and everything is fine. I believe,
from looking at the exception, that the dependency is in UimaFit - it
seems to be coming from SimplePipeline.runPipeline. I have tried adding
it in UimaFit fashion, listing it in
src/main/resources/META-INF/org.apache.uima.fit/types.txt, but I cannot
seem to get UimaFit to find this file in the Maven version of this
project, even though it works fine in the non-Maven project. I just
cannot figure out why this is happening.

I also don't understand this:

"Changing the imports to something like: UIMAFIT
org.apache.uima.ruta.engine.PlainTextAnnotator should do the trick (you
need also to adapt the TYPESYSTEM import). Then the script does not
depend on the project structure."

Change which imports? Is this something in the pom file? UIMAFIT brings
in additional UimaFit annotation engines to the Ruta script, right? I am
not calling or using any UimaFit annotation engines in my Ruta script. I
am just trying to bring in PlainTextAnnotator. That isn't a UimaFit
annotator - it is something built in to Ruta.

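On the types.txt lookup: a plain classpath probe can show whether the file is visible at run time at all. The sketch below is not part of uimaFIT; the class name ClasspathProbe and its method are made up for illustration. It only asks the class loader for every copy of a resource, using the path mentioned above. With Maven's default layout, files under src/main/resources are copied to target/classes, so that is where a hit would be expected to point, unless the pom overrides the resource directories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not a uimaFIT class: lists where a resource is visible.
public class ClasspathProbe {

    /** Returns every classpath location at which the named resource is found. */
    public static List<URL> findOnClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Path taken from the message above.
        String name = "META-INF/org.apache.uima.fit/types.txt";
        List<URL> hits = findOnClasspath(name);
        if (hits.isEmpty()) {
            System.out.println(name + " is not visible on the classpath");
        }
        for (URL hit : hits) {
            System.out.println("found: " + hit);
        }
    }
}
```

Running this with the same classpath as the failing Java program would show whether the file is missing entirely or is present in an unexpected place.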
I tried changing the lines in the script to

    ENGINE org.apache.uima.ruta.engine.PlainTextAnnotator;
    TYPESYSTEM org.apache.uima.ruta.engine.PlainTextTypeSystem;

but that doesn't work - I get a
"org.apache.uima.ruta.engine.PlainTextAnnotator not found" on the line

    ENGINE org.apache.uima.ruta.engine.PlainTextAnnotator;

I then tried changing to

    UIMAFIT org.apache.uima.ruta.engine.PlainTextAnnotator;
    TYPESYSTEM org.apache.uima.ruta.engine.PlainTextTypeSystem;

No compile error, but when I run the script, I get

    Found no script/block: PlainTextAnnotator
    Exception in thread "main" java.lang.NullPointerException
        at org.apache.uima.ruta.engine.RutaEngine.batchProcessComplete(RutaEngine.java:1122)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.batchProcessComplete(PrimitiveAnalysisEngine_impl.java:321)
        at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.batchProcessComplete(AnalysisEngineImplBase.java:447)
        at org.apache.uima.ruta.ide.launching.RutaLauncher.main(RutaLauncher.java:133)

Clearly it isn't finding PlainTextAnnotator - but that is the crux of my
problem. Where do I put it?

I think my problem is that I don't understand what these plugins are all
doing or how they affect each other: ruta-maven-plugin,
jcasgen-maven-plugin, and uimafit-maven-plugin. They all seem to copy
and/or generate different things to target/classes and
target/generated-sources, but it is hard to tell exactly which files
each one is responsible for. I don't have a good mental model of the
process!

thanks,

Bonnie MacKellar

On Thu, Jun 23, 2016 at 5:07 AM, Peter Klügl wrote:

> Hi,
>
> sorry, here's just a short reply since I am currently travelling. If
> the problem still exists I will try to reproduce it and reply with more
> details next week.
>
> Yes, in simple UIMA Ruta projects, these descriptors are copied to
> descriptor/utils when you create the project.
> The descriptor folder is listed in the buildpath as a "descriptor"
> folder, in which imported descriptors are searched for.
>
> UIMA Ruta currently supports two ways to find the descriptors: the
> absolute paths specified in the descriptorPaths configuration parameter
> and the classpath. Thus, the simplest way for you would be to use the
> classpath to find the descriptor instead of the descriptorPaths (which
> points to the descriptor folder of your ruta project).
>
> Changing the imports to something like: UIMAFIT
> org.apache.uima.ruta.engine.PlainTextAnnotator should do the trick (you
> also need to adapt the TYPESYSTEM import). Then the script does not
> depend on the project structure.
>
> If you use the SourceDocumentInformation type system in your ruta
> script, then you need to include it separately. In some situations, the
> Ruta Workbench does that automatically for you. However, it is not
> mentioned in types.txt in ruta-core. So you need to add it there in your
> maven project so that the typesystem scanning of uimaFIT finds it.
>
> If you create the analysis engine (descriptor) for a ruta script
> programmatically, there are sometimes additional configuration
> parameters that need to be set. In your use case, you import additional
> analysis engines in your script. These need to be mentioned in the
> corresponding configuration parameters, e.g., PARAM_ADDITIONAL_ENGINES
> or PARAM_ADDITIONAL_UIMAFIT_ENGINES. Since there are several parameters
> that are rather technical, I normally use the generated descriptor in
> the uimaFIT factory.
>
> Best,
>
> Peter
>
> On 22.06.2016 at 21:55, Bonnie MacKellar wrote:
> > I am still trying to figure out how to count Ruta annotations across a
> > bunch of input files. There doesn't seem to be any Workbench way to do
> > it. So now I am trying to call Ruta from UimaFit so I can do the job in
> > Java.
> >
> > However, I am having serious configuration problems, plus I have a
> > question on how to bring in PlainTextAnnotator.
> >
> > I am using Maven, with the jcasgen-maven-plugin, the ruta-maven-plugin,
> > and the uimafit-maven-plugin. I will include the pom file at the end of
> > this post.
> >
> > I want my Java code to be aware of the types declared in the Ruta
> > script - that is the whole point - I want to count those annotations.
> >
> > My Ruta script also uses PlainTextAnnotator. The problem with this is
> > that I can't figure out where to put it. In a Workbench based Ruta
> > project, PlainTextAnnotator.xml and PlainTextAnnotatorTypeSystem get put
> > automatically into descriptor/utils, along with a number of other
> > descriptors that seem to be built into Ruta. But when I create a project
> > using maven, there is no such location, and these descriptors do not get
> > put anywhere. I tried a number of places but could not get my script to
> > see the type system for PlainTextAnnotator. Finally, I hit on putting
> > the files in target/generated-sources/ruta/descriptor/utils, and finally
> > my script is able to see the types and I can run it. This is good
> > because at that point, the ruta-maven-plugin does its job and generates
> > the descriptors for my script. However, I suspect this is not a good
> > place to put the PlainTextAnnotator files since doing a clean overwrites
> > them. Where should they go? Is there any entry in the pom file that is
> > needed?
> >
> > The second problem is that although my Ruta script works nicely on its
> > own, the Java code fails. I get the following exception
> >
> > Exception in thread "main" org.apache.uima.cas.CASRuntimeException: JCas
> > type "org.apache.uima.examples.SourceDocumentInformation" used in Java
> > code, but was not declared in the XML type descriptor.
> >     at org.apache.uima.jcas.impl.JCasImpl.getTypeInit(JCasImpl.java:435)
> >     at org.apache.uima.jcas.impl.JCasImpl.getType(JCasImpl.java:408)
> >     at org.apache.uima.jcas.cas.TOP.<init>(TOP.java:96)
> >     at org.apache.uima.jcas.cas.AnnotationBase.<init>(AnnotationBase.java:66)
> >     at org.apache.uima.jcas.tcas.Annotation.<init>(Annotation.java:54)
> >     at org.apache.uima.examples.SourceDocumentInformation.<init>(SourceDocumentInformation.java:80)
> >     at org.apache.uima.examples.cpe.FileSystemCollectionReader.getNext(FileSystemCollectionReader.java:162)
> >     at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:149)
> >     at PipelineSystem.<init>(PipelineSystem.java:59)
> >     at PipelineSystem.main(PipelineSystem.java:73)
> >
> > I am guessing that I need to put some other descriptor somewhere but I
> > can't figure out what it might be. Here is the code that causes the
> > problem
> >
> > ----------------------------------------------------------------------
> > import java.io.IOException;
> > import java.util.Iterator;
> >
> > import org.apache.uima.UIMAException;
> > import org.apache.uima.analysis_engine.AnalysisEngine;
> > import org.apache.uima.analysis_engine.AnalysisEngineDescription;
> > import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
> > import org.apache.uima.cas.Type;
> > import org.apache.uima.cas.TypeSystem;
> > import org.apache.uima.collection.CollectionReaderDescription;
> > import org.apache.uima.examples.cpe.FileSystemCollectionReader;
> > import org.apache.uima.fit.component.CasDumpWriter;
> > import org.apache.uima.fit.factory.AnalysisEngineFactory;
> > import org.apache.uima.fit.factory.CollectionReaderFactory;
> > import org.apache.uima.fit.pipeline.SimplePipeline;
> > import org.apache.uima.jcas.JCas;
> > import org.apache.uima.resource.ResourceInitializationException;
> > import org.apache.uima.ruta.engine.RutaEngine;
> >
> > public class PipelineSystem {
> >     public PipelineSystem() throws IOException, UIMAException
> >     {
> >         try {
> >             CollectionReaderDescription readerDesc =
> >                 CollectionReaderFactory.createReaderDescription(
> >                     FileSystemCollectionReader.class,
> >                     FileSystemCollectionReader.PARAM_INPUTDIR,
> >                     "/home/bonnie/Research/eclipse-uima-projects/PipeLineWithRuta/input",
> >                     FileSystemCollectionReader.PARAM_ENCODING, "UTF-8",
> >                     FileSystemCollectionReader.PARAM_LANGUAGE, "English");
> >             AnalysisEngine rae =
> >                 AnalysisEngineFactory.createEngine(RutaEngine.class,
> >                     RutaEngine.PARAM_MAIN_SCRIPT,
> >                     "ecClassifierRules");
> >             AnalysisEngineDescription rutaEngineDesc =
> >                 AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
> >                     RutaEngine.PARAM_MAIN_SCRIPT,
> >                     "ecClassifierRules");
> >             AnalysisEngineDescription writerDesc =
> >                 AnalysisEngineFactory.createEngineDescription(CasDumpWriter.class,
> >                     CasDumpWriter.PARAM_OUTPUT_FILE, "dump.txt");
> >             JCas jCas = rae.newJCas();
> >             SimplePipeline.runPipeline(readerDesc, rutaEngineDesc);
> >             displayRutaResults(jCas);
> >         } catch (ResourceInitializationException e) {
> >             // TODO Auto-generated catch block
> >             e.printStackTrace();
> >         } catch (AnalysisEngineProcessException e) {
> >             // TODO Auto-generated catch block
> >             e.printStackTrace();
> >         }
> >     }
> >
> >     public static void main(String[] args) throws IOException, UIMAException {
> >         PipelineSystem p = new PipelineSystem();
> >     }
> >
> >     public void displayRutaResults(JCas jCas)
> >     {
> >         System.out.println("in display ruta results");
> >         TypeSystem ts = jCas.getTypeSystem();
> >         Iterator typeItr = ts.getTypeIterator();
> >         while (typeItr.hasNext()) {
> >             Type type = (Type) typeItr.next();
> >             if (type.getName().equals("INCL")) {
> >                 System.out.println("INCL was found");
> >             }
> >         }
> >     }
> > }
> > ----------------------------------------------------------------------
> >
> > Yes, I know the code doesn't actually count annotations yet - this is
> > strictly a test of the configuration. The type INCL is declared in the
> > script
> >
> > ENGINE utils.PlainTextAnnotator;
> > TYPESYSTEM utils.PlainTextTypeSystem;
> > Document{-> RETAINTYPE(BREAK)};
> > Document{-> EXEC(PlainTextAnnotator, {Line})};
> >
> > DECLARE INCL;
> > "INCLUSION" -> INCL;
> >
> > And finally, here is the pom file. I note that the ruta plugin and the
> > jcasgen plugin are correctly generating the descriptor files for the
> > script and the Java classes for the types. I have this set up so that
> > the jcasgen plugin reads the type descriptors from the folder that is
> > generated by the ruta-maven-plugin (I saw this in one of the examples
> > mentioned elsewhere on this mailing list).
> > However, the uimafit plugin does not generate anything.
> >
> > thanks for any help. It is really hard to figure out all these moving
> > parts.
> >
> > Bonnie MacKellar
> >
> > ----------------------------------------------------------------------
> > [pom.xml: the XML element tags were lost in the archive; only the
> > values remain, listed in their original order]
> >
> > http://www.w3.org/2001/XMLSchema-instance
> > xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
> > 4.0.0
> > PipeLineWithRuta
> > PipeLineWithRuta
> > 0.0.1-SNAPSHOT
> > jar
> > PipeLineWithRuta
> > http://maven.apache.org
> > UTF-8
> > src/main/java
> > src/main/ruta
> > src/desc
> >
> > maven-compiler-plugin
> > 3.3
> > 1.8
> > 1.8
> >
> > org.apache.uima
> > jcasgen-maven-plugin
> > 2.4.1
> > generate
> > target/generated-sources/ruta/descriptor/ecClassifierRulesTypeSystem.xml
> > true
> >
> > org.apache.uima
> > ruta-maven-plugin
> > 2.3.1
> > src/main/ruta/
> > ${project.build.directory}/generated-sources/ruta/descriptor
> > ${project.build.directory}/generated-sources/ruta/
> > resources/
> > Engine
> > TypeSystem
> > false
> > org.apache.uima.ruta
> > true
> > script:src/main/ruta/
> > descriptor:target/generated-sources/ruta/descriptor/
> > resources:src/main/resources/
> > default
> > process-classes
> > generate
> >
> > org.apache.uima
> > uimafit-maven-plugin
> > 2.2.0
> > ${project.build.directory}/generated-sources/uimafit
> > false
> > ${project.build.sourceEncoding}
> > default
> > process-classes
> > generate
> >
> > org.apache.uima  uimafit-core  2.2.0
> > org.apache.uima  uimaj-core  2.8.1
> > org.apache.uima  ruta-maven-plugin  2.3.1
> > org.apache.uima  uimaj-cpe  2.8.1
> > org.apache.uima  uimaj-examples  2.8.1
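
For the original goal in this thread, counting annotations across a set of input files, the bookkeeping side can be sketched without any UIMA calls. The class below is hypothetical (AnnotationTally and its methods are not part of UIMA, uimaFIT, or Ruta): it merges per-document counts, keyed by type name, into running totals. How each per-document count is read from a CAS is deliberately left out, and the numbers in main are made up.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: accumulates annotation counts over many documents.
public class AnnotationTally {

    // TreeMap keeps the type names sorted, so reports are stable.
    private final Map<String, Integer> totals = new TreeMap<>();

    /** Adds one document's count for the given type name to the running total. */
    public void add(String typeName, int countInDocument) {
        totals.merge(typeName, countInDocument, Integer::sum);
    }

    /** Returns the total so far, or 0 for a type that was never added. */
    public int total(String typeName) {
        return totals.getOrDefault(typeName, 0);
    }

    /** Read-only view of all totals. */
    public Map<String, Integer> totals() {
        return Collections.unmodifiableMap(totals);
    }

    public static void main(String[] args) {
        AnnotationTally tally = new AnnotationTally();
        tally.add("INCL", 3);   // made-up count for a first document
        tally.add("INCL", 2);   // made-up count for a second document
        tally.add("Line", 40);
        System.out.println(tally.totals()); // prints {INCL=5, Line=40}
    }
}
```

One tally instance would be shared across the whole run, with add called once per type for every document that has been processed.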