From: Bonnie MacKellar
Date: Thu, 7 Jul 2016 12:58:17 -0400
Subject: Re: missing Ruta annotations from uimaFit
To: user@uima.apache.org

Hi,

I fixed the problem by taking the same code and putting it in a class that
extends CasConsumer_ImplBase. This method is called from the class's
process method:

    private void processFeatureStructures(CAS aCAS) {
        Iterator<AnnotationFS> annotationIterator = aCAS.getAnnotationIndex().iterator();
        while (annotationIterator.hasNext()) {
            AnnotationFS annotation = annotationIterator.next();
            try {
                out.println("[" + annotation.getCoveredText() + "]");
                Type type = annotation.getType();
                String typeName = type.getName();
                out.println("found annotation type " + typeName);
                Integer count = typeCounts.get(typeName);
                out.println("current count is " + count);

etc. I then use this code

    AnalysisEngineDescription myWriterDesc =
        AnalysisEngineFactory.createEngineDescription(CountWriter.class,
            CountWriter.PARAM_OUTPUT_FILE, "myOutput.txt");
    SimplePipeline.runPipeline(readerDesc, rutaEngineDesc, myWriterDesc);

to run it. And it works perfectly. All my annotations show up. That is
great, because I now have the totals I needed. But it isn't great, because
I cannot explain it. That worries me because it is hard to develop complex
systems when you don't understand the underlying model.
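
For the counting step that the "etc." above leaves out, here is a minimal
sketch in plain Java, with no UIMA dependency so it can be run on its own.
It assumes typeCounts is a Map<String, Integer>. The class name TypeCounter
and the sample type names are made up for illustration; in the consumer,
the names would come from annotation.getType().getName().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of UIMA or uimaFIT.
final class TypeCounter {

    // Count how often each type name occurs in the given list.
    static Map<String, Integer> countTypes(List<String> typeNames) {
        // TreeMap keeps the keys sorted, so the printed totals are stable.
        Map<String, Integer> typeCounts = new TreeMap<>();
        for (String typeName : typeNames) {
            // merge() stores 1 for a new key and otherwise adds 1, which
            // avoids the null that get() returns the first time a type is seen.
            typeCounts.merge(typeName, 1, Integer::sum);
        }
        return typeCounts;
    }

    public static void main(String[] args) {
        // Made-up type names standing in for annotation.getType().getName().
        List<String> seen = Arrays.asList("Line", "Criterion", "Line");
        countTypes(seen).forEach((name, n) -> System.out.println(name + "\t" + n));
    }
}
```

Using merge() rather than get() followed by put() means the first
occurrence of a type needs no special case, which is where a "current
count is null" line would otherwise come from.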

Also, this method does not work either, and I believe that it should:

    AnalysisEngine rae = AnalysisEngineFactory.createEngine(rutaEngineDesc);
    JCas jCas = rae.newJCas();
    jCas.setDocumentText(someText);
    rae.process(jCas);
    displayRutaResults(jCas);

In this case, the jCas contents were incorrect in the same way that they
were when I used

    for (JCas jcas : SimplePipeline.iteratePipeline(readerDesc, rutaEngineDesc)) {
        displayRutaResults(jcas);
    }

Only if I put the code in a class that is used in runPipeline do I get the
correct results. If you have any ideas about this, I would love to know.

In any case, I almost have all the results I need. I have one more task - I
need to figure out how to obtain text that was annotated with one
particular annotation and NO OTHER. I am thinking that selectCovered might
do what I need, but I can't find many examples of how to use it.

thanks for your help,
Bonnie MacKellar

On Thu, Jul 7, 2016 at 4:48 AM, Peter Klügl wrote:

> Nope, that should not be a problem since the types are initialized in
> process()
>
>
> On 07.07.2016 at 10:26, Richard Eckart de Castilho wrote:
> > Hi,
> >
> > iteratePipeline and runPipeline should be mostly equivalent.
> > A difference occurs if you e.g. have a CAS multiplier within
> > an aggregate engine.
> >
> > runPipeline delegates the execution to the UIMA core and is able
> > to handle CAS multipliers.
> >
> > iteratePipeline (re)uses a single CAS instance which is passed
> > to the reader and all analysis engines in turn. It does not
> > support CAS multipliers.
> >
> > A user recently pointed out that uimaFIT 2.2.0 reintroduces a bug
> > in iteratePipeline - typeSystemInit() is not called [1].
> >
> > @Peter: could the missing call to typeSystemInit() be a problem for Ruta?
> >
> > Cheers,
> >
> > -- Richard
> >
> > [1] https://issues.apache.org/jira/browse/UIMA-4998
> >
> >> On 07.07.2016, at 09:17, Peter Klügl wrote:
> >>
> >> Hi,
> >>
> >> I have no idea yet why the code with iteratePipeline does not work.
> >>
> >> Richard, do you have an idea?
> >>
> >> Are there any exceptions? Do you use the rae objects somewhere? Is your
> >> code hosted somewhere, e.g., on github? What do you mean by your own
> >> annotations? Annotations of an external type system or annotations added
> >> by another engine or reader?
> >>
> >> Best,
> >>
> >> Peter
> >>
> >>
> >> On 06.07.2016 at 02:41, Bonnie MacKellar wrote:
> >>> I have a very lengthy Ruta script which annotates my files successfully. I
> >>> can see all the annotations in AnnotationBrowser and they are correct.
> >>> I want to get all the annotations in a Java program, so I can count
> >>> occurrences. I am using uimaFit. I am getting very odd results.
> >>>
> >>> When I use CasDumpWriter, I see all my annotations, correctly written to
> >>> the dump file.
> >>> Here is the code that does this:
> >>>
> >>> -------------------------------------------------------------------
> >>> AnalysisEngineDescription rutaEngineDesc =
> >>>     AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
> >>>         RutaEngine.PARAM_MAIN_SCRIPT, "ecClassifier",
> >>>         RutaEngine.PARAM_SCRIPT_PATHS, new String[]
> >>>             {"/home/bonnie/Research/eclipse-uima-projects/counttypes/src/main/ruta"},
> >>>         RutaEngine.PARAM_DESCRIPTOR_PATHS, new String[]
> >>>             {"/home/bonnie/Research/eclipse-uima-projects/counttypes/target/generated-sources/ruta/descriptor"},
> >>>         RutaEngine.PARAM_ADDITIONAL_UIMAFIT_ENGINES,
> >>>             "org.apache.uima.ruta.engine.PlainTextAnnotator");
> >>> AnalysisEngineDescription writerDesc =
> >>>     AnalysisEngineFactory.createEngineDescription(CasDumpWriter.class,
> >>>         CasDumpWriter.PARAM_OUTPUT_FILE, "dump2.txt");
> >>> AnalysisEngine rae = AnalysisEngineFactory.createEngine(rutaEngineDesc);
> >>> SimplePipeline.runPipeline(readerDesc, rutaEngineDesc, writerDesc);
> >>> -------------------------------------------------------------------
> >>>
> >>> However, when I try to do this myself, using iteratePipeline to iterate
> >>> through the JCas structures for each input file, many of the annotations
> >>> are missing. I have a suspicion that the missing annotations are ones that
> >>> annotate text for which there is another annotation. For example, text
> >>> will be annotated with Line, and with my own annotation. My code to print
> >>> the annotations is based on the code in CasDumpWriter.
> >>>
> >>> -------------------------------------------------------------------
> >>>
> >>> for (JCas jcas : SimplePipeline.iteratePipeline(readerDesc, rutaEngineDesc)) {
> >>>     displayRutaResults(jcas);
> >>> }
> >>>
> >>> public void displayRutaResults(JCas jcas)
> >>> {
> >>>     System.out.println("in display ruta results");
> >>>
> >>>     FSIterator<Annotation> annotationIter = jcas.getAnnotationIndex().iterator();
> >>>     while (annotationIter.hasNext())
> >>>     {
> >>>         AnnotationFS annotation = annotationIter.next();
> >>>         System.out.println(annotation.getType().getName());
> >>>         System.out.println(annotation.getCoveredText());
> >>>         System.out.println("------------------------------------------");
> >>>         // System.out.println(annotation.toString());
> >>>     }
> >>> }
> >>>
> >>> -------------------------------------------------------------------
> >>>
> >>> Why would this code produce different results than CasDumpWriter, which
> >>> uses almost exactly the same code? Is it something to do with using
> >>> runPipeline vs iteratePipeline? Should I write my code so it can be placed
> >>> inside runPipeline?
> >>>
> >>> thanks so much!
> >>> Bonnie MacKellar
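
On the open question in the reply at the top of this thread (text that
carries one particular annotation and no other), the check can be sketched
in plain Java without UIMA. Everything here is an assumption for
illustration: the Span class is a simplified stand-in for an annotation
(type name plus begin/end offsets, end exclusive), and "no other" is read
as "no annotation of a different type overlaps the span". In a real
pipeline, uimaFIT's JCasUtil.selectCovered(jCas, SomeType.class, annotation)
returns the annotations of one type that lie within another annotation's
span, so it would have to be called per competing type, and wide types
such as Line or the document annotation would probably need to be excluded
first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; Span is a stand-in for a UIMA annotation.
final class ExclusiveSpans {

    static final class Span {
        final String type;
        final int begin;
        final int end; // exclusive, as with UIMA offsets

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // True if the two spans share at least one character position.
    static boolean overlaps(Span a, Span b) {
        return a.begin < b.end && b.begin < a.end;
    }

    // Return the spans of targetType that no span of another type overlaps.
    static List<Span> exclusive(List<Span> all, String targetType) {
        List<Span> result = new ArrayList<>();
        for (Span candidate : all) {
            if (!candidate.type.equals(targetType)) {
                continue;
            }
            boolean clean = true;
            for (Span other : all) {
                if (!other.type.equals(targetType) && overlaps(candidate, other)) {
                    clean = false;
                    break;
                }
            }
            if (clean) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "aaa bbb ccc";
        // Made-up annotations: only the first "A" span has no competitor.
        List<Span> spans = Arrays.asList(
                new Span("A", 0, 3),
                new Span("A", 4, 7), new Span("B", 4, 7),
                new Span("A", 8, 11), new Span("C", 9, 10));
        for (Span s : exclusive(spans, "A")) {
            System.out.println(s.type + ": " + text.substring(s.begin, s.end));
        }
    }
}
```

The nested loop is quadratic in the number of annotations, which is fine
for a sketch but may be slow on large documents; sorting by begin offset
first would allow a single pass.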