From: Bonnie MacKellar
Date: Wed, 27 Feb 2019 08:11:00 -0500
Subject: Re: XML files as input to UIMA?
To: user@uima.apache.org

Hi,

Thanks. I had actually figured that out yesterday after visiting the project website and reading carefully. The problem, though, is that jcore-ct-reader still does not build. It can't find the clinical trial types even after they have been generated. The mvn package step correctly generates the jar file, but places it in a target folder (I don't have the exact path because I am on another computer), which isn't seen by jcore-ct-reader. There is another jar file generated by jcore-types, which ends up in my Maven repository, and I *think* that is the one jcore-ct-reader picks up. But it doesn't contain the clinical trial types for some reason. So when I try to build jcore-ct-reader, I just get a lot of error messages saying it can't find those classes.

The project looks interesting. My parser gets more fields from the trials, and we handle inclusion/exclusion text differently because part of our pipeline is to parse those sentences and annotate them in various ways. What we had been doing was to use the clinical trial XML parser that I had built to 1) insert RDF triples into a triplestore, and 2) generate a text representation of the inclusion/exclusion constraints that is fed through the UIMA pipeline. The output of that process is also placed in the triplestore. One of the things I am looking at doing is unifying this better so that it is all one process. Looking at your reader, I can see how it all could work. Nice!

Bonnie MacKellar

On Wed, Feb 27, 2019 at 3:29 AM Erik Fäßler wrote:

> Dear Bonnie,
>
> oh, sorry about that. I tend to forget this: We store all our types in the special jcore-types project. You need to build this project once.
> You can just use a Maven package command (mvn package) because we build the types automatically through a Maven plugin.
> After that, the types should be available to all projects.
>
> Note that this step is not necessary if you use the Maven artifacts that we have already uploaded to Maven Central. Those refer to the jcore-types Maven artifact, which includes all built types.
>
> You should only run into this issue because of Maven workspace resolution.
>
> Hope this helps,
>
> Erik
>
> > On 26. Feb 2019, at 20:49, Bonnie MacKellar wrote:
> >
> > Hi,
> >
> > Thanks so much. I forked it and loaded it into Eclipse. Unfortunately, I can't get jcore-ct-reader to build or generate types, although many of the other components do build. I am running an old version of UIMA (2.8.1). Does this require a later version?
> >
> > thanks
> > Bonnie MacKellar
> >
> > On Mon, Feb 25, 2019 at 8:37 AM Erik Fäßler wrote:
> >
> >> Dear Bonnie,
> >>
> >> please check out https://github.com/JULIELab/jcore-base/tree/v2.4/jcore-ct-reader.
> >>
> >> Please let me know if you have any questions, or if you have already decided to go with one of the other approaches that have been proposed in the meantime or something entirely different.
> >>
> >> Best,
> >>
> >> Erik
> >>
> >>> On 22. Feb 2019, at 13:17, Bonnie MacKellar wrote:
> >>>
> >>> Thanks so much!
> >>>
> >>> Bonnie MacKellar
> >>>
> >>> On Fri, Feb 22, 2019 at 7:03 AM Erik Fäßler wrote:
> >>>
> >>>> Hey,
> >>>>
> >>>> just wanted to say that I didn't get around to making the component available yet; will do first thing next week!
> >>>>
> >>>> Best,
> >>>>
> >>>> Erik
> >>>>
> >>>>> On 20. Feb 2019, at 19:47, Bonnie MacKellar wrote:
> >>>>>
> >>>>> Hi,
> >>>>> Yes, we are using that format. I have a parser that I wrote, but it isn't integrated into UIMA.
> >>>>> It runs separately and loads the full clinical trial data into a triplestore (Stardog). I would be interested in your system, since I am not really familiar with how to write file readers in the UIMA framework. Perhaps I can merge my parser into it and end up with just the right thing. If you can make it available, I would definitely be interested. I will take a look at the other links as well. Thanks!!
> >>>>>
> >>>>> Bonnie MacKellar
> >>>>>
> >>>>> On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler <erik.faessler@uni-jena.de> wrote:
> >>>>>
> >>>>>> Dear Bonnie,
> >>>>>>
> >>>>>> are you talking about the clinical trial XML format used by ClinicalTrials.gov by any chance?
> >>>>>> If so, I did create a UIMA reader for these data. It's not perfect, but perhaps enough for your purposes, and you might also want to enhance it. Please let me know if you would be interested in that; I did not get around to making it publicly available yet but could do so quickly.
> >>>>>>
> >>>>>> To answer the general question to the best of my knowledge:
> >>>>>> There is no such thing as a general XML reader built into the UIMA framework. For all non-trivial formats, a specific reader is necessary. This also holds true with regard to the employed type system.
> >>>>>> That being said, there are UIMA readers that try to serve as a general XML reading facility, e.g. the "XML Reader" from our lab (JULIELab, https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
> >>>>>> However, in my experience XML inputs come in a lot of different forms, which are often not suited to a generic approach. That is why I have written quite a few UIMA readers for specific XML formats in the past.
> >>>>>>
> >>>>>> Hope that helps,
> >>>>>>
> >>>>>> Erik
> >>>>>>
> >>>>>>> On 20. Feb 2019, at 01:13, Bonnie MacKellar wrote:
> >>>>>>>
> >>>>>>> This is probably a very naive question, but I can't seem to find anything about this. I currently have a lot of XML files (clinical trial descriptions). My current workflow is to run a preprocessor that parses the XML and generates text files in a simple format. I then run these files through a UIMA pipeline, using FileCollectionReader to load the text files, RUTA to parse the simple format, the MetaMap annotator to do some UMLS annotations, and finally a writer of my own that generates RDF triples from the UIMA annotations and loads the triples into a database. This has worked, but it is clunky, especially the preprocessing. I feel like there has to be a better way. Is there any support for reading XML files, or do I need to write my own CollectionReader? Are there any other tools within UIMA for handling XML text?
> >>>>>>>
> >>>>>>> thanks,
> >>>>>>> Bonnie MacKellar
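The thread establishes that UIMA has no general-purpose XML reader and that a format-specific reader is usually needed, but it does not show what such a reader does with the XML. The following is a minimal sketch, not the jcore-ct-reader implementation, assuming the legacy ClinicalTrials.gov XML layout (a `clinical_study` root with `id_info/nct_id`, `brief_title` and `eligibility/criteria/textblock` elements). It uses only the JDK's DOM and XPath APIs; the class name `ClinicalTrialXml` and the split of the criteria text at an "Exclusion Criteria:" heading are hypothetical illustrations.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

/**
 * Hypothetical helper: extracts the fields a clinical-trial reader might need from one
 * ClinicalTrials.gov XML record. The element paths assume the legacy XML layout.
 */
final class ClinicalTrialXml {

    /** Values pulled out of a single record. */
    static final class Trial {
        final String nctId;
        final String title;
        final String inclusion;
        final String exclusion;

        Trial(String nctId, String title, String inclusion, String exclusion) {
            this.nctId = nctId;
            this.title = title;
            this.inclusion = inclusion;
            this.exclusion = exclusion;
        }
    }

    private static final String INCLUSION_HEADING = "Inclusion Criteria:";
    private static final String EXCLUSION_HEADING = "Exclusion Criteria:";

    /** Parses one record; XPath.evaluate returns "" for paths that are absent. */
    static Trial parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        String nctId = xpath.evaluate("/clinical_study/id_info/nct_id", doc).trim();
        String title = xpath.evaluate("/clinical_study/brief_title", doc).trim();
        String criteria = xpath.evaluate("/clinical_study/eligibility/criteria/textblock", doc);

        // Heuristic (assumption): split the free text at the exclusion heading, if present.
        int cut = criteria.indexOf(EXCLUSION_HEADING);
        String inclusion = cut < 0 ? criteria : criteria.substring(0, cut);
        String exclusion = cut < 0 ? "" : criteria.substring(cut + EXCLUSION_HEADING.length());
        return new Trial(nctId, title,
                clean(inclusion.replace(INCLUSION_HEADING, "")), clean(exclusion));
    }

    /** Drops the indentation and blank lines that textblock elements typically contain. */
    static String clean(String text) {
        return text.trim()
                .replaceAll("\\s*\\n\\s*", "\n")
                .replaceAll("[ \\t]+", " ");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<clinical_study><id_info><nct_id>NCT00000000</nct_id></id_info>"
                + "<brief_title>Example trial</brief_title><eligibility><criteria><textblock>\n"
                + "  Inclusion Criteria:\n\n    - Age 18 or older\n\n"
                + "  Exclusion Criteria:\n\n    - Pregnancy\n"
                + "</textblock></criteria></eligibility></clinical_study>";
        Trial trial = parse(xml);
        System.out.println(trial.nctId + ": include [" + trial.inclusion
                + "], exclude [" + trial.exclusion + "]");
    }
}
```

In an actual UIMA reader, something like `parse` could be called from the `getNext(JCas)` method of a uimaFIT `JCasCollectionReader_ImplBase` subclass, with the criteria text passed to `jCas.setDocumentText(...)`, so that RUTA and the MetaMap annotator would see it directly and the separate preprocessing step would go away. Where the NCT ID and title end up depends on the type system, which this sketch leaves out; the heading-based split will also miss records that phrase their headings differently.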