From: Bonnie MacKellar
Date: Tue, 26 Feb 2019 14:49:13 -0500
Subject: Re: XML files as input to UIMA?
To: user@uima.apache.org

Hi,

Thanks so much. I forked it and loaded it into Eclipse. Unfortunately, I
can't get jcore-ct-reader to build or generate types, although many of the
other components do build. I am running an old version of UIMA - 2.81.1.
Does this require a later version?

Thanks,
Bonnie MacKellar

On Mon, Feb 25, 2019 at 8:37 AM Erik Fäßler wrote:

> Dear Bonnie,
>
> please check out
> https://github.com/JULIELab/jcore-base/tree/v2.4/jcore-ct-reader.
>
> Please let me know if you have any questions or if you already decided to
> go with one of the other approaches that have been proposed in the
> meantime or something entirely different.
>
> Best,
>
> Erik
>
> > On 22. Feb 2019, at 13:17, Bonnie MacKellar wrote:
> >
> > Thanks so much!
> >
> > Bonnie MacKellar
> >
> > On Fri, Feb 22, 2019 at 7:03 AM Erik Fäßler wrote:
> >
> >> Hey,
> >>
> >> just wanted to say that I haven't gotten around to making the component
> >> available yet, will do first thing next week!
> >>
> >> Best,
> >>
> >> Erik
> >>
> >>> On 20. Feb 2019, at 19:47, Bonnie MacKellar wrote:
> >>>
> >>> Hi,
> >>> Yes, we are using that format. I have a parser that I wrote, but it
> >>> isn't integrated into UIMA. It runs separately and loads the full
> >>> clinical trial data into a triplestore (Stardog). I would be interested
> >>> in your system since I am not really familiar with how to write file
> >>> readers in the UIMA framework. Perhaps I can merge my parser into it
> >>> and end up with just the right thing.
> >>> If you can make it available, I would definitely be interested. I will
> >>> take a look at the other links as well. Thanks!!
> >>>
> >>> Bonnie MacKellar
> >>>
> >>> On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler wrote:
> >>>
> >>>> Dear Bonnie,
> >>>>
> >>>> are you talking about the clinical trial XML format used by
> >>>> ClinicalTrials.gov by any chance?
> >>>> If so, I did create a UIMA reader for these data. It's not perfect but
> >>>> perhaps enough for your purposes, and you might want to enhance it.
> >>>> Please let me know if you would be interested in that. I have not
> >>>> gotten around to making it publicly available yet but could do so
> >>>> quickly.
> >>>>
> >>>> To answer the general question to the best of my knowledge:
> >>>> There is no such thing as a general XML reader built into the UIMA
> >>>> framework. For all non-trivial formats, a specific reader is necessary.
> >>>> This also holds true with regard to the employed type system.
> >>>> That being said, there are UIMA readers that try to serve as a general
> >>>> XML reading facility, e.g. the “XML Reader” from our lab (JULIELab,
> >>>> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
> >>>> However, in my experience XML inputs come in a lot of different forms
> >>>> that are often not suited to a generic approach, which is why I
> >>>> wrote quite a few UIMA readers for specific XML formats in the past.
> >>>>
> >>>> Hope that helps,
> >>>>
> >>>> Erik
> >>>>
> >>>>> On 20. Feb 2019, at 01:13, Bonnie MacKellar wrote:
> >>>>>
> >>>>> This is probably a very naive question, but I can't seem to find
> >>>>> anything about this. I currently have a lot of XML files (clinical
> >>>>> trial descriptions).
> >>>>> My current workflow is to run a preprocessor that parses the XML and
> >>>>> generates text files in a simple format. I then run these files
> >>>>> through a UIMA pipeline, using FileCollectionReader to load the text
> >>>>> files, RUTA to parse the simple format, the MetaMap annotator to do
> >>>>> some UMLS annotations, and finally a writer that generates RDF triples
> >>>>> from the UIMA annotations and loads the triples into a database. This
> >>>>> has worked but is clunky, especially the preprocessing. I feel like
> >>>>> there has to be a better way. Is there any support for reading XML
> >>>>> files or do I need to write my own CollectionReader? Are there any
> >>>>> other tools within UIMA for handling XML text?
> >>>>>
> >>>>> thanks,
> >>>>> Bonnie MacKellar
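
For readers of this thread wondering what the XML-specific part of a custom CollectionReader might look like, here is a minimal sketch of just the parsing step, using only the JDK's DOM parser. It is not taken from jcore-ct-reader or jcore-xml-reader. The class name TrialXmlSketch, the helper names, and the element names in the sample record (clinical_study, brief_title, textblock) are illustrative assumptions; the tags you pass in would have to match your actual files.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of UIMA or JCoRe: turns one XML record
// into the plain text that a reader would hand to the pipeline.
final class TrialXmlSketch {

    /** Returns the trimmed text of the first element with this tag, or "" if absent. */
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    /** Parses one XML record and joins the selected fields, one per line. */
    static String toDocumentText(String xml, String... tags) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations so no external entities are resolved.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        StringBuilder text = new StringBuilder();
        for (String tag : tags) {
            String value = firstText(doc, tag);
            if (value.isEmpty()) {
                continue; // skip fields that are missing from this record
            }
            if (text.length() > 0) {
                text.append("\n");
            }
            text.append(value);
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample record; real files will contain many more fields.
        String xml = "<clinical_study>"
                + "<brief_title>Example Trial</brief_title>"
                + "<criteria><textblock>Adults aged 18 or older</textblock></criteria>"
                + "</clinical_study>";
        System.out.println(toDocumentText(xml, "brief_title", "textblock"));
    }
}
```

In an actual reader, the returned string would typically be set as the CAS document text (for example via setDocumentText) inside the reader's per-document method, which would make the separate preprocessing step and the intermediate text files unnecessary. Which fields to extract and how to map them onto a type system remains format-specific, as noted above.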