From: Erik Fäßler
Subject: Re: XML files as input to UIMA?
Date: Mon, 25 Feb 2019 14:37:24 +0100
To: user@uima.apache.org

Dear Bonnie,

please check out https://github.com/JULIELab/jcore-base/tree/v2.4/jcore-ct-reader

Please let me know if you have any questions or if you already decided
to go with one of the other approaches that have been proposed in the
meantime or something entirely different.

Best,

Erik

> On 22. Feb 2019, at 13:17, Bonnie MacKellar wrote:
>
> Thanks so much!
>
> Bonnie MacKellar
>
> On Fri, Feb 22, 2019 at 7:03 AM Erik Fäßler wrote:
>
>> Hey,
>>
>> just wanted to say that I didn’t get around to making the component
>> available yet, will do first thing next week!
>>
>> Best,
>>
>> Erik
>>
>>> On 20. Feb 2019, at 19:47, Bonnie MacKellar wrote:
>>>
>>> Hi,
>>> Yes, we are using that format. I have a parser that I wrote, but it isn't
>>> integrated into UIMA. It runs separately and loads the full clinical trial
>>> data into a triplestore (Stardog). I would be interested in your system
>>> since I am not really familiar with how to write file readers in the UIMA
>>> framework. Perhaps I can merge my parser into it and end up with just the
>>> right thing. If you can make it available, I would definitely be
>>> interested. I will take a look at the other links as well. Thanks!!
>>>
>>> Bonnie MacKellar
>>>
>>> On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler wrote:
>>>
>>>> Dear Bonnie,
>>>>
>>>> are you talking about the clinical trial XML format used by
>>>> ClinicalTrials.gov by any chance?
>>>> If so, I did create a UIMA reader for these data. It's not perfect, but
>>>> perhaps enough for your purposes, and you might also want to enhance it.
>>>> Please let me know if you would be interested in that; I did not get
>>>> around to making it publicly available yet but could do so quickly.
>>>>
>>>> To answer the general question to the best of my knowledge:
>>>> There is no such thing as a general XML reader built into the UIMA
>>>> framework. For all non-trivial formats, a specific reader is necessary.
>>>> This also holds true with regard to the employed type system.
>>>> That being said, there are UIMA readers that try to serve as a general XML
>>>> reading facility, e.g.
>>>> the “XML Reader” from our lab (JULIELab,
>>>> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
>>>> However, in my experience XML inputs come in a lot of different forms
>>>> that are often not suited to a generic approach, which is why I
>>>> wrote quite a few UIMA readers for specific XML formats in the past.
>>>>
>>>> Hope that helps,
>>>>
>>>> Erik
>>>>
>>>>> On 20. Feb 2019, at 01:13, Bonnie MacKellar wrote:
>>>>>
>>>>> This is probably a very naive question, but I can't seem to find anything
>>>>> about this. I currently have a lot of XML files (clinical trial
>>>>> descriptions). My current workflow is to run a preprocessor that parses the
>>>>> XML and generates text files in a simple format. I then run these files in
>>>>> a UIMA pipeline, using FileCollectionReader to load the text files, RUTA to
>>>>> parse the simple format, the Metamap annotator to do some UMLS annotations,
>>>>> and finally I have a writer that generates RDF triples from the UIMA
>>>>> annotations and loads the triples into a database. This has worked but is
>>>>> clunky, especially the preprocessing. I feel like there has to be a better
>>>>> way. Is there any support for reading XML files or do I need to write my
>>>>> own CollectionReader? Are there any other tools within UIMA for handling
>>>>> XML text?
>>>>>
>>>>> thanks,
>>>>> Bonnie MacKellar
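
Editorial sketch, not part of the original thread: the replies above suggest that each non-trivial XML format needs its own reader. As a rough illustration of the format-specific parsing step such a reader would wrap, the following uses only the JDK's DOM parser and no UIMA classes. The class name TrialXmlText, the method extractText, and the element names brief_title and brief_summary are assumptions made for this example; they are not taken from jcore-ct-reader or from any UIMA API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls the text of selected elements out of one XML document.
final class TrialXmlText {

    /**
     * Returns the text of the first occurrence of each named element,
     * joined by newlines, in the order the names are given.
     * Elements that are missing or empty are skipped.
     */
    static String extractText(String xml, String... elementNames) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: input files carry no DOCTYPE, so we reject them outright
            // to avoid resolving external entities from untrusted input.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            List<String> parts = new ArrayList<>();
            for (String name : elementNames) {
                NodeList nodes = doc.getElementsByTagName(name);
                if (nodes.getLength() == 0) {
                    continue;
                }
                // getTextContent() concatenates the text of all descendant nodes;
                // runs of whitespace are collapsed to single spaces.
                String text = nodes.item(0).getTextContent().trim().replaceAll("\\s+", " ");
                if (!text.isEmpty()) {
                    parts.add(text);
                }
            }
            return String.join("\n", parts);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML input", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document, for illustration only.
        String xml = "<clinical_study>"
                + "<brief_title>Example trial</brief_title>"
                + "<brief_summary><textblock>A short summary.</textblock></brief_summary>"
                + "</clinical_study>";
        System.out.println(extractText(xml, "brief_title", "brief_summary"));
    }
}
```

In a real pipeline, a custom CollectionReader would presumably call something like extractText once per file and set the result as the document text of the CAS, which would replace the separate preprocessing step described in the original question. How that wiring looks depends on the UIMA version and type system in use, so it is left out here.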