From: Erik Fäßler
Subject: Re: XML files as input to UIMA?
Date: Wed, 27 Feb 2019 09:29:27 +0100
To: user@uima.apache.org

Dear Bonnie,

oh, sorry about that. I tend to forget this: we store all our types in
the special jcore-types project. You need to build this project once.
You can just use a Maven package command (mvn package) because we build
the types automatically through a Maven plugin. After that, the types
should be available to all projects.

Note that this step is not necessary if you use the Maven artifacts that
we have already uploaded to Maven Central. Those refer to the jcore-types
Maven artifact, which includes all built types. You should only have this
issue due to Maven workspace resolution.

Hope this helps,

Erik

> On 26. Feb 2019, at 20:49, Bonnie MacKellar wrote:
>
> Hi,
>
> Thanks so much. I forked it and loaded it into Eclipse. Unfortunately,
> I can't get jcore-ct-reader to build or generate types, although many
> of the other components do build. I am running an old version of
> UIMA - 2.81.1. Does this require a later version?
>
> thanks
> Bonnie MacKellar
>
> On Mon, Feb 25, 2019 at 8:37 AM Erik Fäßler wrote:
>
>> Dear Bonnie,
>>
>> please check out
>> https://github.com/JULIELab/jcore-base/tree/v2.4/jcore-ct-reader
>>
>> Please let me know if you have any questions or if you have already
>> decided to go with one of the other approaches that have been proposed
>> in the meantime or something entirely different.
>>
>> Best,
>>
>> Erik
>>
>>> On 22. Feb 2019, at 13:17, Bonnie MacKellar wrote:
>>>
>>> Thanks so much!
>>>
>>> Bonnie MacKellar
>>>
>>> On Fri, Feb 22, 2019 at 7:03 AM Erik Fäßler wrote:
>>>
>>>> Hey,
>>>>
>>>> just wanted to say that I didn’t get around to making the component
>>>> available yet, will do first thing next week!
>>>>
>>>> Best,
>>>>
>>>> Erik
>>>>
>>>>> On 20. Feb 2019, at 19:47, Bonnie MacKellar wrote:
>>>>>
>>>>> Hi,
>>>>> Yes, we are using that format. I have a parser that I wrote, but
>>>>> it isn't integrated into UIMA. It runs separately and loads the
>>>>> full clinical trial data into a triplestore (Stardog). I would be
>>>>> interested in your system since I am not really familiar with how
>>>>> to write file readers in the UIMA framework. Perhaps I can merge
>>>>> my parser into it and end up with just the right thing. If you can
>>>>> make it available, I would definitely be interested. I will take a
>>>>> look at the other links as well. Thanks!!
>>>>>
>>>>> Bonnie MacKellar
>>>>>
>>>>> On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler wrote:
>>>>>
>>>>>> Dear Bonnie,
>>>>>>
>>>>>> are you talking about the clinical trial XML format used by
>>>>>> ClinicalTrials.gov by any chance?
>>>>>> If so, I did create a UIMA reader for these data. It's not perfect
>>>>>> but perhaps enough for your purposes, and you might also want to
>>>>>> enhance it. Please let me know if you would be interested in that.
>>>>>> I did not get around to making it publicly available yet but could
>>>>>> do so quickly.
>>>>>>
>>>>>> To answer the general question to the best of my knowledge:
>>>>>> There is no such thing as a general XML reader built into the UIMA
>>>>>> framework. For all non-trivial formats, a specific reader is
>>>>>> necessary. This also holds true with regard to the employed type
>>>>>> system.
>>>>>> That being said, there are UIMA readers that try to serve as a
>>>>>> general XML reading facility, e.g. the “XML Reader” from our lab
>>>>>> (JULIELab,
>>>>>> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
>>>>>> However, in my experience XML inputs come in a lot of different
>>>>>> forms, which are often not suited to a generic approach. That is
>>>>>> why I wrote quite a few UIMA readers for specific XML formats in
>>>>>> the past.
>>>>>>
>>>>>> Hope that helps,
>>>>>>
>>>>>> Erik
>>>>>>
>>>>>>> On 20. Feb 2019, at 01:13, Bonnie MacKellar wrote:
>>>>>>>
>>>>>>> This is probably a very naive question, but I can't seem to find
>>>>>>> anything about this. I currently have a lot of XML files
>>>>>>> (clinical trial descriptions). My current workflow is to run a
>>>>>>> preprocessor that parses the XML and generates text files in a
>>>>>>> simple format. I then run these files in a UIMA pipeline, using
>>>>>>> FileCollectionReader to load the text files, RUTA to parse the
>>>>>>> simple format, the Metamap annotator to do some UMLS annotations,
>>>>>>> and finally I have a writer that generates RDF triples from the
>>>>>>> UIMA annotations and loads the triples into a database. This has
>>>>>>> worked but is clunky, especially the preprocessing. I feel like
>>>>>>> there has to be a better way. Is there any support for reading
>>>>>>> XML files or do I need to write my own CollectionReader? Are
>>>>>>> there any other tools within UIMA for handling XML text?
>>>>>>>
>>>>>>> thanks,
>>>>>>> Bonnie MacKellar
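
For illustration only: the preprocessing step described in the original question (parse each XML file, emit a simple text format for the pipeline) could be sketched roughly as below. This is not the parser or the reader discussed in the thread. The sample document, the element names, the FIELDS mapping and the "LABEL: text" output layout are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; the element names are assumptions for this sketch.
SAMPLE = """<clinical_study>
  <brief_title>Example Trial</brief_title>
  <eligibility>
    <criteria>
      <textblock>Inclusion: adults. Exclusion: none.</textblock>
    </criteria>
  </eligibility>
</clinical_study>"""

# Hypothetical mapping from an output label to an element path below the root.
FIELDS = {
    "TITLE": "brief_title",
    "CRITERIA": "eligibility/criteria/textblock",
}


def xml_to_simple_text(xml_string, fields=FIELDS):
    """Extract the configured elements and return one 'LABEL: text' line each."""
    root = ET.fromstring(xml_string)
    lines = []
    for label, path in fields.items():
        element = root.find(path)
        # Skip fields that are missing or empty in this document.
        if element is None or element.text is None:
            continue
        # Collapse runs of whitespace so each field fits on one line.
        text = " ".join(element.text.split())
        lines.append(f"{label}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(xml_to_simple_text(SAMPLE))
```

If this step were moved into UIMA, similar extraction logic would presumably live inside a format-specific CollectionReader, in line with the advice above that non-trivial XML formats need their own reader.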