From: Bonnie MacKellar
Date: Fri, 22 Feb 2019 07:17:05 -0500
Subject: Re: XML files as input to UIMA?
To: user@uima.apache.org

Thanks so much!

Bonnie MacKellar

On Fri, Feb 22, 2019 at 7:03 AM Erik Fäßler wrote:

> Hey,
>
> just wanted to say that I didn’t get around to making the component
> available yet, will do first thing next week!
>
> Best,
>
> Erik
>
> > On 20. Feb 2019, at 19:47, Bonnie MacKellar wrote:
> >
> > Hi,
> > Yes, we are using that format. I have a parser that I wrote, but it isn't
> > integrated into UIMA. It runs separately and loads the full clinical trial
> > data into a triplestore (Stardog). I would be interested in your system,
> > since I am not really familiar with how to write file readers in the UIMA
> > framework. Perhaps I can merge my parser into it and end up with just the
> > right thing. If you can make it available, I would definitely be
> > interested. I will take a look at the other links as well. Thanks!!
> >
> > Bonnie MacKellar
> >
> > On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler wrote:
> >
> >> Dear Bonnie,
> >>
> >> are you talking about the clinical trial XML format used by
> >> ClinicalTrials.gov by any chance?
> >> If so, I did create a UIMA reader for these data. It's not perfect, but
> >> perhaps enough for your purposes, and you might also want to enhance it.
> >> Please let me know if you would be interested in that. I did not get
> >> around to making it publicly available yet, but could do so quickly.
> >>
> >> To answer the general question to the best of my knowledge:
> >> There is no such thing as a general XML reader built into the UIMA
> >> framework. For all non-trivial formats, a specific reader is necessary.
> >> This also holds true with regard to the employed type system.
> >>
> >> That being said, there are UIMA readers that try to serve as a general XML
> >> reading facility, e.g. the “XML Reader” from our lab (JULIELab,
> >> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
> >> However, in my experience XML inputs come in a lot of different forms,
> >> which often might not be suitable for a generic approach; this is why I
> >> wrote quite a few UIMA readers for specific XML formats in the past.
> >>
> >> Hope that helps,
> >>
> >> Erik
> >>
> >>> On 20. Feb 2019, at 01:13, Bonnie MacKellar wrote:
> >>>
> >>> This is probably a very naive question, but I can't seem to find anything
> >>> about this. I currently have a lot of XML files (clinical trial
> >>> descriptions). My current workflow is to run a preprocessor that parses the
> >>> XML and generates text files in a simple format. I then run these files in
> >>> a UIMA pipeline, using FileCollectionReader to load the text files, RUTA to
> >>> parse the simple format, the MetaMap annotator to do some UMLS annotations,
> >>> and finally I have a writer that generates RDF triples from the UIMA
> >>> annotations and loads the triples into a database. This has worked, but it
> >>> is clunky, especially the preprocessing. I feel like there has to be a better
> >>> way. Is there any support for reading XML files, or do I need to write my
> >>> own CollectionReader? Are there any other tools within UIMA for handling
> >>> XML text?
> >>>
> >>> thanks,
> >>> Bonnie MacKellar
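Since the thread concludes that getting XML into UIMA usually means writing a format-specific collection reader, here is a minimal sketch of the parsing half of such a reader, using only the JDK's DOM and XPath APIs. The class name TrialXmlExtractor, the Span/Result helpers, and the element paths (/clinical_study/brief_title and /clinical_study/eligibility/criteria/textblock) are assumptions for illustration; they are not taken from Erik's component or from the JCoRe XML Reader, so check them against your actual files. The UIMA wiring is only described in comments: inside a reader, for example a subclass of uimaFIT's JCasCollectionReader_ImplBase, getNext(JCas) would set the document text and create one annotation per recorded span, using a type from whatever type system you define.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

/**
 * Hypothetical helper: turns one trial record into document text plus the
 * character offsets of each extracted field. A real UIMA reader would call
 * jcas.setDocumentText(result.text) and add one annotation per Span.
 */
public class TrialXmlExtractor {

    /** A named field and its [begin, end) character offsets in the text. */
    static final class Span {
        final String name;
        final int begin;
        final int end;

        Span(String name, int begin, int end) {
            this.name = name;
            this.begin = begin;
            this.end = end;
        }
    }

    static final class Result {
        final String text;
        final List<Span> spans;

        Result(String text, List<Span> spans) {
            this.text = text;
            this.spans = spans;
        }
    }

    // Assumed field names and XPath expressions; adjust to your schema.
    private static final String[][] FIELDS = {
        {"title", "/clinical_study/brief_title"},
        {"criteria", "/clinical_study/eligibility/criteria/textblock"},
    };

    static Result extract(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(xml);
        XPath xpath = XPathFactory.newInstance().newXPath();
        StringBuilder text = new StringBuilder();
        List<Span> spans = new ArrayList<>();
        for (String[] field : FIELDS) {
            // evaluate() returns "" when the expression matches nothing.
            String value = xpath.evaluate(field[1], doc).trim();
            if (value.isEmpty()) {
                continue; // skip fields missing from this record
            }
            if (text.length() > 0) {
                text.append("\n\n"); // blank line between fields
            }
            int begin = text.length();
            text.append(value);
            spans.add(new Span(field[0], begin, text.length()));
        }
        return new Result(text.toString(), spans);
    }

    public static void main(String[] args) throws Exception {
        String sample = "<clinical_study>"
                + "<brief_title>Example Trial</brief_title>"
                + "<eligibility><criteria><textblock>"
                + "  Inclusion Criteria: age 18 or older  "
                + "</textblock></criteria></eligibility>"
                + "</clinical_study>";
        Result r = extract(new ByteArrayInputStream(
                sample.getBytes(StandardCharsets.UTF_8)));
        for (Span s : r.spans) {
            System.out.println(s.name + " [" + s.begin + "," + s.end + "): "
                    + r.text.substring(s.begin, s.end));
        }
    }
}
```

Running main should print:

    title [0,13): Example Trial
    criteria [15,50): Inclusion Criteria: age 18 or older

Recording offsets while building the text is what would let the XML structure survive as annotations, so downstream components such as RUTA or MetaMap could work on the plain text while the field boundaries remain available, in place of a separate preprocessing step.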