uima-user mailing list archives

From Bonnie MacKellar <bkmackel...@gmail.com>
Subject Re: XML files as input to UIMA?
Date Wed, 20 Feb 2019 18:47:55 GMT
Hi,
Yes, we are using that format. I have a parser that I wrote, but it isn't
integrated into UIMA. It runs separately and loads the full clinical trial
data into a triplestore (Stardog). I would be interested in your system,
since I am not really familiar with how to write file readers in the UIMA
framework. Perhaps I can merge my parser into it and end up with just the
right thing. If you can make it available, I would definitely be
interested. I will take a look at the other links as well. Thanks!!
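In UIMA terms, most of the work in a reader for these files is turning each XML record into document text. The following is a minimal sketch of that conversion step using only the JDK's DOM parser, not Erik's reader or any JULIELab code. It assumes ClinicalTrials.gov-style element names (`brief_title`, `brief_summary`, `detailed_description`, `criteria`); the class `ClinicalTrialText` and its "field: value" output layout are hypothetical, loosely mirroring the simple text format the preprocessor produces. In a uimaFIT reader extending `JCasCollectionReader_ImplBase`, a method like this would typically be called from `getNext(JCas)`, with the result passed to `jCas.setDocumentText(...)`.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: flattens one clinical trial XML record into plain text. */
final class ClinicalTrialText {

    // Assumed element names (ClinicalTrials.gov-style); adjust to the real schema.
    private static final String[] FIELDS = {
        "brief_title", "brief_summary", "detailed_description", "criteria"
    };

    static String extractText(String xml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse study XML", e);
        }

        StringBuilder text = new StringBuilder();
        for (String field : FIELDS) {
            // Searches the whole document, so nested elements such as
            // <eligibility><criteria> are found as well.
            NodeList nodes = doc.getElementsByTagName(field);
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() includes the text of nested <textblock> children;
                // runs of whitespace (including line breaks) are collapsed to one space.
                String value = nodes.item(i).getTextContent().trim().replaceAll("\\s+", " ");
                if (!value.isEmpty()) {
                    text.append(field).append(": ").append(value).append('\n');
                }
            }
        }
        return text.toString();
    }
}
```

Keeping the field labels might let existing RUTA rules for the simple format be reused with little change; an alternative is to set only the plain text and record each field's span as an annotation of a custom type in the type system.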

Bonnie MacKellar

On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler <erik.faessler@uni-jena.de> wrote:

> Dear Bonnie,
>
> are you talking about the clinical trial XML format used by
> ClinicalTrials.gov, by any chance?
> If so, I did create a UIMA reader for these data. It's not perfect, but
> perhaps it is enough for your purposes, and you might also want to enhance it.
> Please let me know if you would be interested in that. I have not gotten
> around to making it publicly available yet, but could do so quickly.
>
> To answer the general question to the best of my knowledge:
> There is no such thing as a general XML reader built into the UIMA
> framework. For all non-trivial formats, a specific reader is necessary.
> This also holds true with regard to the employed type system.
> That being said, there are UIMA readers that try to serve as a general XML
> reading facility, e.g. the “XML Reader” from our lab (JULIELab,
> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
> However, in my experience XML inputs come in many different forms,
> which are often not suitable for a generic approach. That is why I
> have written quite a few UIMA readers for specific XML formats in the past.
>
> Hope that helps,
>
> Erik
>
> > On 20. Feb 2019, at 01:13, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
> >
> > This is probably a very naive question, but I can't seem to find anything
> > about this. I currently have a lot of XML files (clinical trial
> > descriptions). My current workflow is to run a preprocessor that parses the
> > XML and generates text files in a simple format. I then run these files in
> > a UIMA pipeline, using FileCollectionReader to load the text files, RUTA to
> > parse the simple format, the MetaMap annotator to do some UMLS annotations,
> > and finally a writer of my own that generates RDF triples from the UIMA
> > annotations and loads the triples into a database. This has worked, but it is
> > clunky, especially the preprocessing. I feel like there has to be a better
> > way. Is there any support for reading XML files, or do I need to write my
> > own CollectionReader? Are there any other tools within UIMA for handling
> > XML text?
> >
> > thanks,
> > Bonnie MacKellar
>
>
