uima-user mailing list archives

From Benedict Holland <benedict.m.holl...@gmail.com>
Subject Re: XML files as input to UIMA?
Date Fri, 22 Feb 2019 15:57:50 GMT
Is it worthwhile to write a conversion script from XML to something easier
to work with, like a database or a JSON object?
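
A minimal sketch of what such a conversion script could look like, assuming
Python and only its standard library. The element names in SAMPLE
(clinical_study, brief_title, condition) are illustrative assumptions, not a
statement about the actual clinical trial schema, and the "@" and "#text" key
conventions are one common choice among several:

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an Element into plain strings and dicts."""
    children = list(elem)
    if not children and not elem.attrib:
        # Leaf element without attributes: keep only its text.
        return (elem.text or "").strip()

    node = {}
    for name, value in elem.attrib.items():
        # Prefix attributes so they cannot collide with child tags.
        node["@" + name] = value

    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tag: collect the values in a list.
            existing = node[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                node[child.tag] = [existing, value]
        else:
            node[child.tag] = value

    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_string):
    """Parse an XML string and return it as a JSON string."""
    root = ET.fromstring(xml_string)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


# Hypothetical input, made up for illustration only.
SAMPLE = """<clinical_study>
  <brief_title>Example trial</brief_title>
  <condition>Asthma</condition>
  <condition>Allergy</condition>
</clinical_study>"""

if __name__ == "__main__":
    print(xml_to_json(SAMPLE))
```

A generic converter like this keeps the structure but not the meaning of the
fields, so a pipeline would still need format-specific code to pick out the
text it wants to annotate.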

Thanks,
~Ben

On Fri, Feb 22, 2019, 7:17 AM Bonnie MacKellar <bkmackellar@gmail.com>
wrote:

> Thanks so much!
>
> Bonnie MacKellar
>
> On Fri, Feb 22, 2019 at 7:03 AM Erik Fäßler <erik.faessler@uni-jena.de>
> wrote:
>
> > Hey,
> >
> > just wanted to say that I didn’t get around to making the component
> > available yet; I will do so first thing next week!
> >
> > Best,
> >
> > Erik
> >
> > > On 20. Feb 2019, at 19:47, Bonnie MacKellar <bkmackellar@gmail.com>
> > wrote:
> > >
> > > Hi,
> > > Yes, we are using that format. I have a parser that I wrote, but it
> isn't
> > > integrated into UIMA. It runs separately and loads the full clinical
> > trial
> > > data into a triplestore (Stardog). I would be interested in your system
> > > since I am not really familiar with how to write file readers in the
> UIMA
> > > framework. Perhaps I can merge my parser into it and end up with just
> the
> > > right thing. If you can make it available, I would definitely be
> > > interested.  I will take a look at the other links as well.  Thanks!!
> > >
> > > Bonnie MacKellar
> > >
> > > On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler <erik.faessler@uni-jena.de>
> > > wrote:
> > >
> > >> Dear Bonnie,
> > >>
> > >> are you talking about the clinical trial XML format used by
> > > >> ClinicalTrials.gov by any chance?
> > > >> If so, I did create a UIMA reader for these data. It's not perfect but
> > >> perhaps enough for your purposes and also you might want to enhance
> it.
> > >> Please let me know if you would be interested in that, I did not get
> > > >> around to making it publicly available yet but could do so quickly.
> > >>
> > >> To answer the general question to the best of my knowledge:
> > > >> There is no such thing as a general XML reader built into the UIMA
> > >> framework. For all non-trivial formats, a specific reader is
> necessary.
> > >> This also holds true with regard to the employed type system.
> > >> That being said, there are UIMA readers that try to serve as a general
> > XML
> > >> reading facility, e.g. the “XML Reader” from our lab (JULIELab,
> > > >> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
> > > >> However, in my experience XML inputs come in a lot of different forms
> > > >> that are often not suited to a generic approach, which is why I
> > > >> wrote quite a few UIMA readers for specific XML formats in the past.
> > >>
> > >> Hope that helps,
> > >>
> > >> Erik
> > >>
> > >>> On 20. Feb 2019, at 01:13, Bonnie MacKellar <bkmackellar@gmail.com>
> > >> wrote:
> > >>>
> > >>> This is probably a very naive question, but I can't seem to find
> > anything
> > >>> about this. I currently have a lot of XML files (clinical trial
> > >>> descriptions). My current workflow is to run a preprocessor that
> parses
> > >> the
> > >>> XML and generates text files in a simple format. I then run these
> files
> > >> in
> > >>> a UIMA pipeline, using FileCollectionReader to load the text files,
> > RUTA
> > >> to
> > >>> parse the simple format, the Metamap annotator to do some UMLS
> > >> annotations,
> > > >>> and finally I have a writer that generates RDF triples from the UIMA
> > >>> annotations and loads the triples into a database. This has worked
> but
> > is
> > >>> clunky, especially the preprocessing. I feel like there has to be a
> > >> better
> > > >>> way. Is there any support for reading XML files, or do I need to
> write
> > my
> > >>> own CollectionReader? Are there any other tools within UIMA for
> > handling
> > >>> XML text?
> > >>>
> > >>> thanks,
> > >>> Bonnie MacKellar
> > >>
> > >>
> >
> >
>
