uima-user mailing list archives

From Bonnie MacKellar <bkmackel...@gmail.com>
Subject Re: XML files as input to UIMA?
Date Tue, 26 Feb 2019 19:49:13 GMT
Hi,

Thanks so much. I forked it and loaded it into Eclipse. Unfortunately, I can't
get jcore-ct-reader to build or generate types, although many of the other
components do build. I am running an old version of UIMA - 2.81.1. Does
this require a later version?

Thanks,
Bonnie MacKellar

On Mon, Feb 25, 2019 at 8:37 AM Erik Fäßler <erik.faessler@uni-jena.de>
wrote:

> Dear Bonnie,
>
> please check out
> https://github.com/JULIELab/jcore-base/tree/v2.4/jcore-ct-reader.
>
> Please let me know if you have any questions or if you have already decided to
> go with one of the other approaches that have been proposed in the meantime
> or something entirely different.
>
> Best,
>
> Erik
>
> > On 22. Feb 2019, at 13:17, Bonnie MacKellar <bkmackellar@gmail.com>
> > wrote:
> >
> > Thanks so much!
> >
> > Bonnie MacKellar
> >
> > On Fri, Feb 22, 2019 at 7:03 AM Erik Fäßler <erik.faessler@uni-jena.de>
> > wrote:
> >
> >> Hey,
> >>
> >> just wanted to say that I haven't gotten around to making the component
> >> available yet. I will do so first thing next week!
> >>
> >> Best,
> >>
> >> Erik
> >>
> >>> On 20. Feb 2019, at 19:47, Bonnie MacKellar <bkmackellar@gmail.com>
> >>> wrote:
> >>>
> >>> Hi,
> >>> Yes, we are using that format. I have a parser that I wrote, but it isn't
> >>> integrated into UIMA. It runs separately and loads the full clinical trial
> >>> data into a triplestore (Stardog). I would be interested in your system
> >>> since I am not really familiar with how to write file readers in the UIMA
> >>> framework. Perhaps I can merge my parser into it and end up with just the
> >>> right thing. If you can make it available, I would definitely be
> >>> interested. I will take a look at the other links as well. Thanks!!
> >>>
> >>> Bonnie MacKellar
> >>>
> >>> On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler <erik.faessler@uni-jena.de>
> >>> wrote:
> >>>
> >>>> Dear Bonnie,
> >>>>
> >>>> are you talking about the clinical trial XML format used by
> >>>> ClinicalTrials.gov, by any chance?
> >>>> If so, I did create a UIMA reader for these data. It's not perfect, but it is
> >>>> perhaps enough for your purposes, and you might want to enhance it.
> >>>> Please let me know if you would be interested in that. I have not gotten
> >>>> around to making it publicly available yet but could do so quickly.
> >>>>
> >>>> To answer the general question to the best of my knowledge:
> >>>> There is no such thing as a general XML reader built into the UIMA
> >>>> framework. For all non-trivial formats, a specific reader is necessary.
> >>>> This also holds true with regard to the employed type system.
> >>>> That being said, there are UIMA readers that try to serve as a general XML
> >>>> reading facility, e.g. the “XML Reader” from our lab (JULIELab,
> >>>> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
> >>>> However, in my experience XML inputs come in a lot of different forms
> >>>> that are often not suitable for a generic approach, which is why I
> >>>> wrote quite a few UIMA readers for specific XML formats in the past.
> >>>>
> >>>> Hope that helps,
> >>>>
> >>>> Erik
> >>>>
> >>>>> On 20. Feb 2019, at 01:13, Bonnie MacKellar <bkmackellar@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>> This is probably a very naive question, but I can't seem to find anything
> >>>>> about this. I currently have a lot of XML files (clinical trial
> >>>>> descriptions). My current workflow is to run a preprocessor that parses the
> >>>>> XML and generates text files in a simple format. I then run these files
> >>>>> through a UIMA pipeline, using FileCollectionReader to load the text files,
> >>>>> RUTA to parse the simple format, the MetaMap annotator to do some UMLS
> >>>>> annotations, and finally a writer that generates RDF triples from the UIMA
> >>>>> annotations and loads the triples into a database. This has worked but is
> >>>>> clunky, especially the preprocessing. I feel like there has to be a better
> >>>>> way. Is there any support for reading XML files, or do I need to write my
> >>>>> own CollectionReader? Are there any other tools within UIMA for handling
> >>>>> XML text?
> >>>>>
> >>>>> thanks,
> >>>>> Bonnie MacKellar
> >>>>
> >>>>
> >>
> >>
>
>
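
For readers following the thread: the XML-handling part of a format-specific reader, as discussed above, can be sketched with the JDK's own DOM parser. This is only a minimal illustration, not the code of jcore-ct-reader or jcore-xml-reader. The class name TrialXmlText, the method extract, and the element names in the sample document (clinical_study, brief_title, criteria, textblock) are assumptions made for the example. In an actual UIMA CollectionReader, the extracted strings would typically be handed to the CAS inside getNext (for example via setDocumentText) instead of being printed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: pulls the text of selected elements out of an XML string.
public class TrialXmlText {

    // Returns the trimmed text content of every element named tagName,
    // in document order. Returns an empty list if no such element exists.
    static List<String> extract(String xml, String tagName) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName(tagName);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Sample input; the element names are assumed for illustration only.
        String xml = "<clinical_study>"
                + "<brief_title>Example trial</brief_title>"
                + "<criteria><textblock> Adults aged 18 or older </textblock></criteria>"
                + "</clinical_study>";
        System.out.println(extract(xml, "brief_title"));
        System.out.println(extract(xml, "textblock"));
    }
}
```

Doing the extraction inside the reader would remove the separate preprocessing step that writes intermediate text files, at the cost of tying the reader to one XML layout.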
