uima-user mailing list archives

From Bonnie MacKellar <bkmackel...@gmail.com>
Subject Re: XML files as input to UIMA?
Date Wed, 27 Feb 2019 13:11:00 GMT
Hi,

Thanks. I had actually figured that out yesterday after visiting the
project website and reading carefully. The problem, though, is that
jcore-ct-reader still does not build. It can't find the clinical trial
types even after they have been generated. The mvn package step correctly
generates the jar file, but places it in a target folder (don't have the
exact path because I am on another computer) which isn't seen by
jcore-ct-reader.  There is another jar file generated by jcore-types, which
ends up in my maven repository, and I *think* that is the one
jcore-ct-reader picks up. But it doesn't contain the clinical trial types
for some reason. So I just get a lot of error messages when I try to build
jcore-ct-reader saying it can't find those classes.

The project looks interesting. My parser gets more fields from the trials,
and we handle inclusion/exclusion text differently because part of our
pipeline is to parse those sentences and annotate them in various ways. What
we had been doing was to use the clinical trial XML parser that I had built
to 1) insert RDF triples into a triplestore, and 2) generate a text
representation of the inclusion/exclusion constraints that is fed through
the UIMA pipeline. The output of that process is also placed in the
triplestore. One of the things I am looking at doing is unifying this
better so that it is all one process. Looking at your reader, I can see how
it all could work. Nice!
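
For illustration, the XML-to-text step described above might look like the following minimal sketch. It uses only the JDK's DOM parser and is not taken from jcore-ct-reader or from my own parser. The class name CriteriaTextExtractor and the element names criteria and textblock are assumptions based on how the eligibility section usually appears in ClinicalTrials.gov XML; adjust them to the files at hand.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of UIMA or JCoRe.
final class CriteriaTextExtractor {

    /**
     * Returns the trimmed text of the first <textblock> inside the first
     * <criteria> element, or an empty string if either element is missing.
     * The element names are assumed, not confirmed by the thread.
     */
    static String extractCriteria(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            NodeList criteria = doc.getElementsByTagName("criteria");
            if (criteria.getLength() == 0) {
                return "";
            }
            NodeList blocks =
                    ((Element) criteria.item(0)).getElementsByTagName("textblock");
            if (blocks.getLength() == 0) {
                return "";
            }
            return blocks.item(0).getTextContent().trim();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalArgumentException("Could not parse trial XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<clinical_study><eligibility><criteria><textblock>"
                + "Inclusion Criteria: age 18 or older"
                + "</textblock></criteria></eligibility></clinical_study>";
        System.out.println(extractCriteria(xml));
    }
}
```

In a unified pipeline, a reader could call something like this for each file and set the returned string as the document text of the CAS, so the separate preprocessing pass would no longer be needed.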

Bonnie MacKellar

On Wed, Feb 27, 2019 at 3:29 AM Erik Fäßler <erik.faessler@uni-jena.de>
wrote:

> Dear Bonnie,
>
> oh, sorry about that. I tend to forget this: We store all our types in the
> special jcore-types project. You need to build this project once. You can
> just use a maven package command (mvn package) because we build the types
> automatically through a Maven plugin.
> After that, the types should be available to all projects.
>
> Note that this is not necessary to use the Maven artifacts that we have
> already uploaded to Maven central. Those refer to the jcore-types maven
> artifact which includes all built types.
>
> You should only have this issue due to Maven workspace resolution.
>
> Hope this helps,
>
> Erik
>
> > On 26. Feb 2019, at 20:49, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
> >
> > Hi,
> >
> > Thanks so much. I forked it and loaded it into Eclipse. Unfortunately, I
> > can't get jcore-ct-reader to build or generate types, although many of
> > the other components do build. I am running an old version of UIMA -
> > 2.81.1. Does this require a later version?
> >
> > thanks
> > Bonnie MacKellar
> >
> > On Mon, Feb 25, 2019 at 8:37 AM Erik Fäßler <erik.faessler@uni-jena.de>
> > wrote:
> >
> >> Dear Bonnie,
> >>
> >> please check out
> >> https://github.com/JULIELab/jcore-base/tree/v2.4/jcore-ct-reader
> >>
> >> Please let me know if you have any questions or if you have already
> >> decided to go with one of the other approaches that have been proposed
> >> in the meantime, or with something entirely different.
> >>
> >> Best,
> >>
> >> Erik
> >>
> >>> On 22. Feb 2019, at 13:17, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
> >>>
> >>> Thanks so much!
> >>>
> >>> Bonnie MacKellar
> >>>
> >>> On Fri, Feb 22, 2019 at 7:03 AM Erik Fäßler <erik.faessler@uni-jena.de> wrote:
> >>>
> >>>> Hey,
> >>>>
> >>>> just wanted to say that I haven't gotten around to making the component
> >>>> available yet; I will do so first thing next week!
> >>>>
> >>>> Best,
> >>>>
> >>>> Erik
> >>>>
> >>>>> On 20. Feb 2019, at 19:47, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>> Yes, we are using that format. I have a parser that I wrote, but it
> >>>>> isn't integrated into UIMA. It runs separately and loads the full
> >>>>> clinical trial data into a triplestore (Stardog). I would be
> >>>>> interested in your system since I am not really familiar with how to
> >>>>> write file readers in the UIMA framework. Perhaps I can merge my
> >>>>> parser into it and end up with just the right thing. If you can make
> >>>>> it available, I would definitely be interested. I will take a look at
> >>>>> the other links as well. Thanks!!
> >>>>>
> >>>>> Bonnie MacKellar
> >>>>>
> >>>>> On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler <erik.faessler@uni-jena.de> wrote:
> >>>>>
> >>>>>> Dear Bonnie,
> >>>>>>
> >>>>>> are you talking about the clinical trial XML format used by
> >>>>>> ClinicalTrials.gov by any chance?
> >>>>>> If so, I did create a UIMA reader for these data. It's not perfect,
> >>>>>> but it is perhaps enough for your purposes, and you might want to
> >>>>>> enhance it.
> >>>>>> Please let me know if you would be interested in that. I have not
> >>>>>> gotten around to making it publicly available yet but could do so
> >>>>>> quickly.
> >>>>>>
> >>>>>> To answer the general question to the best of my knowledge:
> >>>>>> There is no such thing as a general XML reader built into the UIMA
> >>>>>> framework. For all non-trivial formats, a specific reader is
> >>>>>> necessary. This also holds true with regard to the employed type
> >>>>>> system.
> >>>>>> That being said, there are UIMA readers that try to serve as a
> >>>>>> general XML reading facility, e.g. the "XML Reader" from our lab
> >>>>>> (JULIELab,
> >>>>>> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader ).
> >>>>>> However, in my experience XML inputs come in many different forms
> >>>>>> that are often not suited to a generic approach, which is why I
> >>>>>> wrote quite a few UIMA readers for specific XML formats in the past.
> >>>>>>
> >>>>>> Hope that helps,
> >>>>>>
> >>>>>> Erik
> >>>>>>
> >>>>>>> On 20. Feb 2019, at 01:13, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
> >>>>>>>
> >>>>>>> This is probably a very naive question, but I can't seem to find
> >>>>>>> anything about this. I currently have a lot of XML files (clinical
> >>>>>>> trial descriptions). My current workflow is to run a preprocessor
> >>>>>>> that parses the XML and generates text files in a simple format. I
> >>>>>>> then run these files in a UIMA pipeline, using FileCollectionReader
> >>>>>>> to load the text files, RUTA to parse the simple format, the MetaMap
> >>>>>>> annotator to do some UMLS annotations, and finally a writer that
> >>>>>>> generates RDF triples from the UIMA annotations and loads the
> >>>>>>> triples into a database. This has worked but is clunky, especially
> >>>>>>> the preprocessing. I feel like there has to be a better way. Is
> >>>>>>> there any support for reading XML files, or do I need to write my
> >>>>>>> own CollectionReader? Are there any other tools within UIMA for
> >>>>>>> handling XML text?
> >>>>>>>
> >>>>>>> thanks,
> >>>>>>> Bonnie MacKellar
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>
