uima-user mailing list archives

From Erik Fäßler <erik.faess...@uni-jena.de>
Subject Re: XML files as input to UIMA?
Date Wed, 20 Feb 2019 08:54:14 GMT
Dear Bonnie,

are you talking about the clinical trial XML format used by ClinicalTrials.gov
by any chance?
If so, I did create a UIMA reader for these data. It's not perfect, but it may be enough for your
purposes, and you might want to enhance it.
Please let me know if you would be interested in that. I have not gotten around to making it publicly
available yet but could do so quickly.

To answer the general question to the best of my knowledge:
There is no general XML reader built into the UIMA framework. For all non-trivial
formats, a specific reader is necessary. The same holds for the type system
employed.
That being said, there are UIMA readers that try to serve as a general XML reading facility,
e.g. the “XML Reader” from our lab (JULIELab, https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
However, in my experience XML inputs come in many different forms that are often not
suited to a generic approach, which is why I have written quite a few UIMA readers for specific
XML formats in the past.
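
To illustrate the kind of work a format-specific reader has to do, here is a minimal sketch using only the JDK's XML parser. It is not the code of any existing reader: the class name TrialXmlText and the element names brief_title and brief_summary are assumptions chosen for illustration, and a real format will need its own list of elements and probably a type system mapping as well.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class TrialXmlText {

    // Hypothetical element names; replace with those of the actual format.
    static final String[] TEXT_ELEMENTS = {"brief_title", "brief_summary"};

    // Returns the trimmed text of the selected elements, separated by blank lines.
    // Output is grouped in the order of TEXT_ELEMENTS, not in document order.
    static String extractText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations so no external entities are resolved.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        StringBuilder text = new StringBuilder();
        for (String name : TEXT_ELEMENTS) {
            NodeList nodes = doc.getElementsByTagName(name);
            for (int i = 0; i < nodes.getLength(); i++) {
                String content = nodes.item(i).getTextContent().trim();
                if (content.isEmpty()) {
                    continue;
                }
                if (text.length() > 0) {
                    text.append("\n\n");
                }
                text.append(content);
            }
        }
        // In a UIMA CollectionReader, this string would typically be handed
        // to setDocumentText on the CAS inside getNext.
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<clinical_study>"
                + "<id_info>NCT-EXAMPLE</id_info>"
                + "<brief_title>Example trial</brief_title>"
                + "<brief_summary> A short summary. </brief_summary>"
                + "</clinical_study>";
        System.out.println(extractText(xml));
    }
}
```

With something like this inside the reader, the separate preprocessing step that converts XML to text files would no longer be needed. Which elements to keep is exactly the format-specific part that a generic reader cannot decide.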

Hope that helps,

Erik

> On 20. Feb 2019, at 01:13, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
> 
> This is probably a very naive question, but I can't seem to find anything
> about this. I currently have a lot of XML files (clinical trial
> descriptions). My current workflow is to run a preprocessor that parses the
> XML and generates text files in a simple format. I then run these files in
> a UIMA pipeline, using FileCollectionReader to load the text files, RUTA to
> parse the simple format, the Metamap annotator to do some UMLS annotations,
> and finally I have a writer that generates RDF triples from the UIMA
> annotations and loads the triples into a database. This has worked but is
> clunky, especially the preprocessing. I feel like there has to be a better
> way. Is there any support for reading XML files or do I need to write my
> own CollectionReader? Are there any other tools within UIMA for handling
> XML text?
> 
> thanks,
> Bonnie MacKellar

