uima-user mailing list archives

From Erik Fäßler <erik.faess...@uni-jena.de>
Subject Re: XML files as input to UIMA?
Date Mon, 25 Feb 2019 13:37:24 GMT
Dear Bonnie,

please check out https://github.com/JULIELab/jcore-base/tree/v2.4/jcore-ct-reader.

Please let me know if you have any questions, or if you have already decided to go with one
of the other approaches proposed in the meantime or with something entirely different.

Best,

Erik

> On 22. Feb 2019, at 13:17, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
> 
> Thanks so much!
> 
> Bonnie MacKellar
> 
> On Fri, Feb 22, 2019 at 7:03 AM Erik Fäßler <erik.faessler@uni-jena.de>
> wrote:
> 
>> Hey,
>> 
>> just wanted to say that I didn’t get around to making the component
>> available yet; I will do so first thing next week!
>> 
>> Best,
>> 
>> Erik
>> 
>>> On 20. Feb 2019, at 19:47, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
>>> 
>>> Hi,
>>> Yes, we are using that format. I have a parser that I wrote, but it isn't
>>> integrated into UIMA. It runs separately and loads the full clinical trial
>>> data into a triplestore (Stardog). I would be interested in your system
>>> since I am not really familiar with how to write file readers in the UIMA
>>> framework. Perhaps I can merge my parser into it and end up with just the
>>> right thing. If you can make it available, I would definitely be
>>> interested. I will take a look at the other links as well. Thanks!!
>>> 
>>> Bonnie MacKellar
>>> 
>>> On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler <erik.faessler@uni-jena.de>
>>> wrote:
>>> 
>>>> Dear Bonnie,
>>>> 
>>>> are you talking about the clinical trial XML format used by
>>>> ClinicalTrials.gov by any chance?
>>>> If so, I did create a UIMA reader for these data. It's not perfect but
>>>> perhaps enough for your purposes, and you might want to enhance it.
>>>> Please let me know if you would be interested in that. I did not get
>>>> around to making it publicly available yet but could do so quickly.
>>>> 
>>>> To answer the general question to the best of my knowledge:
>>>> There is no such thing as a general XML reader built into the UIMA
>>>> framework. For all non-trivial formats, a specific reader is necessary.
>>>> This also holds true with regard to the employed type system.
>>>> That being said, there are UIMA readers that try to serve as a general XML
>>>> reading facility, e.g. the “XML Reader” from our lab (JULIELab,
>>>> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
>>>> However, in my experience XML inputs come in many different forms that
>>>> are often not suited to a generic approach, which is why I
>>>> wrote quite a few UIMA readers for specific XML formats in the past.
>>>> 
>>>> Hope that helps,
>>>> 
>>>> Erik
>>>> 
>>>>> On 20. Feb 2019, at 01:13, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
>>>>> 
>>>>> This is probably a very naive question, but I can't seem to find anything
>>>>> about this. I currently have a lot of XML files (clinical trial
>>>>> descriptions). My current workflow is to run a preprocessor that parses the
>>>>> XML and generates text files in a simple format. I then run these files in
>>>>> a UIMA pipeline, using FileCollectionReader to load the text files, RUTA to
>>>>> parse the simple format, the Metamap annotator to do some UMLS annotations,
>>>>> and finally I have a writer that generates RDF triples from the UIMA
>>>>> annotations and loads the triples into a database. This has worked but is
>>>>> clunky, especially the preprocessing. I feel like there has to be a better
>>>>> way. Is there any support for reading XML files, or do I need to write my
>>>>> own CollectionReader? Are there any other tools within UIMA for handling
>>>>> XML text?
>>>>> 
>>>>> thanks,
>>>>> Bonnie MacKellar
>>>> 
>>>> 
>> 
>> 
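
As an illustration of the format-specific reading step discussed in this thread, here is a minimal sketch of the part of a reader that turns one trial XML document into plain text. It uses only the JDK's DOM parser and has no UIMA dependency. The class name TrialXmlText and the element names brief_title, brief_summary and criteria are assumptions made for this example; they are not taken from the jcore-ct-reader component, and the real schema should be checked before relying on them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class TrialXmlText {

    // Hypothetical element names (an assumption); adjust to the actual schema.
    static final String[] FIELDS = {"brief_title", "brief_summary", "criteria"};

    /** Returns the text of the selected elements, separated by blank lines. */
    public static String extractText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations so no external entities are resolved.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        StringBuilder text = new StringBuilder();
        for (String field : FIELDS) {
            NodeList nodes = doc.getElementsByTagName(field);
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() concatenates the text of all nested nodes;
                // runs of whitespace are collapsed to single spaces.
                String content = nodes.item(i).getTextContent()
                        .trim().replaceAll("\\s+", " ");
                if (!content.isEmpty()) {
                    text.append(content).append("\n\n");
                }
            }
        }
        return text.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input, for demonstration only.
        String xml = "<clinical_study>"
                + "<brief_title>Example Trial</brief_title>"
                + "<eligibility><criteria><textblock>Adults only</textblock>"
                + "</criteria></eligibility>"
                + "</clinical_study>";
        System.out.println(extractText(xml));
    }
}
```

In a UIMA collection reader, the string returned by extractText would typically be passed to CAS.setDocumentText inside getNext, so that the separate preprocessing step and the intermediate text files are no longer needed. Mapping individual XML fields onto annotations of a type system is left out of this sketch.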

