uima-user mailing list archives

From Erik Fäßler <erik.faess...@uni-jena.de>
Subject Re: XML files as input to UIMA?
Date Wed, 27 Feb 2019 08:29:27 GMT
Dear Bonnie,

oh, sorry about that. I tend to forget this: We store all our types in the special jcore-types
project. You need to build this project once. You can just use the Maven package command (mvn
package) because we build the types automatically through a Maven plugin.
After that, the types should be available to all projects.

Note that this is not necessary if you use the Maven artifacts that we have already uploaded to
Maven Central. Those refer to the jcore-types Maven artifact, which includes all built types.

You should only have this issue because of Maven workspace resolution.

Hope this helps,


> On 26. Feb 2019, at 20:49, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
> Hi,
> Thanks so much. I forked it and loaded into Eclipse. Unfortunately, I can't
> get jcore-ct-reader to build or generate types, although many of the other
> components do build. I am running an old version of UIMA - 2.81.1. Does
> this require a later version?
> thanks
> Bonnie MacKellar
> On Mon, Feb 25, 2019 at 8:37 AM Erik Fäßler <erik.faessler@uni-jena.de>
> wrote:
>> Dear Bonnie,
>> please check out
>> https://github.com/JULIELab/jcore-base/tree/v2.4/jcore-ct-reader.
>> Please let me know if you have any questions or if you already decided to
>> go with one of the other approaches that have been proposed in the meantime
>> or something entirely different.
>> Best,
>> Erik
>>> On 22. Feb 2019, at 13:17, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
>>> Thanks so much!
>>> Bonnie MacKellar
>>> On Fri, Feb 22, 2019 at 7:03 AM Erik Fäßler <erik.faessler@uni-jena.de>
>>> wrote:
>>>> Hey,
>>>> just wanted to say that I didn’t get around to making the component
>>>> available yet; I will do so first thing next week!
>>>> Best,
>>>> Erik
>>>>> On 20. Feb 2019, at 19:47, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
>>>>> Hi,
>>>>> Yes, we are using that format. I have a parser that I wrote, but it isn't
>>>>> integrated into UIMA. It runs separately and loads the full clinical trial
>>>>> data into a triplestore (Stardog). I would be interested in your system
>>>>> since I am not really familiar with how to write file readers in the
>>>>> framework. Perhaps I can merge my parser into it and end up with just the
>>>>> right thing. If you can make it available, I would definitely be
>>>>> interested. I will take a look at the other links as well. Thanks!!
>>>>> Bonnie MacKellar
>>>>> On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler <erik.faessler@uni-jena.de> wrote:
>>>>>> Dear Bonnie,
>>>>>> are you talking about the clinical trial XML format used by
>>>>>> ClinicalTrials.gov <http://clinicaltrials.org/> by any chance?
>>>>>> If so, I did create a UIMA reader for these data. It's not perfect but
>>>>>> perhaps enough for your purposes, and you might also want to enhance it.
>>>>>> Please let me know if you would be interested in that. I did not get
>>>>>> around to making it publicly available yet but could do so quickly.
>>>>>> To answer the general question to the best of my knowledge:
>>>>>> There is no such thing as a general XML reader built into the
>>>>>> framework. For all non-trivial formats, a specific reader is necessary.
>>>>>> This also holds true with regard to the employed type system.
>>>>>> That being said, there are UIMA readers that try to serve as a general XML
>>>>>> reading facility, e.g. the “XML Reader” from our lab (JULIELab,
>>>>>> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
>>>>>> However, in my experience XML inputs come in a lot of different forms
>>>>>> which are often not suited to a generic approach, which is why I
>>>>>> wrote quite a few UIMA readers for specific XML formats in the past.
>>>>>> Hope that helps,
>>>>>> Erik
>>>>>>> On 20. Feb 2019, at 01:13, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
>>>>>>> This is probably a very naive question, but I can't seem to find anything
>>>>>>> about this. I currently have a lot of XML files (clinical trial
>>>>>>> descriptions). My current workflow is to run a preprocessor that parses the
>>>>>>> XML and generates text files in a simple format. I then run these files in
>>>>>>> a UIMA pipeline, using FileCollectionReader to load the text, RUTA to
>>>>>>> parse the simple format, the Metamap annotator to do some UMLS annotations,
>>>>>>> and finally I have a writer that generates RDF triples from the
>>>>>>> annotations and loads the triples into a database. This has worked but is
>>>>>>> clunky, especially the preprocessing. I feel like there has to be a better
>>>>>>> way. Is there any support for reading XML files, or do I need to write my
>>>>>>> own CollectionReader? Are there any other tools within UIMA for handling
>>>>>>> XML text?
>>>>>>> thanks,
>>>>>>> Bonnie MacKellar
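
For readers following the thread: the preprocessing step described in the original question (parse the XML, emit text in a simple format) can be sketched roughly as below. This is only an illustration, not the parser or the reader discussed above. The element names (clinical_study, id_info/nct_id, brief_title, eligibility/criteria/textblock), the sample record, the "label: value" output format, and the function name trial_to_text are all assumptions made for the example. A format-specific UIMA reader would do similar extraction but set the result as the CAS document text instead of writing a file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record. The element names are assumptions for this
# sketch and may not match the real files used in the thread.
SAMPLE = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <brief_title>Example Study</brief_title>
  <eligibility>
    <criteria>
      <textblock>
        Inclusion Criteria: adults aged 18 or older.
      </textblock>
    </criteria>
  </eligibility>
</clinical_study>"""

# (output label, element path relative to the root) pairs to extract.
FIELDS = [
    ("id", "id_info/nct_id"),
    ("title", "brief_title"),
    ("criteria", "eligibility/criteria/textblock"),
]


def trial_to_text(xml_string):
    """Turn one trial record into simple 'label: value' lines."""
    root = ET.fromstring(xml_string)
    lines = []
    for label, path in FIELDS:
        value = root.findtext(path)
        if value is None:
            # Skip fields that are absent from this record.
            continue
        # Collapse runs of whitespace and newlines into single spaces.
        lines.append(f"{label}: {' '.join(value.split())}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(trial_to_text(SAMPLE))
```

Keeping the field list as data makes it easy to adjust the extraction when a different XML layout turns out to be in use, which is the main reason a generic reader tends to fall short.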
