uima-user mailing list archives

From Erik Fäßler <erik.faess...@uni-jena.de>
Subject Re: XML files as input to UIMA?
Date Wed, 27 Feb 2019 13:23:19 GMT
Hi,

I am very sorry that this isn’t just working right away and costing you time.
I had expected that running mvn package would be enough, since Eclipse's workspace resolution
would then take over. (Just to be sure: did you update the Maven dependencies of your projects
after running mvn package on jcore-types?)
You could also try mvn install instead, so that the types get installed into your local
Maven repository. You mentioned that something already got installed, so perhaps you have
already done this.

Please check that the jcore-document-structure-clinicaltrial-types.xml file is present in
your jcore-types project, that the de.julielab.jcore.types package under src/main/java
contains the ct package, and that the generated type classes are actually in it.
If so, the remaining problem should be a Maven dependency resolution issue.
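A quick way to confirm whether a given jcore-types jar really contains the clinical trial types is to list its entries under de/julielab/jcore/types/ct/. The following is a minimal plain-Java sketch (standard library only, no UIMA or Maven APIs). The class name JarTypeCheck is hypothetical, and you pass it the path of whichever jar you want to inspect, e.g. the one in the target folder or the copy in your local ~/.m2 repository; the exact repository path depends on the artifact coordinates and version, which are not given here.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Lists the compiled classes in a jar that live under a given package folder. */
public class JarTypeCheck {

    // Folder of the clinical trial JCas classes, derived from the package named in this thread.
    static final String CT_PREFIX = "de/julielab/jcore/types/ct/";

    /** Returns the sorted names of all .class entries whose path starts with the prefix. */
    static List<String> classesUnder(String jarPath, String prefix) throws IOException {
        List<String> found = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith(prefix) && name.endsWith(".class")) {
                    found.add(name);
                }
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarTypeCheck <path-to-jar>
        List<String> ct = classesUnder(args[0], CT_PREFIX);
        if (ct.isEmpty()) {
            System.out.println("No classes under " + CT_PREFIX + " in " + args[0]);
        } else {
            ct.forEach(System.out::println);
        }
    }
}
```

If the jar in target lists the ct classes but the one in the local repository does not, that would fit the stale-artifact situation Bonnie describes, and running mvn install on jcore-types would be the next thing to try.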

Thanks for letting me know what your goals are. I used this reader for indexing ClinicalTrials.gov
records into a search index, which is why the actual text representation was not that
important to me.
What I did instead, as you can see, was to annotate all parts with specific annotation types
such as inclusion/exclusion criteria. This way, I can always extract exactly the information I need.
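To illustrate the idea of annotating every part with its own type and then pulling out exactly what is needed: UIMA annotations are standoff annotations, i.e. each one records begin and end character offsets into the shared document text. The toy model below is plain Java, not the UIMA API, and the type names "Inclusion" and "Exclusion" are hypothetical placeholders rather than the actual jcore ct types. With uimaFIT, the real counterpart would be something like JCasUtil.select(jcas, SomeType.class) followed by getCoveredText() on each annotation.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Toy model of standoff annotations: each annotation stores a type name plus begin/end
 * character offsets into one shared document text. Plain Java, not the UIMA API.
 */
public class StandoffDemo {

    static final class Annotation {
        final String type;
        final int begin;
        final int end;

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Returns the text covered by every annotation of the given type, in document order. */
    static List<String> coveredTexts(String text, List<Annotation> annotations, String type) {
        return annotations.stream()
                .filter(a -> a.type.equals(type))
                .sorted(Comparator.comparingInt(a -> a.begin))
                .map(a -> text.substring(a.begin, a.end))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Age 18 or older. No prior chemotherapy.";
        List<Annotation> annotations = Arrays.asList(
                new Annotation("Inclusion", 0, 16),   // "Age 18 or older."
                new Annotation("Exclusion", 17, 39)); // "No prior chemotherapy."
        System.out.println(coveredTexts(text, annotations, "Exclusion")); // prints [No prior chemotherapy.]
    }
}
```

Because the text is stored once and the annotations only point into it, any number of overlapping views (sentences, criteria, UMLS concepts) can coexist on the same document.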

Best,

Erik

> On 27. Feb 2019, at 14:11, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
> 
> Hi,
> 
> Thanks. I had actually figured that out yesterday after visiting the
> project website and reading carefully. The problem, though, is that
> jcore-ct-reader still does not build. It can't find the clinical trial
> types even after they have been generated. The mvn package step correctly
> generates the jar file, but places it in a target folder (don't have the
> exact path because I am on another computer) which isn't seen by
> jcore-ct-reader. There is another jar file generated by jcore-types, which
> ends up in my maven repository, and I *think* that is the one
> jcore-ct-reader picks up. But it doesn't contain the clinical trial types
> for some reason. So I just get a lot of error messages when I try to build
> jcore-ct-reader saying it can't find those classes.
> 
> The project looks interesting. My parser gets more fields from the trials,
> and we handle inclusion/exclusion text differently because part of our
> pipeline is to parse those sentences and annotate them in various ways. What
> we had been doing was to use the clinical trial XML parser that I had built
> to 1) insert RDF triples into a triplestore, and 2) generate a text
> representation of the inclusion/exclusion constraints that is fed through
> the UIMA pipeline. The output of that process is also placed in the
> triplestore. One of the things I am looking at doing is unifying this
> better so that it is all one process. Looking at your reader, I can see how
> it all could work. Nice!
> 
> Bonnie MacKellar
> 
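For the text representation of inclusion/exclusion constraints described in this message, one simple first step is to split the eligibility criteria text block at its headings. This is a minimal sketch that assumes the block uses "Inclusion Criteria:" and "Exclusion Criteria:" headings, which is common in ClinicalTrials.gov records but not guaranteed. The class name CriteriaSplitter is hypothetical, and any text before the first heading is dropped.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Splits an eligibility criteria text block into inclusion and exclusion parts.
 * Assumes "Inclusion Criteria:" / "Exclusion Criteria:" headings (case-insensitive).
 */
public class CriteriaSplitter {

    private static final Pattern HEADING =
            Pattern.compile("(?i)\\b(inclusion|exclusion)\\s+criteria\\s*:");

    /**
     * Maps "inclusion"/"exclusion" to the trimmed text after each heading.
     * Text before the first heading is dropped; if a heading occurs twice, the later section wins.
     */
    static Map<String, String> split(String criteria) {
        Map<String, String> parts = new LinkedHashMap<>();
        Matcher m = HEADING.matcher(criteria);
        String currentKey = null;
        int sectionStart = 0;
        while (m.find()) {
            if (currentKey != null) {
                // Close the previous section at the start of this heading.
                parts.put(currentKey, criteria.substring(sectionStart, m.start()).trim());
            }
            currentKey = m.group(1).toLowerCase(Locale.ROOT);
            sectionStart = m.end();
        }
        if (currentKey != null) {
            // The last section runs to the end of the block.
            parts.put(currentKey, criteria.substring(sectionStart).trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        String block = "Inclusion Criteria:\n - Age 18 or older\n"
                + "Exclusion Criteria:\n - Prior chemotherapy";
        Map<String, String> parts = split(block);
        System.out.println(parts.get("inclusion")); // prints "- Age 18 or older"
        System.out.println(parts.get("exclusion")); // prints "- Prior chemotherapy"
    }
}
```

Inside a single UIMA process, the same offsets (m.start(), m.end()) could instead be used to create inclusion/exclusion annotations directly, which avoids writing an intermediate text file.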
> On Wed, Feb 27, 2019 at 3:29 AM Erik Fäßler <erik.faessler@uni-jena.de>
> wrote:
> 
>> Dear Bonnie,
>> 
>> Oh, sorry about that; I tend to forget this: we store all our types in the
>> dedicated jcore-types project. You need to build this project once. A plain
>> mvn package is enough, because the types are generated automatically by a
>> Maven plugin.
>> After that, the types should be available to all projects.
>> 
>> Note that this is not necessary if you use the Maven artifacts that we have
>> already uploaded to Maven Central. Those depend on the jcore-types Maven
>> artifact, which includes all built types.
>> 
>> You should only run into this issue because of Maven workspace resolution.
>> 
>> Hope this helps,
>> 
>> Erik
>> 
>>> On 26. Feb 2019, at 20:49, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Thanks so much. I forked it and loaded it into Eclipse. Unfortunately, I
>>> can't get jcore-ct-reader to build or generate types, although many of the
>>> other components do build. I am running an old version of UIMA (2.81.1).
>>> Does this require a later version?
>>> 
>>> Thanks
>>> Bonnie MacKellar
>>> 
>>> On Mon, Feb 25, 2019 at 8:37 AM Erik Fäßler <erik.faessler@uni-jena.de>
>>> wrote:
>>> 
>>>> Dear Bonnie,
>>>> 
>>>> Please check out
>>>> https://github.com/JULIELab/jcore-base/tree/v2.4/jcore-ct-reader
>>>> 
>>>> Please let me know if you have any questions, or if you have already decided
>>>> to go with one of the other approaches that have been proposed in the
>>>> meantime or with something entirely different.
>>>> 
>>>> Best,
>>>> 
>>>> Erik
>>>> 
>>>>> On 22. Feb 2019, at 13:17, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
>>>>> 
>>>>> Thanks so much!
>>>>> 
>>>>> Bonnie MacKellar
>>>>> 
>>>>> On Fri, Feb 22, 2019 at 7:03 AM Erik Fäßler <erik.faessler@uni-jena.de> wrote:
>>>>> 
>>>>>> Hey,
>>>>>> 
>>>>>> Just wanted to say that I haven't gotten around to making the component
>>>>>> available yet; I will do so first thing next week!
>>>>>> 
>>>>>> Best,
>>>>>> 
>>>>>> Erik
>>>>>> 
>>>>>>> On 20. Feb 2019, at 19:47, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> Yes, we are using that format. I have a parser that I wrote, but it isn't
>>>>>>> integrated into UIMA. It runs separately and loads the full clinical trial
>>>>>>> data into a triplestore (Stardog). I would be interested in your system,
>>>>>>> since I am not really familiar with how to write file readers in the UIMA
>>>>>>> framework. Perhaps I can merge my parser into it and end up with just the
>>>>>>> right thing. If you can make it available, I would definitely be
>>>>>>> interested. I will take a look at the other links as well. Thanks!!
>>>>>>> 
>>>>>>> Bonnie MacKellar
>>>>>>> 
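For orientation on how a file reader is structured: a collection reader hands out one document per call until its input is exhausted. In uimaFIT, this typically means extending JCasCollectionReader_ImplBase and implementing hasNext(), getNext(JCas) and getProgress(). The sketch below shows only that control flow, in plain Java over a directory of *.xml files; the class XmlDirectoryReaderSketch and its methods are hypothetical stand-ins, and a real reader would parse each file and fill the CAS inside getNext instead of returning the raw XML.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Plain-Java sketch of a file-based reader's control flow: collect the input files once,
 * then hand out one document per getNext() call until none are left. Not the UIMA API.
 */
public class XmlDirectoryReaderSketch {

    private final List<Path> files = new ArrayList<>();
    private int index = 0;

    XmlDirectoryReaderSketch(Path dir) throws IOException {
        // Collect all *.xml files directly inside dir (no recursion into subfolders).
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        Collections.sort(files); // deterministic document order
    }

    boolean hasNext() {
        return index < files.size();
    }

    /** Returns the raw XML of the next file; call only while hasNext() is true. */
    String getNext() throws IOException {
        Path p = files.get(index++);
        // A real reader would parse the XML here and write text and annotations into the CAS.
        return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    }

    /** Documents handed out so far and the total, similar in spirit to a reader's progress report. */
    int[] progress() {
        return new int[] {index, files.size()};
    }

    public static void main(String[] args) throws IOException {
        XmlDirectoryReaderSketch reader = new XmlDirectoryReaderSketch(Paths.get(args[0]));
        while (reader.hasNext()) {
            String xml = reader.getNext();
            System.out.println("Read a document with " + xml.length() + " characters");
        }
    }
}
```

Collecting the file list up front keeps hasNext() cheap and makes the progress total known from the start.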
>>>>>>> On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler <erik.faessler@uni-jena.de> wrote:
>>>>>>> 
>>>>>>>> Dear Bonnie,
>>>>>>>> 
>>>>>>>> are you talking about the clinical trial XML format used by
>>>>>>>> ClinicalTrials.gov, by any chance?
>>>>>>>> If so, I did create a UIMA reader for these data. It's not perfect, but
>>>>>>>> perhaps it is enough for your purposes, and you might also want to enhance it.
>>>>>>>> Please let me know if you would be interested in it. I have not gotten
>>>>>>>> around to making it publicly available yet, but I could do so quickly.
>>>>>>>> 
>>>>>>>> To answer the general question to the best of my knowledge:
>>>>>>>> There is no such thing as a general XML reader built into the UIMA
>>>>>>>> framework. For all non-trivial formats, a specific reader is necessary.
>>>>>>>> This also holds true with regard to the employed type system.
>>>>>>>> That being said, there are UIMA readers that try to serve as a general
>>>>>>>> XML reading facility, e.g. the “XML Reader” from our lab (JULIELab,
>>>>>>>> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
>>>>>>>> However, in my experience XML inputs come in a lot of different forms,
>>>>>>>> which often might not be suited to a generic approach. This is why I
>>>>>>>> wrote quite a few UIMA readers for specific XML formats in the past.
>>>>>>>> 
>>>>>>>> Hope that helps,
>>>>>>>> 
>>>>>>>> Erik
>>>>>>>> 
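As an example of what a format-specific reader has to know, the sketch below pulls a few fields out of one ClinicalTrials.gov study record with the JDK's DOM and XPath APIs. The element paths (clinical_study/id_info/nct_id, brief_title, eligibility/criteria/textblock) are assumptions about the legacy ClinicalTrials.gov XML layout and should be checked against your files; the class name, the field keys and the sample record are made up. In a UIMA reader, such values could then be written into the CAS, e.g. as document text plus typed annotations.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

/**
 * Format-specific field extraction from one ClinicalTrials.gov study record.
 * The XPath expressions are assumptions about the legacy XML layout; adapt them to your files.
 */
public class CtXmlExtractor {

    // Hypothetical field keys mapped to XPath expressions.
    static final Map<String, String> FIELDS = new LinkedHashMap<>();
    static {
        FIELDS.put("nctId", "/clinical_study/id_info/nct_id");
        FIELDS.put("briefTitle", "/clinical_study/brief_title");
        FIELDS.put("criteria", "/clinical_study/eligibility/criteria/textblock");
    }

    /** Parses one record and returns each field's trimmed text, or "" when the element is missing. */
    static Map<String, String> extract(InputStream xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(xml);
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> values = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : FIELDS.entrySet()) {
            // evaluate() yields the string value of the first matching node, or "" if none matches.
            values.put(field.getKey(), xpath.evaluate(field.getValue(), doc).trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample record for illustration only.
        String sample = "<clinical_study><id_info><nct_id>NCT00000000</nct_id></id_info>"
                + "<brief_title>Example study</brief_title>"
                + "<eligibility><criteria><textblock>Inclusion Criteria: adults</textblock>"
                + "</criteria></eligibility></clinical_study>";
        Map<String, String> values =
                extract(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        values.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Keeping the paths in one table makes the format-specific knowledge explicit, which is exactly the part a generic XML reader cannot supply.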
>>>>>>>>> On 20. Feb 2019, at 01:13, Bonnie MacKellar <bkmackellar@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> This is probably a very naive question, but I can't seem to find anything
>>>>>>>>> about this. I currently have a lot of XML files (clinical trial
>>>>>>>>> descriptions). My current workflow is to run a preprocessor that parses the
>>>>>>>>> XML and generates text files in a simple format. I then run these files in
>>>>>>>>> a UIMA pipeline, using FileCollectionReader to load the text files, RUTA to
>>>>>>>>> parse the simple format, the MetaMap annotator to do some UMLS annotations,
>>>>>>>>> and finally I have a writer that generates RDF triples from the UIMA
>>>>>>>>> annotations and loads the triples into a database. This has worked, but it
>>>>>>>>> is clunky, especially the preprocessing. I feel like there has to be a
>>>>>>>>> better way. Is there any support for reading XML files, or do I need to
>>>>>>>>> write my own CollectionReader? Are there any other tools within UIMA for
>>>>>>>>> handling XML text?
>>>>>>>>> 
>>>>>>>>> thanks,
>>>>>>>>> Bonnie MacKellar
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 

