ctakes-dev mailing list archives

From Ephi <eph...@gmail.com>
Subject Re: Dependency Parser model data
Date Tue, 24 Mar 2015 11:21:43 GMT
Update:

I tried loading the descriptor file via the menu File -> Open CPE Descriptor.

This throws the following exception because the file
ClearTrainerPosLemAggregate.xml is missing. From searching the internet, it
seems that this file was included in cTAKES 2.0 but is no longer present in
the latest cTAKES.

C:\apache-ctakes-3.2.1>java -cp
"C:\apache-ctakes-3.2.1/desc/;C:\apache-ctakes-3.2.1/resources/;C:\apache-ctakes-3.2.1/lib/*"
-Dlog4j.configuration=file:/C:\apache-ctakes-3.2.1/config/log4j.xml
-Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
Error loading CPE Descriptor C:\apache-ctakes-3.2.1\desc\ctakes-dependency-parser\desc\collection_processing_engine\ClearTrainerPosLemTestCPE.xml
java.io.FileNotFoundException: C:\apache-ctakes-3.2.1\desc\ctakes-dependency-parser\desc\analysis_engine\ClearTrainerPosLemAggregate.xml (The system cannot find the file specified)

        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(Unknown Source)
        at java.io.FileInputStream.<init>(Unknown Source)
        at sun.net.www.protocol.file.FileURLConnection.connect(Unknown Source)
        at sun.net.www.protocol.file.FileURLConnection.getInputStream(Unknown Source)
        at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:120)
        at org.apache.uima.tools.cpm.CpmPanel.openCpeDescriptor(CpmPanel.java:1789)
        at org.apache.uima.tools.cpm.CpmPanel.readPreferences(CpmPanel.java:538)
        at org.apache.uima.tools.cpm.CpmPanel.<init>(CpmPanel.java:419)
        at org.apache.uima.tools.cpm.CpmFrame.<init>(CpmFrame.java:94)
        at org.apache.uima.tools.cpm.CpmFrame.initGUI(CpmFrame.java:178)
        at org.apache.uima.tools.cpm.CpmFrame.access$000(CpmFrame.java:49)
        at org.apache.uima.tools.cpm.CpmFrame$1.run(CpmFrame.java:168)
        at java.awt.event.InvocationEvent.dispatch(Unknown Source)
        at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
        at java.awt.EventQueue.access$400(Unknown Source)
        at java.awt.EventQueue$3.run(Unknown Source)
        at java.awt.EventQueue$3.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
        at java.awt.EventQueue.dispatchEvent(Unknown Source)
        at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
        at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
        at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
        at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
        at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
        at java.awt.EventDispatchThread.run(Unknown Source)
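
One way to confirm which referenced descriptors are absent before opening the
CPE GUI is to scan the CPE descriptor for file references and test each one on
disk. The following is only a rough sketch: it assumes the descriptor refers to
other files through "location" or "href" attributes resolved relative to the
descriptor's own directory, and the function name missing_references is made
up for illustration.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_references(descriptor_path):
    """Return the referenced files that do not exist on disk.

    Assumption: references appear as "location" or "href" attributes and
    are relative to the directory that contains the descriptor.
    """
    descriptor = Path(descriptor_path)
    root = ET.parse(descriptor).getroot()
    missing = []
    for element in root.iter():
        for attribute in ("location", "href"):
            target = element.get(attribute)
            if not target:
                continue
            # An absolute target replaces the base directory automatically.
            resolved = (descriptor.parent / target).resolve()
            if not resolved.exists():
                missing.append(resolved)
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of the CPE descriptor as the only argument.
    for path in missing_references(sys.argv[1]):
        print("missing:", path)
```

Run against ClearTrainerPosLemTestCPE.xml, a check like this should list
ClearTrainerPosLemAggregate.xml if that is the only file the descriptor
cannot find.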


On Tue, Mar 24, 2015 at 11:49 AM, Ephi <ephi79@gmail.com> wrote:

> Thanks!
>
> ***** 1 ******
> Regarding the documentation - the documentation for cTAKES 3.2 [1] links
> to the Dependency Parser documentation for 3.0 [2]; there doesn't seem to
> be updated documentation for this component.
>
> In the page from 3.0 it says simply that clinques.mod is the main
> ClearParser model packaged with cTAKES v1.1 and that it is trained on a
> corpus of 1600 clinical questions.
>
> ***** 2 ******
> Regarding self training of the models - I tried following the
> documentation but didn't succeed. The documentation [2] states the
> following:
>
> 1. Download and install the C++ version of liblinear from National Taiwan
> University; this requires much less memory than the default Java version.
> 2. Train a model
> To create a model using cTAKES POS tags and lemmas with Eclipse:
> 1. Create a <your-data>.min file from <your-data>.dep (see the section
> called "Conversion between formats")
> 2. Use the UIMA_CPE_GUI---dependency parser launch.
> 3. Load desc/collection_processing_engine/ClearTrainerPosLemTestCPE.xml
> 4. Put your filename under "Dependency File"
> 5. Make sure "Training Mode" is checked
> 6. Rename the "Dependency Model File" and "Lexicon Directory" according to
> what you want.
> 7. Make sure "Trainer Path" is a valid relative path from
> <cTAKES_HOME>/dependency parser to a valid liblinear binary train file.
>
>
> Regarding step 2 - cTAKES 3.2 doesn't seem to have the UIMA_CPE_GUI; there
> is only bin/runCPE.bat. I tried running this.
>
> Regarding step 3 -
> When I tried to load
> desc\ctakes-dependency-parser\desc\collection_processing_engine\ClearTrainerPosLemTestCPE.xml
> I got an error (snapshot attached)
>
> Any ideas?
>
> Thanks, Ephi
>
> [1]
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+Component+Use+Guide
> [2]
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+-+Dependency+Parser+and+Semantic+Role+Labeler
>
> On Mon, Mar 16, 2015 at 6:46 AM, Pei Chen <chenpei@apache.org> wrote:
>
>> Ephi,
>> The ClearNLP models in the current cTAKES releases (since 3.1.0 [1])
>> should contain much more.  They should contain at least MiPACQ and SHARP
>> training data.  Could you point us to the documentation so we can update
>> it?  I believe the breakdown was:
>>
>>
>>    - Clinical questions: 1,600 sentences, 30,138 tokens.
>>    - Medpedia articles: 2,796 sentences, 49,922 tokens.
>>    - MiPACQ clinical notes: 8,040 sentences, 107,663 tokens.
>>    - MiPACQ pathological notes: 1,225 sentences, 21,581 tokens.
>>    - Seattle group health clinical notes: 5,020 sentences, 61,124 tokens.
>>    - Seattle group health pathological notes: 2,294 sentences, 34,384 tokens.
>>    - SHARP clinical notes: 6,787 sentences, 94,205 tokens.
>>    - SHARP stratified: 4,316 sentences, 43,037 tokens.
>>    - SHARP stratified SGH: 4,963 sentences, 49,081 tokens.
>>    - TEMPREL clinical notes: 19,775 sentences, 266,979 tokens.
>>    - TEMPREL pathological notes: 4,335 sentences, 78,829 tokens.
>>
>> There are some discussions on appending/augmenting the existing
>> annotated/training data [2].  I think the short answer is that there is
>> currently no easy way short of having to sign DUAs from every single
>> source institution.
>>
>> [1] http://svn.apache.org/r1465043
>> [2]
>> http://mail-archives.apache.org/mod_mbox/ctakes-dev/201412.mbox/%3CE5A9FA5ABBF1CA4085D4F0794852A51E2424117D@CHEXMBX3A.CHBOSTON.ORG%3E
>>
>>
>> On Sun, Mar 15, 2015 at 11:58 AM, Ephi <ephi79@gmail.com> wrote:
>>
>> > Hi -
>> >
>> > From the documentation, the data used to train the dep parser in cTAKES
>> > seems to be 1600 clinical questions (from the Mayo clinic?).
>> >
>> > Is there a way to retrieve this data in order to retrain the model
>> > (while adding on additional data)?
>> >
>> > Thanks!
>> > Ephi
>> >
>>
>
>
