ctakes-user mailing list archives

From Natalia Connolly <natalia.v.conno...@gmail.com>
Subject Re: Input file format for CPE?
Date Tue, 22 Jul 2014 15:09:59 GMT
Thank you again, Tim.  This worked very nicely!

One more question on a somewhat unrelated subject, if I may: while this
setup found most clinical concepts, it did not find a few that I'd think
would be fairly standard.  For example, it did not find "myeloblastoma",
which is definitely a UMLS term.  I tried using
AggregatePlaintextUMLSProcessor in conjunction with
DictionaryLookupAnnotatorUMLS, but no luck: it only found "myeloblastoma"
as a WordToken.  Are there different versions of the UMLS dictionaries that
could potentially be used in cTAKES?

Thank you,

Natalia
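
A quick way to see what a pipeline run actually produced is to tally the
element types in one output file and collect any "cui" attribute values,
which is a more structured version of grepping the output for "cui". The
following is a minimal Python sketch, not part of cTAKES. It assumes the
CPE output is well-formed XML and that concept annotations carry a "cui"
attribute; the function name and the sample document are hypothetical and
written by hand for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def summarize_annotations(xml_text):
    """Return (element-type counts, set of 'cui' attribute values)."""
    root = ET.fromstring(xml_text)
    tags = Counter()
    cuis = set()
    for elem in root.iter():
        tags[local_name(elem.tag)] += 1
        cui = elem.get("cui")  # assumption: concepts expose a 'cui' attribute
        if cui:
            cuis.add(cui)
    return tags, cuis


# Hypothetical sample, NOT real cTAKES output; element names are placeholders.
SAMPLE = """<root xmlns:a="http://example.org/a">
  <a:WordToken begin="0" end="13"/>
  <a:UmlsConcept cui="C0000000"/>
</root>"""

if __name__ == "__main__":
    tags, cuis = summarize_annotations(SAMPLE)
    print(dict(tags))
    print(sorted(cuis))
```

If a term shows up only under a token-like element type and the set of CUIs
is empty, the dictionary lookup step probably did not match it, which points
at the dictionary contents rather than at the CPE setup.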



On Mon, Jul 21, 2014 at 2:52 PM, Miller, Timothy <
Timothy.Miller@childrens.harvard.edu> wrote:

>  You may need to modify test_plaintext.xml to use the UMLS-based pipeline
> if you haven't already. I think the line:
>     <import location="../analysis_engine/AggregatePlaintextProcessor.xml"/>
>
> needs to be changed to use:
>
> AggregatePlaintextUMLSProcessor.xml
>
> I believe you can also make that change in the CPE GUI.
>
> Tim
>
>
> On 07/21/2014 02:43 PM, Natalia Connolly wrote:
>
> Thanks Tim.  This worked in the sense that it did not crash; however, the
> output does not seem to have any actual annotations of diagnoses,
> medications, etc.  The input text contains a number of such concepts that
> had indeed been flagged by CVD; but when I grep for "concept" or "medfacts"
> or "cui" in the CPE output there is nothing there.  Would you have any
> suggestions for how to "synchronize" the outputs of CVD and CPE?  Both
> scripts contain the -Dctakes.umlsuser/umlspw options, so both should have
> access to UMLS.
>
>  Thank you,
>
>  Natalia
>
>
>
> On Mon, Jul 21, 2014 at 1:36 PM, Miller, Timothy <
> Timothy.Miller@childrens.harvard.edu> wrote:
>
>> It looks to me like you want test_plaintext.xml rather than test1.xml.
>> test1.xml seems to expect CDA-formatted input while test_plaintext.xml can
>> read text files like you have.
>> Tim
>>
>>
>> On 07/21/2014 01:30 PM, Natalia Connolly wrote:
>>
>> Hello,
>>
>>     I am new to cTAKES.  I am using cTAKES 3.1.  I've been able to run
>> the visual debugger without any trouble but now I am stuck on running the
>> CPE version, which is what I will really need as I have a large number of
>> clinical documents to process.
>>
>>      I loaded test1.xml as the descriptor, and made sure both the input
>> and the output directories exist.  My single input file in the input
>> directory is just plain text, similar to the "Dr. Nutritious" example.
>> However, I am getting the following error:
>>
>>  org.apache.uima.analysis_engine.AnalysisEngineProcessException
>> CausedBy: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 2;
>> Content is not allowed in prolog.
>>
>>     Does this mean that the input file has to be in xml format?  If so,
>> how do I convert plain text into the format that cTAKES expects?
>>
>>     Thank you.
>>
>>     Natalia Connolly
>>
>>
>>
>>
>
>
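
The SAXParseException in the first message of the thread is the usual sign
that an XML parser was handed plain text: the parser expects markup at the
start of the file and finds ordinary characters instead. The sketch below
reproduces that kind of failure with Python's standard xml.sax module. It is
a stand-in for illustration only; cTAKES uses a Java parser, so the exact
message text differs, and the function name is hypothetical.

```python
import xml.sax


def is_well_formed_xml(data: bytes) -> bool:
    """Return True if `data` parses as XML, False if the parser rejects it."""
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException:
        # Raised when the input is not well-formed XML, for example
        # plain text with no root element.
        return False


if __name__ == "__main__":
    plain_note = b"Dr. Nutritious\nPatient reports mild headache.\n"
    xml_note = b"<doc>Patient reports mild headache.</doc>"
    print(is_well_formed_xml(plain_note))
    print(is_well_formed_xml(xml_note))
```

A check like this on one input file can help tell whether a descriptor that
expects XML (such as CDA) or one that expects plain text is the right match.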
