From: "Miller, Timothy"
To: "user@ctakes.apache.org"
Subject: Re: Input file format for CPE?
Date: Mon, 21 Jul 2014 18:52:25 +0000

You may need to modify test_plaintext.xml to use the UMLS-based pipeline if you haven't already. I think the line:
    <import location="../analysis_engine/AggregatePlaintextProcessor.xml"/>

needs to be changed to use:

AggregatePlaintextUMLSProcessor.xml

I believe you can also make that change in the CPE GUI.
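
For anyone who prefers to script the edit, here is a minimal sketch of the same change as a plain text substitution. It assumes that AggregatePlaintextUMLSProcessor.xml sits in the same analysis_engine directory as AggregatePlaintextProcessor.xml; the class name DescriptorPatch and the two command-line paths are hypothetical and not part of cTAKES.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class DescriptorPatch {
    // File names taken from the message above; the shared directory is an assumption.
    static final String OLD_AE = "AggregatePlaintextProcessor.xml";
    static final String NEW_AE = "AggregatePlaintextUMLSProcessor.xml";

    /** Returns the descriptor text with the import pointed at the UMLS aggregate. */
    static String useUmlsAggregate(String descriptorXml) {
        // The new name does not contain the old one, so repeating the call is harmless.
        return descriptorXml.replace(OLD_AE, NEW_AE);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: existing CPE descriptor, args[1]: where to write the patched copy.
        String original = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), useUmlsAggregate(original).getBytes(StandardCharsets.UTF_8));
    }
}
```

Writing to a second file rather than overwriting keeps the original test_plaintext.xml available for comparison.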

Tim

On 07/21/2014 02:43 PM, Natalia Connolly wrote:
Thanks Tim. This worked in the sense that it did not crash; however, the output does not seem to contain any actual annotations of diagnoses, medications, etc. The input text contains a number of such concepts that had indeed been flagged by CVD, but when I grep for "concept", "medfacts", or "cui" in the CPE output, nothing is there. Would you have any suggestions for how to "synchronize" the outputs of CVD and CPE? Both scripts contain the -Dctakes.umlsuser/umlspw options, so both should have access to UMLS.
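
The grep check described here can also be run as an XML-aware scan. The sketch below counts attributes named "cui" in each file of an output directory. It assumes that the CPE output files are well-formed XML and that concept identifiers appear as an attribute with that name (taken from the grep term above, not confirmed for any particular CAS consumer); the class name CuiCounter is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class CuiCounter {
    /** Counts attributes whose local name is "cui" (case-insensitive) in one XML document. */
    static int countCuiAttributes(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        if ("cui".equalsIgnoreCase(reader.getAttributeLocalName(i))) {
                            count++;
                        }
                    }
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the CPE output directory (assumed to contain only XML files).
        File[] files = new File(args[0]).listFiles(File::isFile);
        if (files == null) {
            throw new IOException("not a readable directory: " + args[0]);
        }
        for (File f : files) {
            String xml = new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);
            System.out.println(f.getName() + ": " + countCuiAttributes(xml) + " cui attribute(s)");
        }
    }
}
```

A count of zero for every file would be consistent with a pipeline that ran without a dictionary lookup step, which is one possible reading of the empty grep result.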

Thank you,

Natalia



On Mon, Jul 21, 2014 at 1:36 PM, Miller, Timothy <Timothy.Miller@childrens.harvard.edu> wrote:
It looks to me like you want test_plaintext.xml rather than test1.xml. test1.xml seems to expect CDA-formatted input while test_plaintext.xml can read text files like you have.
Tim


On 07/21/2014 01:30 PM, Natalia Connolly wrote:
Hello,

   I am new to cTAKES. I am using cTAKES 3.1. I've been able to run the visual debugger without any trouble, but now I am stuck on running the CPE version, which is what I will really need as I have a large number of clinical documents to process.

   I loaded test1.xml as the descriptor, and made sure both the input and the output directories exist. My single input file in the input directory is just plain text, similar to the "Dr. Nutritious" example. However, I am getting the following error:

org.apache.uima.analysis_engine.AnalysisEngineProcessException
CausedBy: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 2; Content is not allowed in prolog.

   Does this mean that the input file has to be in xml format? If so, how do I convert plain text into the format that cTAKES expects?
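
A SAXParseException at line 1 of this kind is what an XML parser typically raises when its input does not begin with XML markup. The sketch below is independent of cTAKES and only illustrates that behavior: it hands a string to the JDK's built-in parser and returns the parser's complaint, if any. The class name PrologCheck and the sample strings are made up for illustration, and the exact message wording depends on the parser implementation and locale.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class PrologCheck {
    /** Returns null if the text parses as XML, otherwise the parser's error with its position. */
    static String xmlParseError(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXParseException e) {
            return "lineNumber: " + e.getLineNumber()
                + "; columnNumber: " + e.getColumnNumber()
                + "; " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A plain-text note, as a reader expecting XML would see it.
        System.out.println(xmlParseError("Dr. Nutritious saw the patient today."));
        // The same content wrapped in an element parses without complaint.
        System.out.println(xmlParseError("<note>Dr. Nutritious saw the patient today.</note>"));
    }
}
```

Seen this way, the error points at a mismatch between the descriptor's reader and the input format rather than at a defect in the text file itself.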

   Thank you.

   Natalia Connolly




