From: "Miller, Timothy"
To: "user@ctakes.apache.org"
Subject: RE: Parser for output of cTAKES coreference module to i2b2 scoring code format?
Date: Fri, 14 Mar 2014 18:22:23 +0000

And... there are some helper methods:


    // Maps each token's begin and end character offset to the token's
    // 1-based position in the document.
    private HashMap<Integer, Integer> mapTokensToIndices(JCas jc) {
        HashMap<Integer,Integer> map = new HashMap<Integer,Integer>();

        int i = 1;
        FSIterator iter = jc.getAnnotationIndex(BaseToken.type).iterator();
        while(iter.hasNext()){
            BaseToken tok = (BaseToken) iter.next();
            map.put(tok.getBegin(), i);
            map.put(tok.getEnd(), i);
            i++;
        }
        return map;
    }


    // Builds the span string from the token indices found at the markable's
    // begin and end offsets; the leading "1:" is hard-coded.
    private String getSpanString(Markable mark,
            HashMap<Integer, Integer> tok2ind) {
        Integer tok1 = tok2ind.get(mark.getBegin());
        Integer tok2 = tok2ind.get(mark.getEnd());
        return "1:" + tok1 + " 1:" + tok2;
    }


Hopefully that is it!
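
One thing to watch in the helpers above: because a single map holds both begin and end offsets, a token that ends exactly where the next one begins (for example, a word followed directly by a period) has its end offset overwritten by the next token's index. The sketch below is a minimal, UIMA-free illustration of that effect and of one possible workaround using separate begin and end maps. `Tok`, `singleMap`, `indexByBegin`, `indexByEnd`, and `spanString` are hypothetical names made up for this example, not part of cTAKES, and it assumes token end offsets are exclusive.

```java
import java.util.HashMap;
import java.util.Map;

class TokenSpanSketch {
    /** Hypothetical stand-in for a token annotation: character offsets only. */
    static final class Tok {
        final int begin;
        final int end; // exclusive, by assumption
        Tok(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /** Same scheme as the helper in the email: one map for begin and end offsets. */
    static Map<Integer, Integer> singleMap(Tok[] toks) {
        Map<Integer, Integer> map = new HashMap<>();
        int i = 1;
        for (Tok t : toks) {
            map.put(t.begin, i);
            map.put(t.end, i); // may be overwritten by the next token's begin
            i++;
        }
        return map;
    }

    /** Begin offset -> 1-based token position. */
    static Map<Integer, Integer> indexByBegin(Tok[] toks) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < toks.length; i++) map.put(toks[i].begin, i + 1);
        return map;
    }

    /** End offset -> 1-based token position. */
    static Map<Integer, Integer> indexByEnd(Tok[] toks) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < toks.length; i++) map.put(toks[i].end, i + 1);
        return map;
    }

    /** Span string in the same "1:first 1:last" layout as getSpanString. */
    static String spanString(int begin, int end,
                             Map<Integer, Integer> byBegin,
                             Map<Integer, Integer> byEnd) {
        Integer first = byBegin.get(begin);
        Integer last = byEnd.get(end);
        if (first == null || last == null) {
            throw new IllegalArgumentException(
                "span [" + begin + "," + end + ") is not aligned to token boundaries");
        }
        return "1:" + first + " 1:" + last;
    }

    public static void main(String[] args) {
        // "He left." tokenized as He[0,2) left[3,7) .[7,8)
        Tok[] toks = { new Tok(0, 2), new Tok(3, 7), new Tok(7, 8) };
        Map<Integer, Integer> one = singleMap(toks);
        // Single map: offset 7 ends up pointing at "." rather than "left".
        System.out.println(spanString(3, 7, one, one)); // prints 1:2 1:3
        // Separate maps keep the end of "left" on token 2.
        System.out.println(
            spanString(3, 7, indexByBegin(toks), indexByEnd(toks))); // prints 1:2 1:2
    }
}
```

The explicit null check also covers markables whose boundaries do not fall on a token boundary, which the original helper would silently print as `null`.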

From: Miller, Timothy [Timothy.Miller@childrens.harvard.edu]
Sent: Friday, March 14, 2014 2:19 PM
To: user@ctakes.apache.org
Subject: RE: Parser for output of cTAKES coreference module to i2b2 scoring code format?

Kev -- I think I found it. It looks like it was never checked in, as it was part of a separate eval module that used GPL'd code and couldn't be released with cTAKES. It was also commented out for some reason. I'll just paste the relevant section here; hopefully you can use it, or it will at least save you some time. If you write a UIMA consumer or other tool that you think would help others, and you are willing to share it, we would gladly incorporate it into cTAKES.

// if(i2b2){
//     // write system chain file:
//     try {
//         PrintWriter writer = new PrintWriter(i2b2Path + "/sysChain/" + docName + ".chain");
//         HashMap<Integer,Integer> tok2ind = mapTokensToIndices(jc);
//         FSIterator iter = jc.getJFSIndexRepository().getAllIndexedFS(CoreferenceChain.type);
//         while(iter.hasNext()){
//             CoreferenceChain chain = (CoreferenceChain) iter.next();
//             FSList members = chain.getMembers();
//             while(members instanceof NonEmptyFSList){
//                 NonEmptyFSList node = (NonEmptyFSList) members;
//                 Markable mark = (Markable) node.getHead();
//                 writer.print("c=\"");
//                 writer.print(mark.getCoveredText());
//                 writer.print("\" ");
//                 writer.print(getSpanString(mark, tok2ind));
//                 writer.print("||");
//                 members = node.getTail();
//             }
//             // write the type information
//             writer.println("t=\"coref problem\"");
//         }
//         writer.close();
//     } catch (FileNotFoundException e) {
//         // TODO Auto-generated catch block
//         e.printStackTrace();
//     }
//
// }


Tim
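
For reference, the commented-out block above writes one line per chain: each mention as `c="text" 1:first 1:last` followed by `||`, and then `t="coref problem"` at the end. The sketch below reproduces only that string layout, without any UIMA dependency, so the expected shape is easy to check by eye. `Mention` and `formatChain` are hypothetical names invented here, and the token indices in `main` are made-up sample values, not output from cTAKES.

```java
import java.util.Arrays;
import java.util.List;

class ChainLineSketch {
    /** Hypothetical stand-in for a Markable: covered text plus token indices. */
    static final class Mention {
        final String text;
        final int firstTok;
        final int lastTok;
        Mention(String text, int firstTok, int lastTok) {
            this.text = text;
            this.firstTok = firstTok;
            this.lastTok = lastTok;
        }
    }

    /** Builds one chain line in the same layout the commented-out code prints. */
    static String formatChain(List<Mention> chain) {
        StringBuilder sb = new StringBuilder();
        for (Mention m : chain) {
            sb.append("c=\"").append(m.text).append("\" ");
            // The "1:" prefix is hard-coded, as in getSpanString.
            sb.append("1:").append(m.firstTok).append(" 1:").append(m.lastTok);
            sb.append("||");
        }
        // The type information closes the line.
        sb.append("t=\"coref problem\"");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Mention> chain = Arrays.asList(
            new Mention("the patient", 1, 2),
            new Mention("she", 7, 7));
        // prints c="the patient" 1:1 1:2||c="she" 1:7 1:7||t="coref problem"
        System.out.println(formatChain(chain));
    }
}
```

If the writing code is revived, opening the `PrintWriter` in a try-with-resources statement would also close the file when an exception is thrown partway through the loop.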



From: Miller, Timothy [Timothy.Miller@childrens.harvard.edu]
Sent: Friday, March 14, 2014 1:52 PM
To: user@ctakes.apache.org
Subject: RE: Parser for output of cTAKES coreference module to i2b2 scoring code format?

Kevin, I think I did write something like that at some point. I just spent the last 10 minutes looking for it and can't find it. I will poke around a bit more and let you know if I find anything.
Tim


From: Kevin B. Cohen [kevin.cohen@gmail.com]
Sent: Friday, March 14, 2014 12:41 PM
To: user@ctakes.apache.org; Natural language processing for biology
Subject: Parser for output of cTAKES coreference module to i2b2 scoring code format?

Hi,

I'm sitting here writing a little parser to convert the output of the cTAKES coreference resolution module into the format of the scoring code from the i2b2 coreference resolution task, and it occurred to me that multiple people have probably done that already. Would anyone be willing to share code for that?

Kev

--
Kevin Bretonnel Cohen, PhD
Biomedical Text Mining Group Lead, Computational Bioscience Program,
U. Colorado School of Medicine
303-916-2417
http://compbio.ucdenver.edu/Hunter_lab/Cohen


