ctakes-user mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject RE: Parser for output of cTAKES coreference module to i2b2 scoring code format?
Date Fri, 14 Mar 2014 18:22:23 GMT
And... there are some helper methods:


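    // Needs java.util.HashMap, org.apache.uima.cas.FSIterator and
    // org.apache.uima.jcas.JCas, plus the cTAKES BaseToken and Markable types.

    // Maps each token's begin and end character offsets to the token's
    // 1-based position in the document.
    // Caveat: when a token ends exactly where the next one begins (no
    // whitespace, e.g. "pain."), the next token's put() overwrites that key,
    // so a markable ending there is given the following token's index.
    private HashMap<Integer, Integer> mapTokensToIndices(JCas jc) {
        HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();

        int i = 1;
        FSIterator iter = jc.getAnnotationIndex(BaseToken.type).iterator();
        while (iter.hasNext()) {
            BaseToken tok = (BaseToken) iter.next();
            map.put(tok.getBegin(), i);
            map.put(tok.getEnd(), i);
            i++;
        }
        return map;
    }

    // Formats a markable as "1:<first token> 1:<last token>", i.e. every
    // mention is reported on line 1. If the markable's offsets do not fall on
    // token boundaries, get() returns null and "null" ends up in the string.
    private String getSpanString(Markable mark, HashMap<Integer, Integer> tok2ind) {
        Integer tok1 = tok2ind.get(mark.getBegin());
        Integer tok2 = tok2ind.get(mark.getEnd());
        return "1:" + tok1 + " 1:" + tok2;
    }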
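For comparison, here is a minimal standalone sketch (plain Java, no UIMA) of the same offset-to-token-index idea. It assumes tokens arrive as hypothetical {begin, end} offset pairs rather than BaseToken annotations, and the class and method names are made up for illustration. It keeps separate begin and end maps, which avoids the overwrite a single map suffers when one token ends exactly where the next one begins:

```java
import java.util.HashMap;
import java.util.Map;

public class TokenSpanSketch {
    // Separate maps: an end offset equal to the next token's begin offset
    // no longer collides, because the two live in different maps.
    private final Map<Integer, Integer> beginToIndex = new HashMap<>();
    private final Map<Integer, Integer> endToIndex = new HashMap<>();

    // tokens: hypothetical {begin, end} character offsets in document order,
    // standing in for BaseToken annotations; indices are 1-based like the helper.
    public TokenSpanSketch(int[][] tokens) {
        int i = 1;
        for (int[] tok : tokens) {
            beginToIndex.put(tok[0], i);
            endToIndex.put(tok[1], i);
            i++;
        }
    }

    // Returns "1:<first> 1:<last>", or null if the span is not token-aligned.
    public String spanString(int begin, int end) {
        Integer first = beginToIndex.get(begin);
        Integer last = endToIndex.get(end);
        if (first == null || last == null) {
            return null;
        }
        return "1:" + first + " 1:" + last;
    }

    public static void main(String[] args) {
        // "Chest pain." tokenized as "Chest" [0,5), "pain" [6,10), "." [10,11)
        TokenSpanSketch s = new TokenSpanSketch(new int[][] { {0, 5}, {6, 10}, {10, 11} });
        System.out.println(s.spanString(6, 10)); // prints 1:2 1:2
        System.out.println(s.spanString(0, 10)); // prints 1:1 1:2
    }
}
```

With the single shared map, the span for "pain" would come out as "1:2 1:3", because the "." token's begin offset 10 replaces the end offset of "pain".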


Hopefully that is it!

________________________________
From: Miller, Timothy [Timothy.Miller@childrens.harvard.edu]
Sent: Friday, March 14, 2014 2:19 PM
To: user@ctakes.apache.org
Subject: RE: Parser for output of cTAKES coreference module to i2b2 scoring code format?

Kev, I think I found it. It looks like it was never checked in, because it was part of a separate
evaluation module that used GPL'd code and couldn't be released with cTAKES. It was also commented
out for some reason. I'll just paste the relevant section here; hopefully you can use it, or
at least it will save you some time. If you write a UIMA consumer or other tool that you think
would help others, and you are willing to share it, we would gladly incorporate it into cTAKES.

    if (i2b2) {
        // Write the system chain file: one line per CoreferenceChain.
        // try-with-resources closes the writer even if an exception is thrown.
        try (PrintWriter writer = new PrintWriter(i2b2Path + "/sysChain/" + docName + ".chain")) {
            HashMap<Integer, Integer> tok2ind = mapTokensToIndices(jc);
            FSIterator iter = jc.getJFSIndexRepository().getAllIndexedFS(CoreferenceChain.type);
            while (iter.hasNext()) {
                CoreferenceChain chain = (CoreferenceChain) iter.next();
                // Walk the UIMA linked list of chain members; each mention is
                // written as c="text" span, followed by "||".
                FSList members = chain.getMembers();
                while (members instanceof NonEmptyFSList) {
                    NonEmptyFSList node = (NonEmptyFSList) members;
                    Markable mark = (Markable) node.getHead();
                    writer.print("c=\"");
                    writer.print(mark.getCoveredText());
                    writer.print("\" ");
                    writer.print(getSpanString(mark, tok2ind));
                    writer.print("||");
                    members = node.getTail();
                }
                // Write the type information (every chain is labeled "coref problem").
                writer.println("t=\"coref problem\"");
            }
        } catch (FileNotFoundException e) {
            // The sysChain directory must already exist under i2b2Path.
            e.printStackTrace();
        }
    }
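To check the output format without running a pipeline, here is a small standalone sketch of just the line-building step. It assumes each mention's covered text and span string are already known; the Mention class and chainLine method are hypothetical names, not cTAKES API. Like the loop above, it writes one chain per line, each mention followed by "||", then the type field:

```java
import java.util.Arrays;
import java.util.List;

public class ChainLineSketch {
    // Hypothetical mention holder: covered text plus its span string,
    // e.g. "1:4 1:5" as produced by getSpanString().
    public static final class Mention {
        final String text;
        final String span;

        public Mention(String text, String span) {
            this.text = text;
            this.span = span;
        }
    }

    // Builds one chain line: c="text" span|| for each mention, then t="type".
    public static String chainLine(List<Mention> mentions, String type) {
        StringBuilder sb = new StringBuilder();
        for (Mention m : mentions) {
            sb.append("c=\"").append(m.text).append("\" ").append(m.span).append("||");
        }
        sb.append("t=\"").append(type).append("\"");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Mention> chain = Arrays.asList(
                new Mention("chest pain", "1:4 1:5"),
                new Mention("it", "1:12 1:12"));
        // prints c="chest pain" 1:4 1:5||c="it" 1:12 1:12||t="coref problem"
        System.out.println(chainLine(chain, "coref problem"));
    }
}
```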


Tim


________________________________
From: Miller, Timothy [Timothy.Miller@childrens.harvard.edu]
Sent: Friday, March 14, 2014 1:52 PM
To: user@ctakes.apache.org
Subject: RE: Parser for output of cTAKES coreference module to i2b2 scoring code format?

Kevin, I think I did write something like that at some point. I just spent the last 10 minutes
looking for it and can't find it. I will poke around a bit more and let you know if I find
anything.
Tim

________________________________
From: Kevin B. Cohen [kevin.cohen@gmail.com]
Sent: Friday, March 14, 2014 12:41 PM
To: user@ctakes.apache.org; Natural language processing for biology
Subject: Parser for output of cTAKES coreference module to i2b2 scoring code format?

Hi,

I'm sitting here writing a little parser to convert the output of the cTAKES coreference resolution
module into the format of the scoring code from the i2b2 coreference resolution task, and
it occurred to me that multiple people have probably done that already.  Would anyone be willing
to share code for that?

Kev

--
Kevin Bretonnel Cohen, PhD
Biomedical Text Mining Group Lead, Computational Bioscience Program,
U. Colorado School of Medicine
303-916-2417
http://compbio.ucdenver.edu/Hunter_lab/Cohen


