incubator-ctakes-notifications mailing list archives

From "James Joseph Masanz (JIRA)"
Subject [jira] [Created] (CTAKES-158) DateAnnotation bug when two dates directly adjacent
Date Mon, 11 Feb 2013 17:29:13 GMT
James Joseph Masanz created CTAKES-158:

             Summary: DateAnnotation bug when two dates directly adjacent
                 Key: CTAKES-158
             Project: cTAKES
          Issue Type: Bug
          Components: ctakes-context-tokenizer
    Affects Versions: 3.0-incubating, 3.1-incubating
            Reporter: James Joseph Masanz

From an email from Shady AbdelAziz on ctakes-dev@, February 11, 2013:

  While working with DateAnnotation and adding some new state machines, I
found a minor bug regarding the start and end index of DateAnnotation.

Consider this small example:

"October 2003 November 2010 cTAKES is the best framework".

The result is supposed to be "October 2003" and "November 2010", but cTAKES detects "October
2003" and "October 2003 November 2010".

This happens because, when the FSM detects the first date, it has no record in the "tokenStartMap",
so the starting index is assumed to be "0". The FSM then starts detecting the second date, but
there is still no record for it in the map (a value is put in the map only when the FSM is in
a starting state, in other words after a token that does not satisfy any state condition), so
the starting index is again assumed to be "0".

That is why, for example, it works if there is an intermediate token between the two dates.

The solution is simply to put a record in the map before resetting the FSM,
so this line should be added: "tokenStartMap.put(fsm, new Integer(i));".
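
The behaviour described above can be reproduced with a small, self-contained sketch. This is not the cTAKES source: the class AdjacentDateDemo, the ToyDateFsm machine, and the findDates method with its applyFix flag are all hypothetical names made up for illustration, and the toy machine only recognises "October" or "November" followed by a four-digit year. Only the "tokenStartMap" name and the idea of recording the index before the reset come from the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AdjacentDateDemo {

    /** Hypothetical toy machine: start -> (month) -> month seen -> (year) -> end. */
    static class ToyDateFsm {
        private static final List<String> MONTHS = Arrays.asList("October", "November");
        private int state = 0; // 0 = start, 1 = month seen, 2 = end

        void input(String token) {
            if (state == 0 && MONTHS.contains(token)) {
                state = 1;
            } else if (state == 1 && token.matches("\\d{4}")) {
                state = 2;
            } else {
                state = 0; // token satisfies no condition: fall back to start
            }
        }

        boolean isStart() { return state == 0; }
        boolean isEnd() { return state == 2; }
        void reset() { state = 0; }
    }

    /** Returns the detected date strings; applyFix toggles the proposed one-line fix. */
    static List<String> findDates(List<String> tokens, boolean applyFix) {
        Map<ToyDateFsm, Integer> tokenStartMap = new HashMap<>();
        ToyDateFsm fsm = new ToyDateFsm();
        List<String> dates = new ArrayList<>();

        for (int i = 0; i < tokens.size(); i++) {
            fsm.input(tokens.get(i));

            if (fsm.isStart()) {
                // A non-matching token was consumed; a match can begin at i + 1.
                tokenStartMap.put(fsm, i);
            }
            if (fsm.isEnd()) {
                Integer last = tokenStartMap.get(fsm);
                // With no record in the map, the start index defaults to 0.
                int start = (last == null) ? 0 : last + 1;
                dates.add(String.join(" ", tokens.subList(start, i + 1)));
                if (applyFix) {
                    // Proposed fix: record the current index before resetting.
                    tokenStartMap.put(fsm, i);
                }
                fsm.reset();
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "October", "2003", "November", "2010",
                "cTAKES", "is", "the", "best", "framework");
        System.out.println(findDates(tokens, false)); // [October 2003, October 2003 November 2010]
        System.out.println(findDates(tokens, true));  // [October 2003, November 2010]
    }
}
```

In this sketch, an intermediate token such as "and" between the two dates puts the machine back in its start state and writes a record to the map, which is why the unfixed version still handles that case correctly.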

