ctakes-notifications mailing list archives

From "Kim Ebert (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CTAKES-316) "I do not see any" returns different ContextAnnotations in drugner pipeline
Date Tue, 07 Oct 2014 22:11:33 GMT

    [ https://issues.apache.org/jira/browse/CTAKES-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162657#comment-14162657
] 

Kim Ebert commented on CTAKES-316:
----------------------------------

Sean said the following:


If I understand the code correctly (it could use some documentation), it runs the negation engines
and, if any negation exists, creates a single hit signifying negation.  Like a heavyweight Boolean.
  Unfortunately, as you know, because Collection "s" is a Set, it throws in the first
token to come along ...  

An isolated change here would probably be better than going through the entire code base and
switching to LinkedHashMaps, Lists, etc. - plus it would fix your problem.

You could (for reuse by others, assuming that one doesn't already exist) create a singleton
BaseTokenComparator implements Comparator<BaseToken>  with something like:
   public int compare( final BaseToken textSpan1, final BaseToken textSpan2 ) {
      if ( textSpan1.getStartOffset() != textSpan2.getStartOffset() ) {
         return textSpan1.getStartOffset() - textSpan2.getStartOffset();
      }
      return textSpan1.getEndOffset() - textSpan2.getEndOffset();
   }

And in NegationContextAnalyzer, around line 48:
final List<NegationIndicator> negatorsList = new ArrayList<NegationIndicator>( _negIndicatorFSM.execute( fsmTokenList ) );
if ( !negatorsList.isEmpty() ) {
	// Sort by offsets so the same negator is chosen on every run.
	Collections.sort( negatorsList, BaseTokenComparator.getInstance() );
	return new ContextHit( negatorsList.get( 0 ).getStartOffset(), negatorsList.get( 0 ).getEndOffset() );
}
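
For anyone who wants to try the sort-then-take-first idea outside of cTAKES, here is a minimal self-contained sketch. Span, SpanComparator and FirstSpanBySort are hypothetical stand-ins, not existing cTAKES classes; Span only mimics the getStartOffset() / getEndOffset() accessors used above, and the enum is just one way to get the singleton.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a token or negation indicator; only the offsets matter here.
final class Span {
   private final int startOffset;
   private final int endOffset;

   Span( final int startOffset, final int endOffset ) {
      this.startOffset = startOffset;
      this.endOffset = endOffset;
   }

   int getStartOffset() { return startOffset; }

   int getEndOffset() { return endOffset; }
}

// Enum singleton: earliest start first, ties broken by earliest end.
enum SpanComparator implements Comparator<Span> {
   INSTANCE;

   static SpanComparator getInstance() { return INSTANCE; }

   @Override
   public int compare( final Span span1, final Span span2 ) {
      final int byStart = Integer.compare( span1.getStartOffset(), span2.getStartOffset() );
      return byStart != 0 ? byStart : Integer.compare( span1.getEndOffset(), span2.getEndOffset() );
   }
}

final class FirstSpanBySort {
   // Copies the unordered hits into a list, sorts them, and returns the first one (null if empty).
   static Span firstSpan( final Collection<Span> unordered ) {
      final List<Span> sorted = new ArrayList<>( unordered );
      if ( sorted.isEmpty() ) {
         return null;
      }
      Collections.sort( sorted, SpanComparator.getInstance() );
      return sorted.get( 0 );
   }
}
```

Fed the three spans reported in the issue (13-16, 5-16 and 5-8) in any order, firstSpan picks 5-8 every time, regardless of how the Set happens to iterate.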

Or you could write a (faster) method to use in place of the List and Sort like:
// Accepts any subtype of BaseToken so the Set of negation indicators can be passed directly.
BaseToken getFirstTextSpan( final Iterable<? extends BaseToken> tokens ) {
	BaseToken firstToken = null;
	for ( BaseToken token : tokens ) {
		// An earlier start offset always wins.
		if ( firstToken == null || token.getStartOffset() < firstToken.getStartOffset() ) {
			firstToken = token;
			continue;
		}
		// Same start offset: prefer the span that ends first.
		if ( token.getStartOffset() == firstToken.getStartOffset() && token.getEndOffset() < firstToken.getEndOffset() ) {
			firstToken = token;
		}
	}
	return firstToken;
}
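
The same single pass can probably be had from the standard library instead of a hand-written loop. A rough sketch, assuming Java 8 and using hypothetical OffsetSpan / FirstSpanByMin classes in place of the real cTAKES types:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;

// Hypothetical stand-in for BaseToken; only the two offsets are modelled.
final class OffsetSpan {
   private final int startOffset;
   private final int endOffset;

   OffsetSpan( final int startOffset, final int endOffset ) {
      this.startOffset = startOffset;
      this.endOffset = endOffset;
   }

   int getStartOffset() { return startOffset; }

   int getEndOffset() { return endOffset; }
}

final class FirstSpanByMin {
   // Earliest start first, ties broken by earliest end.
   private static final Comparator<OffsetSpan> DOCUMENT_ORDER =
         Comparator.comparingInt( OffsetSpan::getStartOffset )
                   .thenComparingInt( OffsetSpan::getEndOffset );

   // Collections.min walks the collection once and throws on empty input, hence the guard.
   static OffsetSpan firstSpan( final Collection<OffsetSpan> spans ) {
      return spans.isEmpty() ? null : Collections.min( spans, DOCUMENT_ORDER );
   }
}
```

This avoids both the list copy and the sort, and keeps the ordering rule in one comparator that could be reused elsewhere.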
		

Of course, a perfectly reasonable question to pose to the community is something like "Is
the best stored negation context the first or largest or ???"  Perhaps the first negator span
isn't the most wanted for later use - perhaps it is the most-encompassing span so that multiple
words can be reused.  You could throw that out under a new thread title and perhaps the original
authors or current users would speak up as to what might be best.  Personally I have no idea.
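
If the community did prefer the most-encompassing span, the selection rule might look something like the sketch below. This is only an illustration of that alternative: NegatorSpan and WidestSpanPicker are hypothetical names, "most-encompassing" is assumed to mean the longest span, and ties are assumed to go to the earliest start.

```java
import java.util.Collection;

// Hypothetical stand-in for a negation indicator; only the offsets are modelled.
final class NegatorSpan {
   private final int startOffset;
   private final int endOffset;

   NegatorSpan( final int startOffset, final int endOffset ) {
      this.startOffset = startOffset;
      this.endOffset = endOffset;
   }

   int getStartOffset() { return startOffset; }

   int getEndOffset() { return endOffset; }

   int length() { return endOffset - startOffset; }
}

final class WidestSpanPicker {
   // Returns the longest span; on equal length the earliest start wins. Null if there are no spans.
   static NegatorSpan widest( final Collection<NegatorSpan> spans ) {
      NegatorSpan best = null;
      for ( NegatorSpan span : spans ) {
         if ( best == null
              || span.length() > best.length()
              || ( span.length() == best.length() && span.getStartOffset() < best.getStartOffset() ) ) {
            best = span;
         }
      }
      return best;
   }
}
```

With the three spans from the issue (13-16, 5-16, 5-8) this rule would settle on 5-16 rather than 5-8, so the choice of rule does change the stored context.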

Anyway, great catch!

Sean


> "I do not see any" returns different ContextAnnotations in drugner pipeline
> ---------------------------------------------------------------------------
>
>                 Key: CTAKES-316
>                 URL: https://issues.apache.org/jira/browse/CTAKES-316
>             Project: cTAKES
>          Issue Type: Bug
>          Components: ctakes-drug-ner
>    Affects Versions: 3.2.0
>            Reporter: Kim Ebert
>
> "I do not see any"
> Can result in the following ContextAnnotations:
> <org.apache.ctakes.typesystem.type.textsem.ContextAnnotation _indexed="1" _id="130"
_ref_sofa="1" begin="13" end="16" id="0" typeID="0" discoveryTechnique="0" confidence="0.0"
polarity="0" uncertainty="0" conditional="false" generic="false" historyOf="0" FocusText="I"
Scope="RIGHT"/>
> or
> <org.apache.ctakes.typesystem.type.textsem.ContextAnnotation _indexed="1" _id="130"
_ref_sofa="1" begin="5" end="16" id="0" typeID="0" discoveryTechnique="0" confidence="0.0"
polarity="0" uncertainty="0" conditional="false" generic="false" historyOf="0" FocusText="I"
Scope="RIGHT"/>
> or
> <org.apache.ctakes.typesystem.type.textsem.ContextAnnotation _indexed="1" _id="130"
_ref_sofa="1" begin="5" end="8" id="0" typeID="0" discoveryTechnique="0" confidence="0.0"
polarity="0" uncertainty="0" conditional="false" generic="false" historyOf="0" FocusText="I"
Scope="RIGHT"/>
> Well, after doing some digging, it turns out that
> org.apache.ctakes.necontexts.negation.NegationContextAnalyzer is to blame.
> The code looks like the following:
> public ContextHit analyzeContext(List<? extends Annotation> contextTokens, int scopeOrientation)
>         throws AnalysisEngineProcessException {
>     List<TextToken> fsmTokenList = wrapAsFsmTokens(contextTokens);
>     try {
>         Set<NegationIndicator> s = _negIndicatorFSM.execute(fsmTokenList);
>         if (s.size() > 0) {
>             NegationIndicator neg = s.iterator().next();
>             return new ContextHit(neg.getStartOffset(), neg.getEndOffset());
>         } else {
>             return null;
>         }
>     } catch (Exception e) {
>         throw new AnalysisEngineProcessException(e);
>     }
> }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
