ctakes-notifications mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CTAKES-16) use uimaFIT's selectCovered() instead of UIMA's subiterator
Date Fri, 25 Apr 2014 20:08:17 GMT

    [ https://issues.apache.org/jira/browse/CTAKES-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981527#comment-13981527 ]

ASF subversion and git services commented on CTAKES-16:
-------------------------------------------------------

Commit 1590127 from [~tmill] in branch 'ctakes/trunk'
[ https://svn.apache.org/r1590127 ]

CTAKES-16: Fix to use UIMAFit select instead of iterator.

> use uimaFIT's selectCovered() instead of UIMA's subiterator
> -----------------------------------------------------------
>
>                 Key: CTAKES-16
>                 URL: https://issues.apache.org/jira/browse/CTAKES-16
>             Project: cTAKES
>          Issue Type: Improvement
>          Components: ctakes-assertion, ctakes-chunker, ctakes-clinical-pipeline, ctakes-context-tokenizer,
>                      ctakes-core, ctakes-dependency-parser, ctakes-ne-contexts, ctakes-pos-tagger
>            Reporter: Pei Chen
>            Priority: Minor
>
> Could not get consistent results from .subiterator when using uimaFIT with the cTAKES
> GUI (which wires the components together dynamically).
> To get all the BaseTokens for a particular sentence, if we use the .subiterator, the
> types have to be stored in the FSindexes in a certain order; otherwise it could just
> return an empty list.  This would require the users of annotators to understand the
> ordering of types and have it preconfigured.
> FSIterator<Annotation> tokensInSentenceIterator = jcas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
> uimaFIT already created a convenience method that seems to do something similar which
> will always return the expected tokens.  Does anyone know if this was part of the motivation?
> Is the performance hit (if any) worth the ease of use?
> Ex:
> List<BaseToken> tokens = org.uimafit.util.JCasUtil.selectCovered(jCas, BaseToken.class, sentence);
> Another alternative is UIMA's FilteredIterator.
> There are a few places that use subiterator in cTAKES and it's tempting to use uimaFIT's
> JCasUtil.selectCovered() instead... What do others think?
> Background: This issue surfaced when we used the cTAKES GUI (which uses uimaFIT to wire
> the components together instead of the Aggregate XML descriptor).
> --Pei
> On Aug 9, 2012, at 9:18 AM, Chen, Pei wrote:
> To get all the BaseTokens for a particular sentence, if we use the .subiterator,
> the types have to be stored in the FSindexes in a certain order; otherwise it could
> just return an empty list.  This would require the users of annotators to
> understand the ordering of types and have it preconfigured.
> FSIterator<Annotation> tokensInSentenceIterator =
> jcas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
> uimaFIT already created a convenience method that seems to do something similar
> which will always return the expected tokens.  Does anyone know if this was part
> of the motivation?
> Yes, that was exactly the motivation to avoid using subiterators. Our experience
> in uimaFIT was that subiterators never did what you wanted them to do.
> Is the performance hit (if any) worth the ease of use?
> I doubt there's a performance hit. Take a look at the source for
> JCasUtil.selectCovered vs. org.apache.uima.cas.impl.Subiterator. If anything,
> selectCovered is probably doing less.
> But of course you could time it and find out for sure.
> Steve
> The full discussion thread can be found here: http://markmail.org/search/+list:org.apache.incubator.ctakes-dev#query:%20list%3Aorg.apache.incubator.ctakes-dev+page:1+mid:hcp3rudjelddo2dy+state:results
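
For readers skimming this issue, here is a rough plain-Java sketch of the "covered" selection idea discussed above: return every token whose offsets fall inside a sentence, no matter what order things were added to the index. It does not use UIMA or uimaFIT at all. The Span class, the type labels, and this selectCovered method are hypothetical stand-ins made up for illustration, and sorting the result by begin offset is a simplification, not a description of how JCasUtil.selectCovered is actually implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java illustration only; nothing here is part of UIMA or uimaFIT. */
class CoveredSelectionSketch {

    /** Hypothetical stand-in for an annotation: a type label plus [begin, end) offsets. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    /**
     * Returns every span of the requested type whose offsets lie inside the
     * covering span. Only the type and offsets are compared, so the result does
     * not depend on the order in which spans were added to the list.
     */
    static List<Span> selectCovered(List<Span> all, String type, Span covering) {
        List<Span> result = new ArrayList<>();
        for (Span s : all) {
            boolean inside = s.begin >= covering.begin && s.end <= covering.end;
            if (s != covering && s.type.equals(type) && inside) {
                result.add(s);
            }
        }
        // Assumption for this sketch: present results in document order.
        result.sort(Comparator.comparingInt((Span s) -> s.begin));
        return result;
    }

    public static void main(String[] args) {
        Span sentence = new Span("Sentence", 0, 11);
        List<Span> index = new ArrayList<>();
        index.add(new Span("BaseToken", 6, 11));  // deliberately added out of order
        index.add(sentence);
        index.add(new Span("BaseToken", 0, 5));
        index.add(new Span("BaseToken", 12, 16)); // lies outside the sentence

        System.out.println(selectCovered(index, "BaseToken", sentence));
    }
}
```

The point of the sketch is only that a selection defined purely by offsets and type cannot silently come back empty because of index ordering, which is the failure mode reported for .subiterator above. Whether the real selectCovered is faster or slower than a subiterator would still need to be timed, as Steve suggests.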



--
This message was sent by Atlassian JIRA
(v6.2#6252)
