ctakes-dev mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: sentence number in WordToken
Date Mon, 30 Sep 2013 15:59:04 GMT
Hi,

if you make many selectCovering calls, it may be faster to call
indexCovering once and then use the lookup index it produces.

IMHO type systems should not contain information that can easily
be calculated at runtime (e.g. sentence number, token number, etc.).
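
A minimal sketch of both points, in plain Java so it runs without UIMA.
Span, indexSentenceOf and sameSentence are hypothetical names for
illustration, not uimaFIT or cTAKES API. The lookup is built once, and the
sentence position is derived at runtime instead of being stored on the
token. With uimaFIT, I believe the analogous call is
JCasUtil.indexCovering(aJCas, BaseToken.class, Sentence.class), but please
check the signature for your version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a UIMA annotation: just a pair of character offsets.
final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

final class CoveringIndexSketch {

    // Built once per document: maps each token to the 0-based position of the
    // first sentence that covers it. Tokens with no covering sentence are skipped.
    // Kept as a simple nested loop for clarity; this is an assumption of the
    // sketch, not how uimaFIT implements indexCovering.
    static Map<Span, Integer> indexSentenceOf(List<Span> sentences, List<Span> tokens) {
        Map<Span, Integer> index = new HashMap<>();
        for (Span token : tokens) {
            for (int i = 0; i < sentences.size(); i++) {
                Span sentence = sentences.get(i);
                if (sentence.begin <= token.begin && token.end <= sentence.end) {
                    index.put(token, i);
                    break;
                }
            }
        }
        return index;
    }

    // Cheap lookup afterwards: true only if both tokens were indexed
    // under the same sentence position.
    static boolean sameSentence(Map<Span, Integer> index, Span a, Span b) {
        Integer sentenceOfA = index.get(a);
        return sentenceOfA != null && sentenceOfA.equals(index.get(b));
    }

    public static void main(String[] args) {
        List<Span> sentences = new ArrayList<>();
        sentences.add(new Span(0, 10));
        sentences.add(new Span(11, 25));

        Span t1 = new Span(0, 3);
        Span t2 = new Span(4, 10);
        Span t3 = new Span(11, 15);
        List<Span> tokens = new ArrayList<>();
        tokens.add(t1);
        tokens.add(t2);
        tokens.add(t3);

        Map<Span, Integer> index = indexSentenceOf(sentences, tokens);
        System.out.println(sameSentence(index, t1, t2));
        System.out.println(sameSentence(index, t2, t3));
    }
}
```

After the index is built, each "same sentence?" check is a map lookup, so
checking many entities does not need a selectCovering call per entity.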

Mind, I have no say here ;) Just my personal opinion.

-- Richard

On 30.09.2013, at 16:17, samir chabou <samirchb@yahoo.com> wrote:

> Hi Pei,
> 
> I thought this might be of some use …
>  
> I need to know whether two or more word tokens belong to the same sentence, and
> since WordToken does not define a sentence number feature, I added one to the
> TypeSystem. These are the steps:
>  
> 1)      I added the sentence number feature to the type BaseToken in the
> TypeSystem.xml file (I chose the superclass so that the feature is
> inherited by all subclasses: WordToken, SymbolToken, NumToken …)
>  
> 2)      In ctakes-core, in TokenizerAnnotatorPTB.java (method annotateRange), I set
> the new feature
> (BaseToken.sentenceNumber = sentence.getSentenceNumber()) as shown below:
>      
>       bta.setSentenceNumber(sentence.getSentenceNumber());
>       bta.addToIndexes();
>  
> 3)      Run JCasGen from the Type System tab of the
> aggregate
>  
> 4)      Add the feature in the Source
> tab of the aggregate
>  
> Alternatively, I could probably have used:
> List<Sentence> list = JCasUtil.selectCovering(aJcas, Sentence.class,
> entity1.getBegin(), entity1.getEnd());
> The issue with this is that if I have many entities to check at the same
> time, or if entity1 is found in many places, I have to add some if
> conditions to get the sentence number.
> 
> 
> Thanks
> Samir

