ctakes-dev mailing list archives

From samir chabou <samir...@yahoo.com>
Subject sentence number in WordToken
Date Mon, 30 Sep 2013 14:17:23 GMT
Hi Pei,

I thought this might be of some use.
 
I need to know whether two or more word tokens belong to the same sentence,
and WordToken does not define a sentence number feature, so I added one to the
type system. These are the steps:
 
1)      I added a sentenceNumber feature to the BaseToken type in the
TypeSystem.xml file. I chose the superclass so that the feature is inherited by
all subclasses (WordToken, SymbolToken, NumToken …).
 
2)      In ctakes-core, in TokenizerAnnotatorPTB.java (method annotateRange), I
set the new feature (BaseToken.sentenceNumber = sentence.getSentenceNumber())
as shown below:

      bta.setSentenceNumber(sentence.getSentenceNumber());
      bta.addToIndexes();
 
3)      Run JCasGen from the Type System tab of the aggregate descriptor.
 
4)      Add the feature in the Source tab of the aggregate descriptor.
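
Once every token carries its sentence number, the same-sentence check becomes a plain integer comparison. Below is a minimal sketch of that idea in standalone Java. TokenStub and SentenceNumberDemo are hypothetical stand-ins so the example runs without cTAKES on the classpath; only getSentenceNumber() mirrors the accessor that JCasGen would generate for the new feature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a BaseToken carrying the added sentenceNumber feature.
// This is NOT the cTAKES class; it only mirrors the one accessor used here.
final class TokenStub {
    final String text;
    final int sentenceNumber;

    TokenStub(String text, int sentenceNumber) {
        this.text = text;
        this.sentenceNumber = sentenceNumber;
    }

    int getSentenceNumber() {
        return sentenceNumber;
    }
}

final class SentenceNumberDemo {
    // True when every token reports the same sentence number
    // (trivially true for an empty or single-element list).
    static boolean sameSentence(List<TokenStub> tokens) {
        for (int i = 1; i < tokens.size(); i++) {
            if (tokens.get(i).getSentenceNumber() != tokens.get(0).getSentenceNumber()) {
                return false;
            }
        }
        return true;
    }

    // Groups token texts by sentence number, with sentences in ascending order.
    static Map<Integer, List<String>> groupBySentence(List<TokenStub> tokens) {
        Map<Integer, List<String>> bySentence = new TreeMap<>();
        for (TokenStub t : tokens) {
            bySentence.computeIfAbsent(t.getSentenceNumber(), k -> new ArrayList<>()).add(t.text);
        }
        return bySentence;
    }

    public static void main(String[] args) {
        List<TokenStub> tokens = Arrays.asList(
                new TokenStub("chest", 0),
                new TokenStub("pain", 0),
                new TokenStub("denies", 1));
        System.out.println(sameSentence(tokens.subList(0, 2)));
        System.out.println(sameSentence(tokens));
        System.out.println(groupBySentence(tokens));
    }
}
```

With real cTAKES types the comparison would presumably read the same way, using the generated getSentenceNumber() on BaseToken instead of the stub.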
 
I could probably have used this as an alternative:

List<Sentence> list = JCasUtil.selectCovering(aJcas, Sentence.class,
entity1.getBegin(), entity1.getEnd());

The issue with this is that if I have many entities to check at the same time,
or if entity1 is found in many places, I have to add some if conditions to get
the sentence number.
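
For comparison, the offset-based alternative can be kept manageable for many entities by collecting the sentence spans once and resolving each entity span against them. The sketch below is standalone Java under stated assumptions: CoveringSentenceLookup is a hypothetical helper, not part of cTAKES or uimaFIT, and it assumes the sentence spans are sorted by begin offset and do not overlap.

```java
// Hypothetical helper: maps a [begin, end] span to the index of the sentence that
// covers it. Assumes sentence spans are sorted by begin offset and non-overlapping.
final class CoveringSentenceLookup {
    private final int[] begins;
    private final int[] ends;

    CoveringSentenceLookup(int[] begins, int[] ends) {
        if (begins.length != ends.length) {
            throw new IllegalArgumentException("begins and ends must have the same length");
        }
        this.begins = begins.clone();
        this.ends = ends.clone();
    }

    // Returns the index of the sentence covering [begin, end], or -1 if none does.
    int indexOfCovering(int begin, int end) {
        int lo = 0;
        int hi = begins.length - 1;
        int candidate = -1;
        // Binary search for the last sentence that starts at or before 'begin'.
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (begins[mid] <= begin) {
                candidate = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return (candidate >= 0 && ends[candidate] >= end) ? candidate : -1;
    }

    // True when every span is covered by one and the same sentence.
    // Returns false if any span has no covering sentence.
    boolean sameSentence(int[][] spans) {
        int first = -1;
        for (int[] span : spans) {
            int idx = indexOfCovering(span[0], span[1]);
            if (idx < 0) {
                return false;
            }
            if (first < 0) {
                first = idx;
            } else if (idx != first) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two sentences covering offsets 0..20 and 21..45.
        CoveringSentenceLookup lookup =
                new CoveringSentenceLookup(new int[] {0, 21}, new int[] {20, 45});
        System.out.println(lookup.sameSentence(new int[][] {{0, 5}, {10, 15}}));
        System.out.println(lookup.sameSentence(new int[][] {{10, 15}, {25, 30}}));
    }
}
```

This avoids per-entity if conditions, but it still repeats offset lookups that a stored sentence number makes unnecessary.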


Thanks
Samir
