From Tomasz Oliwa <ol...@uchicago.edu>
Subject RE: TermConsumers
Date Thu, 19 Nov 2015 23:48:28 GMT

I tested this and the Annotator itself works, great. The only change I had to make when writing
the Annotator class from the code below was to provide explicit generics in:

static private final Collection<Class<? extends IdentifiedAnnotation>> EVENT_CLASSES
= Arrays.<Class<? extends IdentifiedAnnotation>>asList(
            MedicationMention.class, DiseaseDisorderMention.class,
            SignSymptomMention.class, LabMention.class, ProcedureMention.class );
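
For anyone hitting the same compile error, here is a minimal self-contained sketch of that explicit type argument. It uses JDK Number subclasses in place of the cTAKES mention types, and the class and field names are made up for illustration. Whether the explicit argument is needed seems to depend on the compiler and language level, so treat this as one way to make the element type unambiguous rather than a required idiom.

```java
import java.util.Arrays;
import java.util.Collection;

public class TypeWitnessExample {

    // The explicit type argument before asList fixes the list's element type to
    // Class<? extends Number>, instead of leaving the compiler to infer it from
    // the arguments (Class<Integer>, Class<Double>, Class<Long>).
    public static final Collection<Class<? extends Number>> NUMBER_CLASSES
            = Arrays.<Class<? extends Number>>asList(
                    Integer.class, Double.class, Long.class );

    public static void main( String[] args ) {
        for ( Class<? extends Number> numberClass : NUMBER_CLASSES ) {
            System.out.println( numberClass.getSimpleName() );
        }
    }
}
```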

At least on a small example XMI CAS I see the behavior is as expected for the IdentifiedAnnotations.

However, for my use case, I have XCAS files, not XMI CAS files. I can use XCasWriterCasConsumer
to write the CAS files, but I cannot find any XCAS Collection Reader to initially read them.

Is such a reader available?


From: Finan, Sean [Sean.Finan@childrens.harvard.edu]
Sent: Thursday, November 19, 2015 4:03 PM
To: dev@ctakes.apache.org
Subject: RE: TermConsumers

Hi Tomasz,

I don't know that anybody has done this.  However, you could try running a pipeline with items
in ctakes-core:
XmiCollectionReaderCtakes       to read your existing cas xmi files in directory
-- custom refiner AE below --   to remove unwanted umls annotations
XmiWriterCasConsumerCtakes      to write the new cas xmi files

The refiner AE would basically do what the PrecisionTermConsumer of the fast lookup does,
but over a pre-populated cas.  This is mostly cut and paste from other code with a little
bit of look-compiling - I haven't tested it at all!  If you do give it a run-through and it
works then let me know and I'll clean it up and check it into sandbox.

static private final Collection<Class<? extends IdentifiedAnnotation>> EVENT_CLASSES
= Arrays.asList(
         MedicationMention.class, DiseaseDisorderMention.class,
         SignSymptomMention.class, LabMention.class, ProcedureMention.class );
   // Don't forget AnatomicalSiteMention.class and generic EntityMention.class!

static private final Function<Annotation,TextSpan> createTextSpan
         = annotation -> new DefaultTextSpan( annotation.getBegin(), annotation.getEnd() );

static private final Function<IdentifiedAnnotation,IdentifiedAnnotation> returnSelf
= annotation -> annotation;

   public void process( final JCas jcas ) throws AnalysisEngineProcessException {
      LOGGER.info( "Starting processing" );
      for ( Class<? extends IdentifiedAnnotation> eventClass : EVENT_CLASSES ) {
         refineForClass( jcas, eventClass );
      }
      final Collection<AnatomicalSiteMention> anatomicals = JCasUtil.select( jcas, AnatomicalSiteMention.class );
      final Collection<EntityMention> entityMentions = new ArrayList<>( JCasUtil.select( jcas, EntityMention.class ) );
      entityMentions.removeAll( anatomicals );
      refineForAnnotations( jcas, anatomicals );
      refineForAnnotations( jcas, entityMentions );
      LOGGER.info( "Finished processing" );
   }

   static private <T extends IdentifiedAnnotation> void refineForClass( final JCas jcas,
                                                                        final Class<T> eventClass ) {
      refineForAnnotations( jcas, JCasUtil.select( jcas, eventClass ) );
   }

   static private <T extends IdentifiedAnnotation> void refineForAnnotations( final JCas jcas,
                                                                              final Collection<T> annotations ) {
      final Map<TextSpan,IdentifiedAnnotation> annotationTextSpans
            = annotations.stream().collect( Collectors.toMap( createTextSpan, returnSelf ) );
      final Collection<TextSpan> unwantedSpans = getUnwantedSpans( annotationTextSpans.keySet() );
      unwantedSpans.stream().map( annotationTextSpans::get ).forEach( t -> t.removeFromIndexes( jcas ) );
   }

   static private Collection<TextSpan> getUnwantedSpans( final Collection<TextSpan> originalTextSpans ) {
      final List<TextSpan> textSpans = new ArrayList<>( originalTextSpans );
      final Collection<TextSpan> discardSpans = new HashSet<>();
      final int count = textSpans.size();
      for ( int i = 0; i < count; i++ ) {
         final TextSpan spanKeyI = textSpans.get( i );
         for ( int j = i + 1; j < count; j++ ) {
            final TextSpan spanKeyJ = textSpans.get( j );
            if ( (spanKeyJ.getBegin() <= spanKeyI.getBegin() && spanKeyJ.getEnd() > spanKeyI.getEnd())
                 || (spanKeyJ.getBegin() < spanKeyI.getBegin() && spanKeyJ.getEnd() >= spanKeyI.getEnd()) ) {
               // J contains I, discard less precise concepts for span I and move on to next span I
               discardSpans.add( spanKeyI );
            }
            if ( ((spanKeyI.getBegin() <= spanKeyJ.getBegin() && spanKeyI.getEnd() > spanKeyJ.getEnd())
                  || (spanKeyI.getBegin() < spanKeyJ.getBegin() && spanKeyI.getEnd() >= spanKeyJ.getEnd())) ) {
               // I contains J, discard less precise concepts for span J and move on to next span J
               discardSpans.add( spanKeyJ );
            }
         }
      }
      return discardSpans;
   }
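
The containment test in getUnwantedSpans can be tried in isolation, without a CAS. The following is only a sketch under stated assumptions, not the cTAKES implementation: Span is a hypothetical stand-in for TextSpan, and the class name, the contains helper and the example offsets are all invented for illustration. It records both directions of containment for every pair and keeps scanning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

public class SpanContainmentSketch {

    // Hypothetical stand-in for TextSpan: only the begin and end offsets matter here.
    public static final class Span {
        final int begin;
        final int end;

        public Span( int begin, int end ) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public boolean equals( Object other ) {
            return other instanceof Span
                   && ((Span) other).begin == begin && ((Span) other).end == end;
        }

        @Override
        public int hashCode() {
            return 31 * begin + end;
        }

        @Override
        public String toString() {
            return begin + "-" + end;
        }
    }

    // True if outer covers inner and extends past it on at least one side.
    // Two identical spans do not contain each other.
    public static boolean contains( Span outer, Span inner ) {
        return (outer.begin <= inner.begin && outer.end > inner.end)
               || (outer.begin < inner.begin && outer.end >= inner.end);
    }

    // Collects every span that is contained by some other span in the input.
    public static Collection<Span> getUnwantedSpans( Collection<Span> originalSpans ) {
        final List<Span> spans = new ArrayList<>( originalSpans );
        final Collection<Span> discardSpans = new HashSet<>();
        for ( int i = 0; i < spans.size(); i++ ) {
            for ( int j = i + 1; j < spans.size(); j++ ) {
                if ( contains( spans.get( j ), spans.get( i ) ) ) {
                    discardSpans.add( spans.get( i ) );
                }
                if ( contains( spans.get( i ), spans.get( j ) ) ) {
                    discardSpans.add( spans.get( j ) );
                }
            }
        }
        return discardSpans;
    }

    public static void main( String[] args ) {
        // Made-up offsets: 0-10 could be "chest pain", 6-10 "pain", 15-20 "fever".
        final List<Span> spans = Arrays.asList(
                new Span( 0, 10 ), new Span( 6, 10 ), new Span( 15, 20 ) );
        System.out.println( getUnwantedSpans( spans ) ); // prints [6-10]
    }
}
```

Here the shorter span 6-10 is discarded because 0-10 covers it, while 15-20 overlaps nothing and is kept.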

Good luck,

-----Original Message-----
From: Tomasz Oliwa [mailto:oliwa@uchicago.edu]
Sent: Thursday, November 19, 2015 12:08 PM
To: dev@ctakes.apache.org
Subject: TermConsumers


How can I run a different TermConsumer on already generated CAS files?

I have CAS files created by the AggregatePlaintextFastUMLSProcessor with the DefaultTermConsumer
set in cTakesHsql.xml.

Now I would like to apply the PrecisionTermConsumer on these CAS files without having to do
the whole annotation process again. The IdentifiedAnnotations are all there; it is only a
matter of removing them according to the TermConsumer's logic.

Is there a way to create a passthrough Processor that simply reads the CAS, applies a different
TermConsumer and writes it to disk?

Or is there a different way to go about this?

Thanks for any help,

