uima-user mailing list archives

From Petr Baudis <pa...@ucw.cz>
Subject Deduplicating Annotations With Same coveredText
Date Tue, 22 Apr 2014 02:20:17 GMT
  Hi!

  I'm facing the task of deduplicating annotations that have the same
getCoveredText() value (possibly at different sofa locations); I'd
like to keep just one annotation for each distinct text. For example,
I might want to build a bag-of-words with a single annotation per word
and the number of occurrences as a feature.  (Or, in my case, the
annotations are scored candidate answers in a QA system that I'd like
to merge if they are textually the same.)

  Is there a better way than simply loading all annotations of the type
into a Java map, mass-removing them from the indexes, and then re-adding
some of them?
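
  For reference, the map-based approach could be sketched roughly as
below. This is plain Java, not UIMA code: the Word class is a
hypothetical stand-in for an annotation (its text field plays the role of
getCoveredText()), and removing from and re-adding to the CAS indexes is
left out. Keeping the first occurrence and counting the rest is just one
possible merge policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MapDedup {
    /** Hypothetical stand-in for an annotation; not a UIMA type. */
    static final class Word {
        final String text;  // plays the role of getCoveredText()
        final int begin;    // offset of the occurrence that is kept
        int count = 1;      // number of occurrences merged into this one

        Word(String text, int begin) {
            this.text = text;
            this.begin = begin;
        }
    }

    /** Keeps the first Word seen for each text and counts later duplicates. */
    static List<Word> dedup(List<Word> input) {
        // LinkedHashMap preserves first-seen order of the distinct texts.
        Map<String, Word> byText = new LinkedHashMap<>();
        for (Word w : input) {
            Word kept = byText.get(w.text);
            if (kept == null) {
                byText.put(w.text, w);  // first occurrence survives
            } else {
                kept.count++;           // duplicate: only bump the counter
            }
        }
        return new ArrayList<>(byText.values());
    }
}
```

  In a real annotator, the surviving objects would be the ones re-added
to the indexes, with count written to a feature.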

  My idea was to simply index them by coveredText; then, during
sequential iteration, it would be enough to compare getCoveredText() of
the current and previous annotation to decide whether to merge them.
However, it appears that coveredText is not supported as a key feature,
so I'd have to store an explicit copy of it in a separate feature. Is
there any other option?
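
  To illustrate the sorted-iteration idea outside of UIMA, here is a
minimal sketch in plain Java. The Answer class is hypothetical (its text
field stands in for the copied coveredText feature), an in-memory sort
stands in for a sorted index, and keeping the higher score on a merge is
only an assumed policy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortedMerge {
    /** Hypothetical stand-in for a scored candidate answer; not a UIMA type. */
    static final class Answer {
        final String text;   // stands in for the copied coveredText feature
        final double score;

        Answer(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    /** Sorts by text, then merges neighbours that have equal text. */
    static List<Answer> mergeAdjacent(List<Answer> input) {
        List<Answer> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((Answer a) -> a.text));

        List<Answer> merged = new ArrayList<>();
        Answer prev = null;
        for (Answer cur : sorted) {
            if (prev != null && prev.text.equals(cur.text)) {
                // Same text as the previous one: merge (assumed policy: max score).
                prev = new Answer(prev.text, Math.max(prev.score, cur.score));
            } else {
                if (prev != null) {
                    merged.add(prev);  // previous group is complete
                }
                prev = cur;
            }
        }
        if (prev != null) {
            merged.add(prev);  // flush the last group
        }
        return merged;
    }
}
```

  Only the previous element needs to be remembered during the pass,
which is what makes an index sorted by the text attractive here.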

  Thanks,

				Petr "Pasky" Baudis
