Hi Ben,
each annotator can implement collectionProcessComplete().
Quoting from the documentation:
"The framework calls the collectionProcessComplete() method at the end of the collection (i.e.,
when all objects in the collection have been processed). At this point in time, no CAS is
passed in as a parameter. This gives the CAS Consumer or Analysis Engine an opportunity to
perform collection processing over the entire set of objects in the collection."
In our implementation of tf.idf, we have an annotator that collects the tf score for each
document in process() and computes the idf part in collectionProcessComplete().
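Roughly, the pattern looks like the sketch below. This is plain Java without the UIMA dependency, not our actual code: TfIdfCollector, its method signatures, and the ln(N/df) weighting are assumptions made for illustration. In a real annotator you would typically extend JCasAnnotator_ImplBase, read the tokens from the JCas in process(JCas), and override collectionProcessComplete().

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an annotator; not part of the UIMA API.
class TfIdfCollector {
    // docId -> (term -> raw count), filled while documents are processed
    private final Map<String, Map<String, Integer>> tf = new HashMap<>();
    // term -> number of documents that contain the term
    private final Map<String, Integer> df = new HashMap<>();

    // Called once per document, analogous to process(JCas).
    // Assumes every docId is seen exactly once.
    void process(String docId, List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        tf.put(docId, counts);
        for (String t : counts.keySet()) {
            df.merge(t, 1, Integer::sum);
        }
    }

    // Called once after the last document, analogous to
    // collectionProcessComplete(). Returns docId -> (term -> tf * idf),
    // with idf = ln(N / df), which is one common choice among several.
    Map<String, Map<String, Double>> collectionProcessComplete() {
        int n = tf.size();
        Map<String, Map<String, Double>> result = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> doc : tf.entrySet()) {
            Map<String, Double> scores = new HashMap<>();
            for (Map.Entry<String, Integer> e : doc.getValue().entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                scores.put(e.getKey(), e.getValue() * idf);
            }
            result.put(doc.getKey(), scores);
        }
        return result;
    }
}
```

The point is only that the per-document state is kept in fields of the annotator instance, so it is still available when the framework makes the final call.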
-Torsten
On 10.10.18, 17:21, "Benedict Holland" <benedict.m.holland@gmail.com> wrote:
Hello all,
I continue to have a problem that comes up a lot. I have a collection
processing engine. I want something to run after all of the processing is
done. For example, I have a collection of texts and want to run a tf-idf. I
generate a tf for each document and at the end, I generate an idf over the
collection. I can't put that in an annotator as part of my
processing pipeline.
Is there an aggregate annotator that will run after the entire collection
is processed?
Thanks,
~Ben