Is there a reason that the configuration parameters that are being read
by the last annotator can't be an annotation on the given document? For
example, one can imagine a pipeline where there is an Analysis Engine
that checks the language of the document and then a separate
morphological tokenizer creates morpheme annotations using this
information. The natural way to do this would be to set a new
DocumentAnnotation on the document with the appropriate language and
have the tokenizer AE read this. Does the analysis engine you are
dealing with wrap something that requires an actual configuration file,
or can it take, for example, a string argument instead? If it can, it
might even be best to create the string for the second annotator on the
fly and send it directly rather than writing it to disk somewhere. I
think that if you have access to the code it would be better to treat
everything that changes from document to document as belonging on the
CAS and put all the configuration parameters in the AE descriptor.
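
To make the split concrete, here is a rough sketch in plain Java. It
deliberately does not use the UIMA API: Doc, Step, LanguageChecker,
MorphTokenizer and the suffix table are all made-up names standing in
for the CAS, the analysis engines and the descriptor parameters, and
the language check is a toy heuristic. In real UIMA code I would expect
the per-document part to go through the CAS (for instance
setDocumentLanguage/getDocumentLanguage on the JCas) and the fixed part
to be read once from the descriptor during initialization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in for the idea; none of these types are UIMA classes. */
final class PipelineSketch {

    /** Hypothetical stand-in for a CAS: the text plus per-document state. */
    static final class Doc {
        final String text;
        String language = "x-unspecified";           // filled in by an upstream step
        final List<String> morphemes = new ArrayList<>();

        Doc(String text) { this.text = text; }
    }

    /** Hypothetical stand-in for an analysis engine. */
    interface Step { void process(Doc doc); }

    /** Upstream step: records its finding on the document, not on disk. */
    static final class LanguageChecker implements Step {
        public void process(Doc doc) {
            // Toy heuristic, for illustration only.
            doc.language = doc.text.contains(" und ") ? "de" : "en";
        }
    }

    /**
     * Downstream step: fixed settings arrive once through the constructor
     * (as a descriptor would supply them); anything that varies per
     * document is read from the document itself.
     */
    static final class MorphTokenizer implements Step {
        private final Map<String, String> suffixByLanguage;

        MorphTokenizer(Map<String, String> suffixByLanguage) {
            this.suffixByLanguage = suffixByLanguage;
        }

        public void process(Doc doc) {
            String suffix = suffixByLanguage.get(doc.language);
            for (String token : doc.text.split("\\s+")) {
                boolean splittable = suffix != null
                        && token.length() > suffix.length()
                        && token.endsWith(suffix);
                if (splittable) {
                    // Split the token into stem and suffix morphemes.
                    doc.morphemes.add(token.substring(0, token.length() - suffix.length()));
                    doc.morphemes.add("-" + suffix);
                } else {
                    doc.morphemes.add(token);
                }
            }
        }
    }

    /** Runs both steps in order over one document. */
    static Doc run(String text) {
        Map<String, String> config = new HashMap<>();  // static configuration
        config.put("en", "ing");
        config.put("de", "en");
        List<Step> steps = Arrays.asList(new LanguageChecker(), new MorphTokenizer(config));
        Doc doc = new Doc(text);
        for (Step step : steps) {
            step.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Doc doc = run("walking home");
        System.out.println(doc.language + " " + doc.morphemes);
    }
}
```

The point of the sketch is only that nothing is written back to a
shared file: the tokenizer never learns how the language was
determined, it just reads what the earlier step left on the document.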
-----Original Message-----
From: Andrew Shirk [mailto:shirk@ncsa.uiuc.edu]
Sent: Wednesday, August 29, 2007 12:06 PM
To: uima-user@incubator.apache.org
Subject: read/write resource sharing
What is the best practice for sharing read/write resources amongst
analysis engines in an aggregate? For example, say you have an
annotator early in a flow that reads a configuration file off disk in
order to determine its behavior. Then, the next annotator does
something, and needs to write changes to the configuration file so
that another annotator downstream, whose behavior is also determined
by the contents of the configuration file, can read in the resource
that contains the changes.
Does this make sense?
Any help or ideas would be appreciated. I can think of some ugly
hacks, but it would be nice to know if I'm missing some portion of
the API that supports this type of scenario.
Thanks, Andrew