uima-user mailing list archives

From Andrew Shirk <sh...@ncsa.uiuc.edu>
Subject RE: read/write resource sharing
Date Wed, 29 Aug 2007 18:27:31 GMT
Hi Frank,

At 11:29 AM 8/29/2007, you wrote:
>Is there a reason that the configuration parameters that are being read
>by the last annotator can't be an annotation on the given document?

They have nothing to do with the document per se.

>For example, one can imagine a pipeline where there is an Analysis Engine
>that checks the language of the document and then a separate
>morphological tokenizer creates morpheme annotations using this
>information.  The natural way to do this now would be to set a new
>DocumentAnnotation on the document with the appropriate language and
>have the tokenizer AE read this.

Yes, in that case, the CAS would be used pretty much as it was
intended. I'm stumbling on the conceptual mismatch between my
configuration variables and an "annotation."

>  Does the analysis engine you are
>dealing with wrap something that requires an actual configuration file
>or can it take for example a string argument instead?

Yes, I'm creating an annotator that wraps a legacy process flow 
execution system. The execution system ingests a process flow 
description (a graph of work nodes) in XML, and then executes it. 
Right now, I have the path to the flow description file specified in 
an analysis engine parameter, which has the obvious downside of 
requiring a user of the annotator to edit the engine descriptor 
whenever they want to change the process flow that will be executed. 
I thought that using a DataResource would allow me to store the path 
in an external file, which could be easily edited by hand by a user, 
and then read in by the DataResource implementation. With UIMA's
support for resource sharing, I thought it would be straightforward to
write to the file, or to UIMA's cached in-memory copy of it,
for downstream annotators to use. With this approach, I could reuse
my process flow wrapper annotator multiple times within an aggregate
without needing to edit the descriptor.
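
A minimal sketch of the shared-path idea described above, in plain Java
with no UIMA dependency. The class name FlowPathResource and the property
key flow.description.path are hypothetical, chosen only for illustration.
In a real UIMA setup the load step would presumably live in the
load(DataResource) method of a SharedResourceObject implementation, reading
from the DataResource's input stream. Whether an in-memory update like
setFlowPath is a supported use of shared resources is exactly the open
question in this thread, so treat the setter as an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for the flow description path; not a UIMA class.
final class FlowPathResource {
    // Hypothetical property key used in the hand-edited external file.
    static final String KEY = "flow.description.path";

    // AtomicReference makes an update visible to other threads.
    private final AtomicReference<String> flowPath = new AtomicReference<>();

    // Reads the path from properties-format text (the external file's contents).
    void load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        String path = props.getProperty(KEY);
        if (path == null || path.trim().isEmpty()) {
            throw new IOException("Missing property: " + KEY);
        }
        flowPath.set(path.trim());
    }

    // Called by the wrapper annotator to find the flow description to execute.
    String getFlowPath() {
        return flowPath.get();
    }

    // Called by an upstream component to change the flow for downstream users.
    // Only the in-memory copy changes; the external file is not rewritten.
    void setFlowPath(String newPath) {
        if (newPath == null || newPath.trim().isEmpty()) {
            throw new IllegalArgumentException("Flow path must not be empty");
        }
        flowPath.set(newPath.trim());
    }
}
```

With an external file containing the single line
flow.description.path = /tmp/flows/first.xml (an example path), every
annotator holding the same FlowPathResource instance would see that value
from getFlowPath() until some component calls setFlowPath.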

I was trying to avoid describing all the details, but this should 
help you better understand my scenario.

>In this case it
>might even be best to create the string for the second annotator on the
>fly and send it directly rather than writing to the disk somewhere.  I
>think that if you have access to the code it would be better to treat
>everything that changes from document to document as belonging on the
>CAS and put all the configuration parameters in the AE descriptor.

Yes, that may be the best approach given the current state of UIMA.

If you have any further thoughts now that I've elaborated on my 
problem, I'd love to hear them.

Thanks, Andrew 
