uima-user mailing list archives

From "LeHouillier, Frank D." <Frank.LeHouill...@gd-ais.com>
Subject RE: read/write resource sharing
Date Wed, 29 Aug 2007 17:26:08 GMT
Why is this a kludge?  You want the parameters associated with each CAS,
right?  Would this still be a kludge in a networked/distributed setup,
where one analysis engine might have worked through lots of documents
while the next is still working on the first, so that a global variable
would be out of date?  I'm actually not sure about the implementation
details of Apache UIMA at the moment with Jvinci etc., but this is a
possibility according to the standard, right?
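
To make the staleness concern concrete, here is a minimal sketch in plain Java. It does not use the UIMA API: Doc is a hypothetical stand-in for a CAS, and StaleGlobalDemo, globalMode, and the stage methods are names invented for illustration. It assumes the first stage has finished every document before the second stage starts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a CAS: the text plus a parameter that travels with it.
final class Doc {
    final String text;
    final String mode; // per-document parameter set by the first stage

    Doc(String text, String mode) {
        this.text = text;
        this.mode = mode;
    }
}

final class StaleGlobalDemo {
    // Shared "global" slot, overwritten by stage one for every document.
    static String globalMode;

    // Stage one picks a mode per document and records it both ways.
    static List<Doc> stageOne(List<String> texts) {
        List<Doc> out = new ArrayList<>();
        for (String t : texts) {
            String mode = t.length() > 5 ? "long" : "short";
            globalMode = mode;          // global: last writer wins
            out.add(new Doc(t, mode));  // per-document: travels with the data
        }
        return out;
    }

    // Stage two reading the global: sees only the last value written.
    static List<String> stageTwoGlobal(List<Doc> docs) {
        List<String> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(d.text + ":" + globalMode);
        }
        return out;
    }

    // Stage two reading the value stored with each document.
    static List<String> stageTwoPerDoc(List<Doc> docs) {
        List<String> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(d.text + ":" + d.mode);
        }
        return out;
    }
}
```

If stage one runs ahead over several documents, stageTwoGlobal labels every document with the mode of the last one processed, while stageTwoPerDoc still sees the value that was decided for each document.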

On the other hand, Andrew might have a point about needing some global
variable space.  Is there some way that the CASes coming through are
dependent on each other?  For example, we could imagine a pipeline that
has to deal with lots of duplicates and wants to somehow store the
annotations so that it doesn't have to repeat the work.  In this case
some sort of global memory would really be necessary.  Or is there a
smarter way to do this?
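
For the duplicates case, one possible shape for such a global memory is sketched below in plain Java. AnnotationCache is a hypothetical helper, not part of the UIMA API, and the annotations are simplified to a list of strings. It assumes all annotators run in the same JVM, so it would not by itself cover a distributed deployment.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: remembers analysis results keyed by a digest of the document text.
final class AnnotationCache {
    private final Map<String, List<String>> byDigest = new ConcurrentHashMap<>();

    // Returns the cached result for this text, running the analyzer only on a miss.
    List<String> getOrCompute(String documentText,
                              Function<String, List<String>> analyzer) {
        return byDigest.computeIfAbsent(
                digest(documentText), key -> analyzer.apply(documentText));
    }

    // Number of distinct documents seen so far.
    int size() {
        return byDigest.size();
    }

    // Hex-encoded SHA-256 of the text, so identical documents map to the same key.
    private static String digest(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

A single instance would have to be handed to every annotator that needs it. ConcurrentHashMap.computeIfAbsent runs the analyzer at most once per key, at the cost of blocking other writers to the same bin while it runs.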

-----Original Message-----
From: Andrew Shirk [mailto:shirk@ncsa.uiuc.edu] 
Sent: Wednesday, August 29, 2007 12:39 PM
To: uima-user@incubator.apache.org
Subject: Re: read/write resource sharing

Hi Michael,

Yes, that's the approach I started with, but the DataResource javadoc 
indicates that if you directly access the resource, the benefits of 
the ResourceManager  (caching and sharing) are lost. Furthermore, if 
in my SharedResourceObject implementation I make modifications to the 
resource, then it will be out of sync with the ResourceManager's 
cache. The next annotator very well may get the stale version of the

Thilo, I'm afraid that's the approach I may end up having to use, but 
it's really a kludge.

Is there no global variable space, outside of the CAS, for the entire 
aggregate? If there were, that would be the best solution I think...

Thanks for the suggestions.


At 11:27 AM 8/29/2007, you wrote:
>Another possibility is external resources. When defining external 
>resources, one or more annotators can share the same resource.
>The UIMA framework takes care of the resource's life cycle.
>You will find some documentation about external resources in the 
>UIMA reference guide at External Resource Dependencies.
>You can also check the UIMA examples - tutorial ex6 uses external 
>resources. (apache-uima/examples/descriptors/tutorial/ex6)
>-- Michael
>Thilo Goetz wrote:
>>If this happens often, one idea might be just to
>>stick the information in the CAS.  That way you
>>can even run several instances of this pipeline
>>and it will still work ;-)  Of course you're not
>>persisting the info that way, not sure if this is
>>a requirement or not.
>>Andrew Shirk wrote:
>>>What is the best practice for sharing read/write resources amongst
>>>analysis engines in an aggregate? For example, say you have an annotator
>>>early in a flow that reads a configuration file off disk in order to
>>>determine its behavior. Then, the next annotator does something, and
>>>needs to write changes to the configuration file so that another
>>>annotator downstream, whose behavior is also determined by the
>>>of the configuration file, can read in the resource that contains the
>>>Does this make sense?
>>>Any help or ideas would be appreciated. I can think of some ugly
>>>but it would be nice to know if I'm missing some portion of the API that
>>>supports this type of scenario.
>>>Thanks, Andrew
