uima-user mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine
Date Mon, 22 Jun 2009 18:04:24 GMT
Eddie Epstein wrote:
>> I am not sure if I should run async or not. Right now
>> the analysis is running on one quad core server.
>> Now I would like to set up UIMA AS in a way that
>> it uses all the CPU time of all cores for fetching/writing
>> documents to and from HBase and for analysis.
>> The interaction with HBase leaves each thread idle
>> for short periods of time, which is why I probably need about
>> 10 threads for fetching and 10 threads for writing
>> to pump enough documents through the machine
>> to keep it busy.
>>
>> Having the AAE async would have the advantage for me
>> that I only need 10 instances of the fetching CM and 10
>> instances of the writing delegate AE and not 20 instances
>> of the whole AAE. The same is true for analysis: there
>> I can scale just the AEs that are slow.
>> To scale the CM, though, I have to use the suggested
>> workaround.
>>
>> So all in all I think having it async would be an advantage,
>> but for now it would be fine not to have it async because
>> that seems easier.
>>     
>>> Assuming that your
>>> AE runs correctly as a single threaded aggregate, creating multiple
>>> instances of this seems fine. The correction to your previous deployment
>>> descriptor would just be:
>>>
>>>          <analysisEngine key="TextAnalysis" async="false">
>>>              <scaleout numberOfInstances="8" />
>>>          </analysisEngine>
>>>
>>> From the UIMA AS point of view, this component is not a CasMultiplier
>>> because [I assume] it consumes new CASes internally and does not
>>> return them.
>>>
>>> Let me emphasize that before AS scaleout the aggregate should be tested
>>> as a simple UIMA aggregate with the normal tools like CVD, runAE,
>>> or a custom driver.
>>>
>>>       
>> I tested the correction but got the first exception again.
>> Here is now the full stack trace and not only the cause:
>>
>>     
>
> Does this error happen right away, or randomly after some period of
> processing? Can you confirm that if you run this configuration with
> scaleout=1 there is no problem?
>   

Yes, with numberOfInstances=1 it works.

Here is the configuration again:
<analysisEngine async="false">
    <scaleout numberOfInstances="1" />
</analysisEngine>

I have now changed numberOfInstances to 2.
The first CAS goes through without an error, the
second CAS throws the exception, the third goes through
without an error, and the fourth CAS throws the exception again; then I stopped
debugging. I used today's 2.3.0-SNAPSHOT for the test.
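
For reference, the changed configuration presumably looks like the sketch
below. It assumes that only the numberOfInstances value differs from the
working configuration above; the fragment is abbreviated in the same way
and is not a complete deployment descriptor.

```xml
<!-- Sketch: same delegate configuration as above, scaled out to two instances -->
<analysisEngine async="false">
    <!-- assumption: this value is the only difference from the working setup -->
    <scaleout numberOfInstances="2" />
</analysisEngine>
```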

To me it looks as if only one of the two AAE instances works properly.

Jörn


