uima-user mailing list archives

From Eddie Epstein <eaepst...@gmail.com>
Subject Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine
Date Mon, 22 Jun 2009 16:18:49 GMT
> I am not sure if I should run async or not. Right now
> the analysis is running on one quad core server.
> Now I would like to setup UIMA AS in a way that
> it uses all the CPU time of all cores for fetching/writing
> documents to and from HBase and for analysis.
> The interaction with HBase makes the threads idle
> for short periods of time, that's why I need maybe like
> 10 threads for fetching and 10 threads for writing
> to pump enough documents through the machine
> to keep it busy.
> Having the AAE async would have the advantage for me
> that I only need 10 instances of the fetching CM and 10
> instance of the writing delegate AE and not 20 instances
> of the whole AAE. The same is true for analysis: there
> I can just scale the AEs which are slow.
> Though for scaling the CM I have to use the suggested
> workaround.
> So all in all I think having it async would be an advantage,
> but for now it would just be fine to not have it async because
> that seems easier.
>> Assuming that your
>> AE runs correctly as a single threaded aggregate, creating multiple
>> instances of this seems fine. The correction to your previous deployment
>> descriptor would just be:
>>          <analysisEngine key="TextAnalysis" async="false">
>>              <scaleout numberOfInstances="8" />
>>          </analysisEngine>
>> From UIMA AS point of view, this component is not a CasMultiplier
>> because [I assume] it consumes new CASes internally and does not
>> return them.
>> Let me emphasize that before AS scaleout the aggregate should be tested
>> as a simple UIMA aggregate with the normal tools like CVD, runAE,
>> or a custom driver.
> I tested the correction but got the first exception again.
> Here is now the full stack trace and not only the cause:

Does this error happen right away, or randomly after some period of
processing? Can you confirm that the problem does not occur when you run
this configuration without scaleout, i.e. with numberOfInstances="1"?

> How does CorpusReader get the id which is included in the input Cas ?

Have the CM put the id into the new Cas for the CorpusReader.
Just create an FS with the appropriate feature to hold the id, and add
that FS to the index. The getAllIndexedFS(type) method is convenient
for getting an indexed FS that does not have a custom covering index.
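
A rough sketch of that idea, in case it helps. The type name
example.DocumentId, the feature name id, and the helper class are made up
for illustration; substitute whatever your type system defines. In a real
CM the new Cas would come from getEmptyCAS() rather than from the newCas()
helper, which is only here so the snippet can run standalone.

```java
import org.apache.uima.UIMAFramework;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.FeatureStructure;
import org.apache.uima.cas.Type;
import org.apache.uima.resource.metadata.TypeDescription;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.util.CasCreationUtils;

public class DocumentIdExample {
    // Hypothetical names, not taken from the original descriptors.
    static final String ID_TYPE = "example.DocumentId";
    static final String ID_FEATURE = "id";

    /** CM side: store the id in the newly created Cas and index it. */
    static void putId(CAS cas, String id) {
        Type type = cas.getTypeSystem().getType(ID_TYPE);
        Feature feature = type.getFeatureByBaseName(ID_FEATURE);
        FeatureStructure fs = cas.createFS(type);
        fs.setStringValue(feature, id);
        // Without this the FS is not reachable through the index repository.
        cas.getIndexRepository().addFS(fs);
    }

    /** CorpusReader side: read the id back, or null if none was indexed. */
    static String getId(CAS cas) {
        Type type = cas.getTypeSystem().getType(ID_TYPE);
        Feature feature = type.getFeatureByBaseName(ID_FEATURE);
        FSIterator<FeatureStructure> it =
                cas.getIndexRepository().getAllIndexedFS(type);
        return it.isValid() ? it.get().getStringValue(feature) : null;
    }

    /** Standalone helper: builds a Cas whose type system has the id type. */
    static CAS newCas() throws Exception {
        TypeSystemDescription tsd =
                UIMAFramework.getResourceSpecifierFactory().createTypeSystemDescription();
        TypeDescription td = tsd.addType(ID_TYPE, "Holds a document id", CAS.TYPE_NAME_TOP);
        td.addFeature(ID_FEATURE, "The document id", CAS.TYPE_NAME_STRING);
        return CasCreationUtils.createCas(tsd, null, null);
    }

    public static void main(String[] args) throws Exception {
        CAS cas = newCas();
        putId(cas, "row-42");
        System.out.println(getId(cas));
    }
}
```

Since the type extends uima.cas.TOP and no index is declared for it, the
FS is only reachable through getAllIndexedFS, which is why that method is
handy here.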

