uima-user mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine
Date Sat, 20 Jun 2009 00:44:18 GMT
Thanks for your reply, Jaroslaw. It seems that I misunderstood
the way UIMA AS works.

> 1)
> "... Because the AAE is not thread safe uima as must scale it through
> creating multiple instances of it..."
>
> Since the AAE is not thread safe you should not try to scale it out in the
> same JVM. If AAE
> is not thread safe, you should only have one instance of it per JVM. You can
> scale it by
> starting multiple JVMs.
>   
I reduced my AAE to three delegate AEs:

1. HBaseCasMultiplier -> fetches the actual text from HBase
2. Tokenizer -> adds tokens to my CAS
3. HBaseWriter -> writes the tokens back into HBase

These delegates are not thread safe, so to scale them one instance
must be created per worker thread. That's what I want UIMA AS to do
for me, and I think that's also the case described in section 1.4.1
of the documentation:

"... The classes for annotators and flow controllers do not need to be
"thread-safe" with respect to their instance data - meaning, they do not
need to be implemented with synchronization locks for access to their
instance data, because each instance will only be called using one thread
at a time. Scale out for these classes is done using multiple instances
of the class. ..."
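
The quoted passage can be illustrated without any UIMA classes. The following is a minimal sketch in plain Java, assuming a hypothetical annotator-like class (CountingTokenizer) that keeps unsynchronized instance data; InstancePerWorkerDemo and the static partitioning of documents are also invented for illustration and say nothing about how UIMA AS schedules work internally. The point is only that each worker owns its own instance, so no locking is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a non-thread-safe annotator: it keeps
// unsynchronized instance state, which is safe only while a single
// thread uses a given instance.
class CountingTokenizer {
    private int tokensSeen = 0; // instance data, no locking

    int tokenize(String text) {
        String trimmed = text.trim();
        int n = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        tokensSeen += n;
        return n;
    }

    int tokensSeen() {
        return tokensSeen;
    }
}

class InstancePerWorkerDemo {
    // Creates one tokenizer per worker; each worker touches only its own
    // instance. Returns the total number of tokens counted by all workers.
    static int run(List<String> docs, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int worker = w;
                final CountingTokenizer own = new CountingTokenizer();
                results.add(pool.submit(() -> {
                    // Static partitioning: worker w handles docs w, w+workers, ...
                    for (int i = worker; i < docs.size(); i += workers) {
                        own.tokenize(docs.get(i));
                    }
                    return own.tokensSeen();
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Sharing a single CountingTokenizer across the workers instead would make the unsynchronized counter a data race, which is the situation the one-instance-per-thread rule avoids.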

> 2)
> "...I must admit the documentation confused me a bit about the meaning of
> the async attribute..."
>
> The async attribute is only used for aggregates, and specifies that this
> aggregate will be run asynchronously (with input queues in front of all of
> its delegates) or not. If you choose async="false" it means that you want to
> deploy the aggregate synchronously. Meaning it will be single-threaded. To
> UIMA AS a synchronous aggregate is the same as a
> UIMA primitive AE.
>   
Thanks, I understand the difference now, so I want async="true".

> 3)            ...
>             <analysisEngine key="TextAnalysis" async="false">
>                 <scaleout numberOfInstances="8" />
>
>                 <delegates>
>                     <analysisEngine key="HBaseCasMultiplier">
>                         <casMultiplier poolSize="8"/>
>                     </analysisEngine>
>                 </delegates>
>             </analysisEngine>
>             ...
>
> The above is an inconsistent configuration.  You are specifying that
> "TextAnalytics" should be deployed synchronously but then adding delegate
> configuration, which forces the aggregate to be deployed asynchronously.
> Synchronous aggregate delegate's are not "visible" to the uima-as, and
> cannot be configured in the deployment descriptor.
>   
OK, I changed it to fit the case described above:
            <analysisEngine>
                <delegates>
                    <analysisEngine key="HBaseCasMultiplier">
                        <casMultiplier poolSize="4"/>
                        <scaleout numberOfInstances="2" />
                    </analysisEngine>
                    <analysisEngine key="Tokenizer">
                        <scaleout numberOfInstances="4" />
                    </analysisEngine>
                    <analysisEngine key="HBaseWriter">
                        <scaleout numberOfInstances="4" />
                    </analysisEngine>
                </delegates>
            </analysisEngine>

I would like to scale the HBaseCasMultiplier to more than two
threads, because there is a short delay when reading from HBase.
First, I am not sure which value I should choose for the
CAS multiplier pool size. If numberOfInstances gets larger
than two, I get a few exceptions (stack trace below) when UIMA AS
starts to process the first documents, so I think I am doing something
wrong here. Also, what is the minimum possible casPoolSize? I need
CAS instances for my 4 Tokenizers, 4 HBaseWriters
and 4 (?) for the CAS multiplier, which would result in a minimum
size of 12, right?

The HBaseCasMultiplier receives one CAS that contains the id and
then outputs one CAS that contains the actual text.

Here is the full stack trace for the exception I get now:
org.apache.uima.UIMARuntimeException: AnalysisComponent "/HBaseCasMultiplier/" requested more CASes (2) than defined in its getCasInstancesRequired() method (1).  It is possible that the AnalysisComponent is not properly releasing CASes when it encounters an error.
    at org.apache.uima.impl.UimaContext_ImplBase.getEmptyCas(UimaContext_ImplBase.java:575)
    at org.apache.uima.analysis_component.CasMultiplier_ImplBase.getEmptyCAS(CasMultiplier_ImplBase.java:109)
    at dk.infopaq.nlp.repository.connector.HBaseReadCasMultiplier.hasNext(HBaseReadCasMultiplier.java:107)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl$AnalysisComponentCasIterator.hasNext(PrimitiveAnalysisEngine_impl.java:563)
    at org.apache.uima.aae.controller.PrimitiveAnalysisEngineController_impl.process(PrimitiveAnalysisEngineController_impl.java:388)
    at org.apache.uima.aae.handler.HandlerBase.invokeProcess(HandlerBase.java:130)
    at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestWithCASReference(ProcessRequestHandler_impl.java:655)
    at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handle(ProcessRequestHandler_impl.java:887)
    at org.apache.uima.aae.spi.transport.vm.UimaVmMessageListener.onMessage(UimaVmMessageListener.java:99)
    at org.apache.uima.aae.spi.transport.vm.UimaVmMessageDispatcher$1.run(UimaVmMessageDispatcher.java:66)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at org.apache.uima.aae.UimaAsThreadFactory$1.run(UimaAsThreadFactory.java:69)
    at java.lang.Thread.run(Thread.java:619)
CASAdminException: Can't flush CAS, flushing is disabled.
    at org.apache.uima.cas.impl.CASImpl.reset(CASImpl.java:850)
    at org.apache.uima.util.CasPool.releaseCas(CasPool.java:228)
    at org.apache.uima.resource.impl.CasManager_impl.releaseCas(CasManager_impl.java:141)
    at org.apache.uima.cas.AbstractCas_ImplBase.release(AbstractCas_ImplBase.java:35)
    at org.apache.uima.cas.impl.CASImpl.release(CASImpl.java:3561)
    at org.apache.uima.cas.impl.CASImpl.release(CASImpl.java:3559)
    at org.apache.uima.aae.controller.BaseAnalysisEngineController.dropCAS(BaseAnalysisEngineController.java:1044)
    at org.apache.uima.aae.controller.BaseAnalysisEngineController.dropCAS(BaseAnalysisEngineController.java:1269)
    at org.apache.uima.aae.controller.AggregateAnalysisEngineController_impl.dropCAS(AggregateAnalysisEngineController_impl.java:318)
    at org.apache.uima.aae.controller.BaseAnalysisEngineController.handleAction(BaseAnalysisEngineController.java:1212)
    at org.apache.uima.aae.controller.AggregateAnalysisEngineController_impl.takeAction(AggregateAnalysisEngineController_impl.java:533)
    at org.apache.uima.aae.error.handler.ProcessCasErrorHandler.handleError(ProcessCasErrorHandler.java:566)
    at org.apache.uima.aae.error.ErrorHandlerChain.handle(ErrorHandlerChain.java:64)
    at org.apache.uima.aae.handler.input.ProcessResponseHandler.handleProcessResponseWithException(ProcessResponseHandler.java:544)
    at org.apache.uima.aae.handler.input.ProcessResponseHandler.handle(ProcessResponseHandler.java:644)
    at org.apache.uima.aae.handler.HandlerBase.delegate(HandlerBase.java:158)
    at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handle(ProcessRequestHandler_impl.java:927)
    at org.apache.uima.aae.spi.transport.vm.UimaVmMessageListener.onMessage(UimaVmMessageListener.java:99)
    at org.apache.uima.aae.spi.transport.vm.UimaVmMessageDispatcher$1.run(UimaVmMessageDispatcher.java:66)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
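
The first exception message describes a limit: a component may not request more empty CASes than its getCasInstancesRequired() method declares. The sketch below models only that rule, as read from the message text, in plain Java. CasBudget, acquire, release, and CasBudgetDemo are hypothetical names and not UIMA API, and the sketch does not claim to identify the actual cause in this deployment.

```java
// Hypothetical model of the limit named in the exception message; this is
// not UIMA code. A component declares how many CASes it may hold at once,
// and asking for more while earlier ones are still outstanding fails.
class CasBudget {
    private final int declared;  // plays the role of getCasInstancesRequired()
    private int outstanding = 0; // CASes handed out and not yet given back

    CasBudget(int declared) {
        this.declared = declared;
    }

    void acquire() {
        if (outstanding >= declared) {
            throw new IllegalStateException("requested more CASes ("
                    + (outstanding + 1) + ") than declared (" + declared + ")");
        }
        outstanding++;
    }

    void release() {
        if (outstanding > 0) {
            outstanding--;
        }
    }

    int outstanding() {
        return outstanding;
    }
}

class CasBudgetDemo {
    // True if n back-to-back requests, with no release in between, all succeed.
    static boolean canHold(int declared, int n) {
        CasBudget budget = new CasBudget(declared);
        try {
            for (int i = 0; i < n; i++) {
                budget.acquire();
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // True if n requests succeed when each CAS is given back before the next request.
    static boolean canCycle(int declared, int n) {
        CasBudget budget = new CasBudget(declared);
        try {
            for (int i = 0; i < n; i++) {
                budget.acquire();
                budget.release();
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

With a declared limit of 1, two requests in a row fail in this model with the same numbers as in the trace, (2) requested against (1) declared, while any number of request-and-release cycles succeeds.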

Thanks for your help,
Jörn
