uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: CPM batch processing
Date Mon, 12 Nov 2007 16:01:22 GMT
Hi Kannan -

I think you have come across a "partially implemented" feature, which
has never been completed.

One work-around is to implement batching yourself in your CAS
Consumer(s): pass in a batch-size parameter in your CAS Consumer
descriptor, and have each consumer that needs batching count the number
of documents it processes. When the count reaches the batch size, the
consumer does its end-of-batch processing and resets the count.
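Here is a minimal sketch of that counting logic. It is written in plain Java so it runs without the UIMA jars. BatchCounter and BatchingDemo are hypothetical names, not UIMA API. In a real consumer, for example a subclass of CasConsumer_ImplBase, you would read the batch size from a configuration parameter you declare in your descriptor. You would then call documentProcessed() from processCas() and do the end-of-batch work whenever it returns true. Calling flushPartialBatch() from collectionProcessComplete() handles a final short batch.

```java
// Hypothetical helper (not part of the UIMA API): counts the documents seen by
// ONE consumer instance and reports when a batch of the configured size is full.
class BatchCounter {
    private final int batchSize;      // e.g. read from the consumer's batch-size parameter
    private int countInBatch = 0;     // documents seen since the last end-of-batch
    private int batchesCompleted = 0; // how many end-of-batch events have fired

    BatchCounter(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    /** Call once per CAS; returns true when this document completes a batch. */
    boolean documentProcessed() {
        countInBatch++;
        if (countInBatch == batchSize) {
            countInBatch = 0;   // start counting the next batch
            batchesCompleted++;
            return true;
        }
        return false;
    }

    /** Call at end of collection; returns true if a partial batch was pending. */
    boolean flushPartialBatch() {
        if (countInBatch > 0) {
            countInBatch = 0;
            batchesCompleted++;
            return true;
        }
        return false;
    }

    int batchesCompleted() {
        return batchesCompleted;
    }
}

class BatchingDemo {
    public static void main(String[] args) {
        BatchCounter counter = new BatchCounter(4); // batch size 4, as in Kannan's test
        for (int doc = 1; doc <= 8; doc++) {        // 8 input files, as in Kannan's test
            // ... per-document work would go here (e.g. inside processCas) ...
            if (counter.documentProcessed()) {
                System.out.println("end of batch after document " + doc);
            }
        }
        if (counter.flushPartialBatch()) {
            System.out.println("end of final partial batch");
        }
        System.out.println("batches: " + counter.batchesCompleted());
    }
}
```

With 8 documents and a batch size of 4, this prints two end-of-batch lines (after documents 4 and 8) and no partial batch. Flushing the leftover count at the end is a design choice: without it, the documents after the last full batch would never get end-of-batch processing.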

If your CAS Consumers are scaled out by being replicated, be aware that
each replica will not "see" every CAS flowing through the system. You can
specify whether a CAS Consumer may be replicated using the
<operationalProperties><multipleDeploymentAllowed>true|false</multipleDeploymentAllowed></operationalProperties>
XML specification; see section 2.4.1.9 in this part of the reference manual:
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.0-incubating/docs/html/references/references.html#ugr.ref.xml.component_descriptor.aes.primitive
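The following toy simulation shows why replication matters for per-instance counting. Each replica keeps its own count. ReplicaBatchSimulation is a hypothetical name, and the round-robin hand-out of CASes to replicas is an assumption made only for illustration. The CPM's actual distribution depends on its threading and is not guaranteed to follow any fixed order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation: several replicas of one CAS Consumer, each keeping
// its own per-instance document count. Round-robin dispatch is an ASSUMPTION
// for illustration only; the CPM does not promise any particular distribution.
class ReplicaBatchSimulation {

    /** For each replica, returns the global document numbers at which it ended a batch. */
    static List<List<Integer>> simulate(int totalDocs, int replicas, int batchSize) {
        int[] countInBatch = new int[replicas]; // each replica counts only the CASes it sees
        List<List<Integer>> batchEnds = new ArrayList<>();
        for (int r = 0; r < replicas; r++) {
            batchEnds.add(new ArrayList<>());
        }
        for (int doc = 1; doc <= totalDocs; doc++) {
            int r = (doc - 1) % replicas; // assumed round-robin dispatch
            countInBatch[r]++;
            if (countInBatch[r] == batchSize) {
                countInBatch[r] = 0;
                batchEnds.get(r).add(doc);
            }
        }
        return batchEnds;
    }

    public static void main(String[] args) {
        List<List<Integer>> ends = simulate(8, 2, 4);
        for (int r = 0; r < ends.size(); r++) {
            System.out.println("replica " + r + " ended a batch after global document(s) " + ends.get(r));
        }
    }
}
```

With one replica, batch ends fall at global documents 4 and 8. With two replicas under this assumed dispatch, each replica reaches a count of 4 only at documents 7 and 8, so neither sees a boundary at document 4. If batch boundaries must track the global document count, declaring the consumer with multipleDeploymentAllowed set to false keeps it to a single instance.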

-Marshall

Kannan Chellappa wrote:
> I want to process my document collection using CPM and I want to use the
> batch feature.
>
> The documentation says that the following method in the
> CollectionProcessingManager
>
>     void process(CollectionReader aCollectionReader, int aBatchSize)
>         throws ResourceInitializationException
>
> breaks the processing into batches of size determined by the aBatchSize
> parameter. Each CasConsumer will be notified at the end of the batch.
>
> When I tried this method in my application, the processing stopped after
> the first batch of documents. I was hoping that execution would continue
> with the next batch of documents after each batch finished processing.
>
> I tried the following as a test.
>
> I downloaded the uimaj-2.2.0 binaries onto my computer and used
> SimpleRunCPM from the examples to perform my test.
>
> I modified SimpleRunCPM.java in org.apache.uima.examples.cpe,
> changed the batch size to 4 (instead of 10), and then ran it with the
> following command-line arguments:
>
> C:\uimaj-2.2.0-incubating\apache-uima\examples\descriptors\collection_reader\FileSystemCollectionReader.xml
>
> C:\uimaj-2.2.0-incubating\apache-uima\examples\descriptors\analysis_engine\NamesAndPersonTitles_TAE.xml
>
> C:\uimaj-2.2.0-incubating\apache-uima\examples\descriptors\cas_consumer\XmiWriterCasConsumer.xml
>
> I modified FileSystemCollectionReader.xml so that its default input
> directory is C:\uimaj-2.2.0-incubating\apache-uima\examples\data
>
> The input folder has 8 text files, but processing completes after 4
> documents.
>
> Is this the expected behavior? If not, is there anything I need to change
> in the code to get multiple batches to work?
>
> Thanks in advance for any help.
>
> -kannan

