uima-user mailing list archives

From "Kannan Chellappa" <kchella...@kana.com>
Subject CPM batch processing
Date Mon, 12 Nov 2007 07:06:26 GMT
I want to process my document collection using CPM and I want to use the
batch feature.

The documentation says that the following method in the
CollectionProcessingManager

    void process(CollectionReader aCollectionReader, int aBatchSize)
        throws ResourceInitializationException

breaks the processing into batches whose size is determined by the
aBatchSize parameter, and that each CasConsumer is notified at the end
of each batch.
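
For illustration only, here is a minimal plain-Java model of the documented behavior. It is not UIMA code: every document in the collection is processed, and a listener is told each time a batch of aBatchSize documents completes. The names BatchModel, Listener, RecordingListener, and process are hypothetical; the callback names merely mirror those on UIMA's StatusCallbackListener. Signalling a final partial batch is an assumption of this sketch, not something the documentation quoted above states.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical plain-Java model (NOT UIMA code) of the batch semantics the
 * CollectionProcessingManager documentation describes: all documents are
 * processed, with a notification at the end of each batch.
 */
public class BatchModel {

    /** Hypothetical listener; method names only mirror UIMA's StatusCallbackListener. */
    interface Listener {
        void entityProcessComplete(String doc);
        void batchProcessComplete();
        void collectionProcessComplete();
    }

    /** Records every callback so the sequence of events can be inspected. */
    static class RecordingListener implements Listener {
        final List<String> events = new ArrayList<>();
        int docs = 0;
        int batches = 0;

        public void entityProcessComplete(String doc) { docs++; events.add("doc:" + doc); }
        public void batchProcessComplete() { batches++; events.add("batch"); }
        public void collectionProcessComplete() { events.add("done"); }
    }

    /** Processes every document, notifying the listener after each full batch. */
    static void process(List<String> collection, int batchSize, Listener listener) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int inBatch = 0;
        for (String doc : collection) {
            listener.entityProcessComplete(doc);
            inBatch++;
            if (inBatch == batchSize) {
                // Batch is full: notify, then keep going with the next batch.
                listener.batchProcessComplete();
                inBatch = 0;
            }
        }
        // Assumption of this model: a trailing partial batch is also signalled.
        if (inBatch > 0) {
            listener.batchProcessComplete();
        }
        listener.collectionProcessComplete();
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            docs.add("doc" + i + ".txt");
        }
        RecordingListener listener = new RecordingListener();
        process(docs, 4, listener);
        System.out.println(listener.docs + " documents, " + listener.batches + " batches");
    }
}
```

With 8 documents and a batch size of 4, the model completes two batches (documents 1 to 4, then 5 to 8), prints `8 documents, 2 batches`, and only then signals the end of the collection.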

When I tried this method in my application, processing stops after the
first batch of documents. I was expecting execution to continue with the
next batch of documents once each batch is complete.

I tried the following as a test.

I downloaded the uimaj-2.2.0 binaries to my computer and used
SimpleRunCPM from the examples to perform my test.

I modified SimpleRunCPM.java in org.apache.uima.examples.cpe to change
the batch size to 4 (instead of 10), and then ran it with the following
command-line arguments:

C:\uimaj-2.2.0-incubating\apache-uima\examples\descriptors\collection_reader\FileSystemCollectionReader.xml

C:\uimaj-2.2.0-incubating\apache-uima\examples\descriptors\analysis_engine\NamesAndPersonTitles_TAE.xml

C:\uimaj-2.2.0-incubating\apache-uima\examples\descriptors\cas_consumer\XmiWriterCasConsumer.xml

I also modified FileSystemCollectionReader.xml so that its default input
directory is C:\uimaj-2.2.0-incubating\apache-uima\examples\data

The input folder has 8 text files, but processing completes after only 4
documents.

Is this the expected behavior? If not, is there anything I need to change
in the code to get multiple batches to work?

Thanks in advance for any help

-kannan

