I want to process my document collection using CPM and I want to use the
batch feature.
The documentation says that the following method in the
CollectionProcessingManager

    void process(CollectionReader aCollectionReader,
                 int aBatchSize)
        throws ResourceInitializationException

breaks the processing into batches of a size determined by the aBatchSize
parameter. Each CasConsumer will be notified at the end of the batch.
When I tried this method in my application, processing stopped after
the first batch of documents. I was hoping that execution would
continue with the next batch of documents once each batch had been
processed.
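To make the expectation concrete, here is a minimal plain-Java sketch of the behavior I was hoping for. It does not use the UIMA API at all: the class name BatchSketch, the method processInBatches, and the file names doc1.txt to doc8.txt are hypothetical, and the strings "batchProcessComplete" and "collectionProcessComplete" are only log labels standing in for the end-of-batch and end-of-collection notifications. It is an illustration of what I expected, not a claim about how the CPM is implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {

    /**
     * Hypothetical illustration (not UIMA code): walks the whole collection,
     * logging an end-of-batch event after every batchSize documents and
     * continuing with the next batch until the collection is exhausted.
     */
    static List<String> processInBatches(List<String> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> events = new ArrayList<>();
        int inBatch = 0;
        for (String doc : docs) {
            events.add("process " + doc);
            inBatch++;
            if (inBatch == batchSize) {
                // End of a full batch: notify, then keep going.
                events.add("batchProcessComplete");
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            // A final, partially filled batch still gets a notification.
            events.add("batchProcessComplete");
        }
        events.add("collectionProcessComplete");
        return events;
    }

    public static void main(String[] args) {
        // Assumed input: 8 documents with made-up names, batch size 4.
        List<String> docs = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            docs.add("doc" + i + ".txt");
        }
        for (String event : processInBatches(docs, 4)) {
            System.out.println(event);
        }
    }
}
```

With 8 documents and a batch size of 4, this sketch logs two end-of-batch events, one after the fourth document and one after the eighth, before the end-of-collection event. What I actually observe is that processing ends after the first four documents.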
I tried the following as a test.
I downloaded the uimaj-2.2.0 binaries onto my computer and used
SimpleRunCPM from the examples to perform my test.
I modified SimpleRunCPM.java in org.apache.uima.examples.cpe,
changed the batch size to 4 (instead of 10), and then ran it with the
following command-line arguments:
C:\uimaj-2.2.0-incubating\apache-uima\examples\descriptors\collection_reader\FileSystemCollectionReader.xml
C:\uimaj-2.2.0-incubating\apache-uima\examples\descriptors\analysis_engine\NamesAndPersonTitles_TAE.xml
C:\uimaj-2.2.0-incubating\apache-uima\examples\descriptors\cas_consumer\XmiWriterCasConsumer.xml
I modified FileSystemCollectionReader.xml so that the default input
directory is
C:\uimaj-2.2.0-incubating\apache-uima\examples\data
The input folder has 8 text files, but the processing completes after 4
documents.
Is this the expected behavior? If not, is there anything I need to change
in the code to get multiple batches to work?
Thanks in advance for any help.
-kannan