To: user@uima.apache.org
From: Swirl
Subject: Re: Designing collection readers: Reading multiple XML files containing multiple CASes
Date: Thu, 10 Oct 2013 08:21:42 +0000 (UTC)
References: <525218C3.2050404@schor.com>

> For part c:
>
> I imagine an algorithm that can scan the main XML file and find the "sections".
> For each section it finds, it can produce a CAS and initialize that CAS with the
> section's information.
>
> If this algorithm lives inside an analysis component, then it can use the "CAS
> Multiplier" to produce the additional CASes, one for each segment.
>
> See
> http://uima.apache.org/d/uimaj-2.4.2/tutorials_and_users_guides.html#ugr.tug.cm
>
> Is that what you're looking for, or is that off-base?
>
> -Marshall

Yes, this is what I want.

I tried using a CAS Multiplier. For the most part it worked (e.g. with
SimplePipeline.runPipeline and CpePipeline.runPipeline).

But when I used it in a CollectionProcessingEngine, it produced only 1 CAS
instead of the several CASes that should have been produced from 1 input
document.

Here are my steps:

a. Create a CR description "readerDesc" to read in a text file.

b. Create an AnalysisEngineDescription "simpleTextSegmenterDesc" for
   SimpleTextSegmenter.class.
   Create an AnalysisEngineDescription "casConsumerWriterDesc" to write CASes
   into XMI files.

c. AggregateBuilder aggregateBuilder = new AggregateBuilder();
   aggregateBuilder.add(simpleTextSegmenterDesc);
   aggregateBuilder.add(casConsumerWriterDesc);
   AnalysisEngineDescription aaeDesc = aggregateBuilder.createAggregateDescription();
   aaeDesc.getAnalysisEngineMetaData()
       .getOperationalProperties().setOutputsNewCASes(false);

d. CpeBuilder builder = new CpeBuilder();
   builder.setReader(readerDesc);
   builder.setAnalysisEngine(aaeDesc);

e. CollectionProcessingEngine cpe = builder.createCpe(StatusCallbackListener);

f. cpe.process();

I only got 1 XMI file instead of the several that I expected.

Is a CAS Multiplier usable in a CPE?
According to the documentation, I need to wrap it in an Aggregate AE with