Message-ID: <533946FE.2000204@orkash.com>
Date: Mon, 31 Mar 2014 16:14:14 +0530
From: "reshu.agarwal"
Organization: Orkash Services Pvt Ltd
To: user@uima.apache.org
Subject: Re: Ducc Problems
References: <532BD841.2010705@orkash.com>
 <5332A701.6070301@orkash.com> <5332DAB3.6030808@orkash.com>
 <5333B886.9070709@orkash.com> <53350729.2030807@orkash.com>

On 03/28/2014 05:28 PM, Eddie Epstein wrote:
> Another alternative would be to do the final flush in the Cas consumer's
> destroy method.
>
> Another issue to be aware of, in order to balance resources between jobs,
> DUCC uses preemption of job processes scheduled in a "fair-share" class.
> This may not be acceptable for jobs which are doing incremental commits.
> The solution is to schedule the job in a non-preemptable class.
>
> On Fri, Mar 28, 2014 at 1:22 AM, reshu.agarwal wrote:
>
>> On 03/28/2014 01:28 AM, Eddie Epstein wrote:
>>
>>> Hi Reshu,
>>>
>>> The Job model in DUCC is for the Collection Reader to send "work item
>>> CASes", where a work item represents a collection of work to be done by
>>> a Job Process. For example, a work item could be a file or a subset of a
>>> file that contains many documents, where each document would be
>>> individually put into a CAS by the Cas Multiplier in the Job Process.
>>>
>>> DUCC is designed so that after processing the "mini-collection"
>>> represented by the work item, the Cas Consumer should flush any data.
>>> This is done by routing the "work item CAS" to the Cas Consumer, after
>>> all work item documents are completed, at which point the CC does the
>>> flush.
>>>
>>> The sample code described in
>>> http://uima.apache.org/d/uima-ducc-1.0.0/duccbook.html#x1-1380009 uses
>>> the work item CAS to flush data in exactly this way.
>>>
>>> Note that the PersonTitleDBWriterCasConsumer is doing a flush (a commit)
>>> in the process method after every 50 documents.
>>>
>>> Regards
>>> Eddie
>>>
>>> On Thu, Mar 27, 2014 at 1:35 AM, reshu.agarwal wrote:
>>>
>>>> On 03/26/2014 11:34 PM, Eddie Epstein wrote:
>>>>
>>>>> Hi Reshu,
>>>>>
>>>>> The collectionProcessingComplete() method in UIMA-AS has a limitation:
>>>>> a Collection Processing Complete request sent to the UIMA-AS Analysis
>>>>> Service is cascaded down to all delegates; however, if a particular
>>>>> delegate is scaled-out, only one of the instances of the delegate will
>>>>> get this call.
>>>>>
>>>>> Since DUCC is using UIMA-AS to scale out the Job processes, it has no
>>>>> way to deliver a CPC to all instances.
>>>>>
>>>>> The applications we have been running on DUCC have used the Work Item
>>>>> CAS as a signal to CAS consumers to do CPC level processing. That is
>>>>> discussed in the first reference above, in the paragraph "Flushing
>>>>> Cached Data".
>>>>>
>>>>> Eddie
>>>>>
>>>>> On Wed, Mar 26, 2014 at 9:48 AM, reshu.agarwal
>>>>> <reshu.agarwal@orkash.com> wrote:
>>>>>
>>>>>> On 03/26/2014 06:43 PM, Eddie Epstein wrote:
>>>>>>
>>>>>>> Are you using standard UIMA interface code to Solr? If so, which Cas
>>>>>>> Consumer?
>>>>>>>
>>>>>>> Taking a quick look at the source code for SolrCASConsumer, the
>>>>>>> batch and collection process complete methods appear to do nothing.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Eddie
>>>>>>>
>>>>>>> On Wed, Mar 26, 2014 at 6:08 AM, reshu.agarwal
>>>>>>> <reshu.agarwal@orkash.com> wrote:
>>>>>>>
>>>>>>>> On 03/21/2014 11:42 AM, reshu.agarwal wrote:
>>>>>>>>
>>>>>>>>> Hence we cannot attempt batch processing in the cas consumer, and
>>>>>>>>> it increases our processing time. Is there any other option for
>>>>>>>>> that, or is it a bug in DUCC?
>>>>>>>>
>>>>>>>> Please reply on this problem. At present I am sending documents to
>>>>>>>> Solr one by one from the cas consumer, without using batch
>>>>>>>> processing, and committing to Solr. This is not an optimal way to
>>>>>>>> use it. Why is DUCC not calling the collectionProcessComplete
>>>>>>>> method of the Cas Consumer? And if I want that, what is the way to
>>>>>>>> do it?
>>>>>>>>
>>>>>>>> I am not able to find anything about this in the DUCC book.
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>> Reshu Agarwal
>>>>>>
>>>>>> Hi Eddie,
>>>>>>
>>>>>> I am not using the standard UIMA interface code to Solr; I created my
>>>>>> own Cas Consumer. I will take a look at that too. But the problem is
>>>>>> not particular to Solr; I could use any store for my output. I want
>>>>>> to do batch processing and want to use collectionProcessComplete. Why
>>>>>> is DUCC not calling it? I also checked with UIMA AS, and my cas
>>>>>> consumer works fine there and performs batch processing.
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> Reshu Agarwal
>>>>
>>>> Hi Eddie,
>>>>
>>>> I am using a cas consumer similar to the apache uima example:
>>>> "apache-uima/examples/src/org/apache/uima/examples/cpe/PersonTitleDBWriterCasConsumer.java"
>>>>
>>>> --
>>>> Thanks,
>>>> Reshu Agarwal
>>
>> Hi Eddie,
>>
>> You are right, I know this. PersonTitleDBWriterCasConsumer does a flush
>> (a commit) in the process method after every 50 documents, and if fewer
>> than 50 documents remain, it commits them in the
>> collectionProcessComplete method. So if that method is not called, those
>> documents are never committed. That is why I want DUCC to call this
>> method.
>>
>> --
>> Thanks,
>> Reshu Agarwal

Hi,

The destroy method worked for me. It did what I wanted from the
collectionProcessComplete method.

--
Thanks,
Reshu Agarwal
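
P.S. For anyone reading this in the archive, the pattern discussed above (commit every N documents in process, then flush whatever is left in destroy) might look roughly like the sketch below. It is plain Java and deliberately does not use the UIMA API: BatchingWriterSketch, its methods, and the in-memory "committed" list are hypothetical stand-ins for a real CAS consumer and its external store (Solr, a database, and so on). The batch size of 50 simply mirrors the number mentioned for PersonTitleDBWriterCasConsumer.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for a batching CAS consumer. It does NOT use the
 * UIMA API: process() mimics the per-document call, and destroy() mimics
 * the lifecycle method invoked when the component is shut down.
 */
class BatchingWriterSketch {
    private final int batchSize;
    // Documents received but not yet committed.
    private final List<String> pending = new ArrayList<>();
    // Stand-in for the external store; each entry is one committed batch.
    private final List<List<String>> committed = new ArrayList<>();

    BatchingWriterSketch(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    /** Buffers one document and commits once a full batch has accumulated. */
    void process(String document) {
        pending.add(document);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Final flush: commits the leftover partial batch, if there is one. */
    void destroy() {
        flush();
    }

    private void flush() {
        if (pending.isEmpty()) {
            return; // nothing to commit
        }
        committed.add(new ArrayList<>(pending));
        pending.clear();
    }

    int committedBatches() {
        return committed.size();
    }

    int committedDocuments() {
        int total = 0;
        for (List<String> batch : committed) {
            total += batch.size();
        }
        return total;
    }

    int pendingDocuments() {
        return pending.size();
    }

    public static void main(String[] args) {
        BatchingWriterSketch writer = new BatchingWriterSketch(50);
        for (int i = 0; i < 120; i++) {
            writer.process("doc-" + i);
        }
        System.out.println("before destroy: committed=" + writer.committedDocuments()
                + " pending=" + writer.pendingDocuments());
        writer.destroy();
        System.out.println("after destroy: committed=" + writer.committedDocuments()
                + " pending=" + writer.pendingDocuments());
    }
}
```

With 120 documents and a batch size of 50, two full batches are committed from process and the remaining 20 are only committed by destroy. As Eddie notes above, a process in a preemptable "fair-share" class may be stopped between commits, so whether a destroy-time flush is enough depends on the scheduling class the job runs in.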