uima-user mailing list archives

From "reshu.agarwal" <reshu.agar...@orkash.com>
Subject Re: Ducc Problems
Date Fri, 28 Mar 2014 05:22:49 GMT
On 03/28/2014 01:28 AM, Eddie Epstein wrote:
> Hi Reshu,
> The Job model in DUCC is for the Collection Reader to send "work item
> CASes", where a work item represents a collection of work to be done by a
> Job Process. For example, a work item could be a file or a subset of a file
> that contains many documents, where each document would be individually put
> into a CAS by the Cas Multiplier in the Job Process.
> DUCC is designed so that after processing the "mini-collection" represented
> by the work item, the Cas Consumer should flush any data. This is done by
> routing the "work item CAS" to the Cas Consumer, after all work item
> documents are completed, at which point the CC does the flush.
> The sample code described in
> http://uima.apache.org/d/uima-ducc-1.0.0/duccbook.html#x1-1380009 uses the
> work item CAS to flush data in exactly this way.
> Note that the PersonTitleDBWriterCasConsumer is doing a flush (a commit) in
> the process method after every 50 documents.
> Regards
> Eddie
> On Thu, Mar 27, 2014 at 1:35 AM, reshu.agarwal <reshu.agarwal@orkash.com> wrote:
>> On 03/26/2014 11:34 PM, Eddie Epstein wrote:
>>> Hi Reshu,
>>> The collectionProcessComplete() method in UIMA-AS has a limitation: a
>>> Collection Processing Complete request sent to the UIMA-AS Analysis
>>> Service is cascaded down to all delegates; however, if a particular
>>> delegate is scaled out, only one of the instances of the delegate will
>>> get this call.
>>> Since DUCC is using UIMA-AS to scale out the Job processes, it has no way
>>> to deliver a CPC to all instances.
>>> The applications we have been running on DUCC have used the Work Item CAS
>>> as a signal to CAS consumers to do CPC level processing. That is discussed
>>> in the first reference above, in the paragraph "Flushing Cached Data".
>>> Eddie
>>> On Wed, Mar 26, 2014 at 9:48 AM, reshu.agarwal <reshu.agarwal@orkash.com> wrote:
>>>> On 03/26/2014 06:43 PM, Eddie Epstein wrote:
>>>>> Are you using standard UIMA interface code to Solr? If so, which Cas
>>>>> Consumer?
>>>>> Taking a quick look at the source code for SolrCASConsumer, the batch
>>>>> and collection process complete methods appear to do nothing.
>>>>> Thanks,
>>>>> Eddie
>>>>> On Wed, Mar 26, 2014 at 6:08 AM, reshu.agarwal <reshu.agarwal@orkash.com> wrote:
>>>>>> On 03/21/2014 11:42 AM, reshu.agarwal wrote:
>>>>>>> Hence we cannot attempt batch processing in the cas consumer, and this
>>>>>>> increases our processing time. Is there any other option for that, or
>>>>>>> is it a bug in DUCC?
>>>>>> Please reply on this problem: sending documents one by one through the
>>>>>> cas consumer and committing each one to Solr, without batch processing,
>>>>>> is not an optimal way to do this. Why is DUCC not calling the
>>>>>> collectionProcessComplete() method of the Cas Consumer? And if I want
>>>>>> it to be called, what is the way to do this?
>>>>>> I am not able to find anything about this in the DUCC book.
>>>>>> Thanks in advance.
>>>>>> --
>>>>>> Thanks,
>>>>>> Reshu Agarwal
>>>> Hi Eddie,
>>>> I am not using standard UIMA interface code to Solr; I created my own
>>>> consumer. I will take a look at that too. But the problem is not specific
>>>> to Solr; I could use any store for my output. I want to do batch
>>>> processing and use collectionProcessComplete(). Why is DUCC not calling
>>>> it? I checked with UIMA AS too, and my cas consumer works fine there,
>>>> including batch processing.
>>>> --
>>>> Thanks,
>>>> Reshu Agarwal
>> Hi Eddie,
>> I am using a cas consumer similar to the Apache UIMA example:
>>   "apache-uima/examples/src/org/apache/uima/examples/cpe/PersonTitleDBWriterCasConsumer.java"
>> --
>> Thanks,
>> Reshu Agarwal
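Eddie's point about scaled-out delegates, combined with a consumer that commits its last partial batch in collectionProcessComplete(), can be illustrated with a small self-contained Java simulation. This is a hypothetical model, not UIMA or DUCC code: `BatchingConsumer` only mimics the commit-every-50 pattern of PersonTitleDBWriterCasConsumer, documents are plain strings, and `ScaleOutSimulation` assumes round-robin distribution across instances with the CPC call reaching only instance 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a batching CAS consumer in the style of
// PersonTitleDBWriterCasConsumer: commit every BATCH_SIZE documents and
// commit the remainder in collectionProcessComplete(). No UIMA classes
// are used; documents are plain strings and the "database" is a list.
class BatchingConsumer {
    static final int BATCH_SIZE = 50;
    private final List<String> buffer = new ArrayList<>();
    private final List<String> store; // shared list simulating the database

    BatchingConsumer(List<String> store) {
        this.store = store;
    }

    void process(String doc) {
        buffer.add(doc);
        if (buffer.size() >= BATCH_SIZE) {
            commit();
        }
    }

    // Flushes whatever is still buffered (fewer than BATCH_SIZE documents).
    void collectionProcessComplete() {
        commit();
    }

    private void commit() {
        store.addAll(buffer);
        buffer.clear();
    }
}

class ScaleOutSimulation {
    // Assumption: documents are spread round-robin over the scaled-out
    // instances, and the CPC call reaches only instance 0.
    static int committedCount(int docs, int instances) {
        List<String> store = new ArrayList<>();
        List<BatchingConsumer> consumers = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            consumers.add(new BatchingConsumer(store));
        }
        for (int d = 0; d < docs; d++) {
            consumers.get(d % instances).process("doc-" + d);
        }
        consumers.get(0).collectionProcessComplete();
        return store.size();
    }
}
```

With 200 documents over three instances, each instance commits one full batch of 50, but only instance 0's leftover 17 documents are committed by the CPC call, so `committedCount(200, 3)` returns 167 and 33 documents are never written. With a single instance, every document is committed.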
Hi Eddie,

You are right, I know this. PersonTitleDBWriterCasConsumer does a flush (a
commit) in the process method after every 50 documents, and if fewer than 50
documents remain, it does the commit or flush in the collectionProcessComplete
method. So if that method is not called, those documents cannot be committed.
That is why I want DUCC to call this method.

Reshu Agarwal
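Eddie's suggestion, moving the final flush from collectionProcessComplete() to the arrival of the work item CAS, can be sketched the same way. This is again a hypothetical model with no UIMA or DUCC dependencies: the `Cas` class and its `workItem` flag stand in for however a real consumer recognizes the work item CAS (the DUCC book's "Flushing Cached Data" section describes the actual mechanism), and `WorkItemFlushingConsumer` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message type: either a document CAS emitted by the CAS
// multiplier, or the work item CAS routed to the consumer after all of the
// work item's documents are done. Real code would inspect the CAS itself.
final class Cas {
    final String id;
    final boolean workItem;

    Cas(String id, boolean workItem) {
        this.id = id;
        this.workItem = workItem;
    }
}

// Batching consumer that commits every BATCH_SIZE documents and flushes the
// remainder when the work item CAS arrives, instead of relying on
// collectionProcessComplete().
class WorkItemFlushingConsumer {
    static final int BATCH_SIZE = 50;
    private final List<String> buffer = new ArrayList<>();
    final List<String> store = new ArrayList<>(); // simulated database
    int commits = 0;

    void process(Cas cas) {
        if (cas.workItem) {
            // End of the "mini-collection": flush whatever is left.
            commit();
            return;
        }
        buffer.add(cas.id);
        if (buffer.size() >= BATCH_SIZE) {
            commit();
        }
    }

    private void commit() {
        if (buffer.isEmpty()) {
            return; // nothing buffered; skip an empty commit
        }
        store.addAll(buffer);
        buffer.clear();
        commits++;
    }
}
```

For example, 120 documents followed by their work item CAS produce three commits: two full batches of 50 and a final flush of 20. Assuming each work item's documents and its work item CAS reach the same consumer instance, every scaled-out instance flushes its own remainder, so nothing depends on which instance receives the CPC call.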
