uima-user mailing list archives

From "reshu.agarwal" <reshu.agar...@orkash.com>
Subject Re: Ducc Problems
Date Mon, 31 Mar 2014 10:44:14 GMT
On 03/28/2014 05:28 PM, Eddie Epstein wrote:
> Another alternative would be to do the final flush in the Cas consumer's
> destroy method.
>
> Another issue to be aware of: in order to balance resources between jobs,
> DUCC preempts job processes scheduled in a "fair-share" class. This may not
> be acceptable for jobs that are doing incremental commits. The solution is
> to schedule the job in a non-preemptable class.
>
>
> On Fri, Mar 28, 2014 at 1:22 AM, reshu.agarwal <reshu.agarwal@orkash.com>wrote:
>
>> On 03/28/2014 01:28 AM, Eddie Epstein wrote:
>>
>>> Hi Reshu,
>>>
>>> The Job model in DUCC is for the Collection Reader to send "work item
>>> CASes", where a work item represents a collection of work to be done by a
>>> Job Process. For example, a work item could be a file or a subset of a
>>> file
>>> that contains many documents, where each document would be individually
>>> put
>>> into a CAS by the Cas Multiplier in the Job Process.
>>>
>>> DUCC is designed so that after processing the "mini-collection"
>>> represented
>>> by the work item,  the Cas Consumer should flush any data. This is done by
>>> routing the "work item CAS" to the Cas Consumer, after all work item
>>> documents are completed, at which point the CC does the flush.
>>>
>>> The sample code described in
>>> http://uima.apache.org/d/uima-ducc-1.0.0/duccbook.html#x1-1380009 uses
>>> the
>>> work item CAS to flush data in exactly this way.
>>>
>>> Note that the PersonTitleDBWriterCasConsumer is doing a flush (a commit)
>>> in
>>> the process method after every 50 documents.
>>>
>>> Regards
>>> Eddie
>>>
>>>
>>>
>>> On Thu, Mar 27, 2014 at 1:35 AM, reshu.agarwal <reshu.agarwal@orkash.com>
>>> wrote:
>>>
>>>   On 03/26/2014 11:34 PM, Eddie Epstein wrote:
>>>>   Hi Reshu,
>>>>> The collectionProcessingComplete() method in UIMA-AS has a limitation: a
>>>>> Collection Processing Complete request sent to the UIMA-AS Analysis Service
>>>>> is cascaded down to all delegates; however, if a particular delegate is
>>>>> scaled-out, only one of the instances of the delegate will get this call.
>>>>>
>>>>> Since DUCC is using UIMA-AS to scale out the Job processes, it has no
>>>>> way
>>>>> to deliver a CPC to all instances.
>>>>>
>>>>> The applications we have been running on DUCC have used the Work Item
>>>>> CAS
>>>>> as a signal to CAS consumers to do CPC level processing. That is
>>>>> discussed
>>>>> in the first reference above, in the paragraph "Flushing Cached Data".
>>>>>
>>>>> Eddie
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 26, 2014 at 9:48 AM, reshu.agarwal <
>>>>> reshu.agarwal@orkash.com>
>>>>> wrote:
>>>>>
>>>>>    On 03/26/2014 06:43 PM, Eddie Epstein wrote:
>>>>>
>>>>>>> Are you using standard UIMA interface code to Solr? If so, which Cas
>>>>>>> Consumer?
>>>>>>>
>>>>>>> Taking a quick look at the source code for SolrCASConsumer, the batch
>>>>>>> and collection process complete methods appear to do nothing.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Eddie
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 26, 2014 at 6:08 AM, reshu.agarwal <
>>>>>>> reshu.agarwal@orkash.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>     On 03/21/2014 11:42 AM, reshu.agarwal wrote:
>>>>>>>
>>>>>>>>> Hence we cannot attempt batch processing in the Cas Consumer,
>>>>>>>>> which increases our processing time. Is there any other option
>>>>>>>>> for that, or is it a bug in DUCC?
>>>>>>>>>
>>>>>>>> Please reply on this problem: I am sending documents to Solr one by
>>>>>>>> one from the Cas Consumer, committing Solr each time instead of
>>>>>>>> using batch processing. This is not an optimal way to use it. Why
>>>>>>>> is DUCC not calling the collectionProcessComplete method of the Cas
>>>>>>>> Consumer? And if I want it called, what is the way to do that?
>>>>>>>>
>>>>>>>> I am not able to find anything about this in the DUCC book.
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>> Reshu Agarwal
>>>>>>>>
>>>>>>>>
>>>>>>>>     Hi Eddie,
>>>>>>>>
>>>>>> I am not using the standard UIMA interface code to Solr; I created my
>>>>>> own Cas Consumer. I will take a look at that too. But the problem is
>>>>>> not specific to Solr; I can use any source to store my output. I want
>>>>>> to do batch processing and want to use collectionProcessComplete. Why
>>>>>> is DUCC not calling it? I checked it with UIMA-AS as well, and my Cas
>>>>>> Consumer works fine there and also performs batch processing.
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> Reshu Agarwal
>>>>>>
>>>>>>
>>>>>>    Hi Eddie,
>>>>>>
>>>>> I am using a Cas Consumer similar to the Apache UIMA example:
>>>>> "apache-uima/examples/src/org/apache/uima/examples/cpe/PersonTitleDBWriterCasConsumer.java"
>>>>
>>>> --
>>>> Thanks,
>>>> Reshu Agarwal
>>>>
>>>>
>>>>   Hi Eddie,
>> You are right; I know this. PersonTitleDBWriterCasConsumer does a flush (a
>> commit) in the process method after every 50 documents, and if fewer than
>> 50 documents remain, it commits or flushes in the collectionProcessComplete
>> method. So if that method is not called, those remaining documents are
>> never committed. That is why I want DUCC to call this method.
>>
>> --
>> Thanks,
>> Reshu Agarwal
>>
>>
Hi,

The destroy method worked for me. It did what I wanted from the
collectionProcessComplete method.
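For anyone finding this thread later, the pattern we ended up with (commit
every 50 documents in the process method, and do the final flush of any
partial batch in destroy, since DUCC does not deliver
collectionProcessComplete to every scaled-out instance) can be sketched
roughly as below. This is only a stand-alone illustration: BatchingConsumer
and its commit target are stand-ins, not the actual CasConsumer_ImplBase
API; in a real consumer, processDocument would be the body of processCas and
finalFlush would be called from destroy.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch of the batching pattern discussed in this thread.
public class BatchingConsumer {
    private static final int BATCH_SIZE = 50;
    private final List<String> batch = new ArrayList<>();
    private int committed = 0;

    // In a real UIMA Cas Consumer this logic would live in processCas().
    public void processDocument(String doc) {
        batch.add(doc);
        if (batch.size() >= BATCH_SIZE) {
            commit(); // flush every 50 documents
        }
    }

    // Call this from destroy(): it flushes the partial batch that
    // collectionProcessComplete would otherwise have committed.
    public void finalFlush() {
        if (!batch.isEmpty()) {
            commit();
        }
    }

    private void commit() {
        committed += batch.size(); // stand-in for a Solr/DB commit
        batch.clear();
    }

    public int getCommitted() {
        return committed;
    }

    public static void main(String[] args) {
        BatchingConsumer c = new BatchingConsumer();
        for (int i = 0; i < 120; i++) {
            c.processDocument("doc-" + i);
        }
        c.finalFlush(); // the destroy-time flush picks up the last 20
        System.out.println(c.getCommitted()); // prints 120
    }
}
```

Note that if the job runs in a preemptable fair-share class, a process can
be killed between commits, so (as Eddie said) a non-preemptable scheduling
class is safer for incremental commits.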

-- 
Thanks,
Reshu Agarwal

