uima-user mailing list archives

From "reshu.agarwal" <reshu.agar...@orkash.com>
Subject Re: Ducc Problems
Date Wed, 18 Feb 2015 07:03:24 GMT
Dear Eddie,

This problem was resolved by using the destroy method in DUCC version 
1.0.0, but after I upgraded from 1.0.0 to 1.1.0, DUCC no longer 
calls the destroy method.

It also does not call the CollectionReader's stop method, the finalize 
method of any Java class, or the cas consumer's 
destroy/collectionProcessComplete methods.

I want to close my connection to the database after the job completes, 
and I also want to use batch processing at the cas consumer level, as 
PersonTitleDBWriterCasConsumer does.

Thanks in advance.

Reshu.
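The batch-and-final-flush pattern under discussion can be sketched in plain Java. This is only an illustrative stand-in, not the real UIMA CasConsumer API: the class name, `processDocument`, and the commit/close stand-ins are all hypothetical, and a real consumer would extend `CasConsumer_ImplBase` and commit through a JDBC connection.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the batching pattern PersonTitleDBWriterCasConsumer
// uses: buffer documents, commit every batchSize documents in process(),
// and do a final commit plus connection cleanup in destroy(). All names
// here are illustrative stand-ins, not the real UIMA/DUCC API.
class BatchingDbConsumer {
    private final int batchSize;
    private final List<String> batch = new ArrayList<>();
    private int commitCount = 0;
    private int committedDocs = 0;
    private boolean closed = false;

    BatchingDbConsumer(int batchSize) {
        this.batchSize = batchSize;
    }

    // Called once per document (stands in for the consumer's process(CAS)).
    void processDocument(String doc) {
        batch.add(doc);
        if (batch.size() >= batchSize) {
            commit();
        }
    }

    // Stand-in for a JDBC connection.commit(): flush the buffered rows.
    private void commit() {
        committedDocs += batch.size();
        batch.clear();
        commitCount++;
    }

    // The lifecycle hook that flushed the final partial batch and closed
    // the database connection in DUCC 1.0.0; the report above is that
    // 1.1.0 no longer invokes it.
    void destroy() {
        if (!batch.isEmpty()) {
            commit();
        }
        closed = true; // stand-in for connection.close()
    }

    int getCommitCount()   { return commitCount; }
    int getCommittedDocs() { return committedDocs; }
    boolean isClosed()     { return closed; }
}
```

With a batch size of 50 and 120 documents, only 100 documents reach the database from process(); the last 20 are committed only if destroy() actually runs, which is exactly why a skipped lifecycle call loses data.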



On 03/31/2014 04:14 PM, reshu.agarwal wrote:
> On 03/28/2014 05:28 PM, Eddie Epstein wrote:
>> Another alternative would be to do the final flush in the Cas consumer's
>> destroy method.
>>
>> Another issue to be aware of, in order to balance resources between 
>> jobs,
>> DUCC uses preemption of job processes scheduled in a "fair-share" class.
>> This may not be acceptable for jobs which are doing incremental commits.
>> The solution is to schedule the job in a non-preemptable class.
>>
>>
>> On Fri, Mar 28, 2014 at 1:22 AM, reshu.agarwal
>> <reshu.agarwal@orkash.com> wrote:
>>
>>> On 03/28/2014 01:28 AM, Eddie Epstein wrote:
>>>
>>>> Hi Reshu,
>>>>
>>>> The Job model in DUCC is for the Collection Reader to send "work item
>>>> CASes", where a work item represents a collection of work to be 
>>>> done by a
>>>> Job Process. For example, a work item could be a file or a subset of a
>>>> file
>>>> that contains many documents, where each document would be 
>>>> individually
>>>> put
>>>> into a CAS by the Cas Multiplier in the Job Process.
>>>>
>>>> DUCC is designed so that after processing the "mini-collection"
>>>> represented
>>>> by the work item,  the Cas Consumer should flush any data. This is 
>>>> done by
>>>> routing the "work item CAS" to the Cas Consumer, after all work item
>>>> documents are completed, at which point the CC does the flush.
>>>>
>>>> The sample code described in
>>>> http://uima.apache.org/d/uima-ducc-1.0.0/duccbook.html#x1-1380009 uses
>>>> the
>>>> work item CAS to flush data in exactly this way.
>>>>
>>>> Note that the PersonTitleDBWriterCasConsumer is doing a flush (a 
>>>> commit)
>>>> in
>>>> the process method after every 50 documents.
>>>>
>>>> Regards
>>>> Eddie
>>>>
>>>>
>>>>
>>>> On Thu, Mar 27, 2014 at 1:35 AM, reshu.agarwal 
>>>> <reshu.agarwal@orkash.com>
>>>> wrote:
>>>>
>>>>> On 03/26/2014 11:34 PM, Eddie Epstein wrote:
>>>>>
>>>>>> Hi Reshu,
>>>>>>
>>>>>> The collectionProcessingComplete() method in UIMA-AS has a 
>>>>>> limitation: a
>>>>>> Collection Processing Complete request sent to the UIMA-AS Analysis
>>>>>> Service
>>>>>> is cascaded down to all delegates; however, if a particular 
>>>>>> delegate is
>>>>>> scaled-out, only one of the instances of the delegate will get this
>>>>>> call.
>>>>>>
>>>>>> Since DUCC is using UIMA-AS to scale out the Job processes, it 
>>>>>> has no
>>>>>> way
>>>>>> to deliver a CPC to all instances.
>>>>>>
>>>>>> The applications we have been running on DUCC have used the Work Item
>>>>>> CAS as a signal to CAS consumers to do CPC level processing. That is
>>>>>> discussed
>>>>>> in the first reference above, in the paragraph "Flushing Cached 
>>>>>> Data".
>>>>>>
>>>>>> Eddie
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 26, 2014 at 9:48 AM, reshu.agarwal <
>>>>>> reshu.agarwal@orkash.com>
>>>>>> wrote:
>>>>>>
>>>>>>    On 03/26/2014 06:43 PM, Eddie Epstein wrote:
>>>>>>
>>>>>>>> Are you using standard UIMA interface code to Solr? If so, which Cas
>>>>>>>> Consumer?
>>>>>>>>
>>>>>>>> Taking a quick look at the source code for SolrCASConsumer, the batch
>>>>>>>> and collection process complete methods appear to do nothing.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Eddie
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 26, 2014 at 6:08 AM, reshu.agarwal <
>>>>>>>> reshu.agarwal@orkash.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>     On 03/21/2014 11:42 AM, reshu.agarwal wrote:
>>>>>>>>
>>>>>>>>>> Hence we cannot attempt batch processing in the cas consumer, and
>>>>>>>>>> it increases our processing time. Is there any other option for
>>>>>>>>>> that, or is it a bug in DUCC?
>>>>>>>>>>
>>>>>>>>> Please reply on this problem: I am sending documents to Solr one by
>>>>>>>>> one from the cas consumer, without batch processing, and committing
>>>>>>>>> Solr each time. That is not an optimal way to use it. Why is DUCC
>>>>>>>>> not calling the collectionProcessComplete method of the Cas
>>>>>>>>> Consumer? And if I want it to, what is the way to do this?
>>>>>>>>>
>>>>>>>>> I am not able to find anything about this in the DUCC book.
>>>>>>>>>
>>>>>>>>> Thanks in advance.
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> Thanks,
>>>>>>>>> Reshu Agarwal
>>>>>>>>>
>>>>>>>>>
>>>>>>> Hi Eddie,
>>>>>>>
>>>>>>> I am not using the standard UIMA interface code to Solr; I created my
>>>>>>> own Cas Consumer. I will take a look at that too. But the problem is
>>>>>>> not specific to Solr; I can use any store for my output. I want to do
>>>>>>> batch processing and want to use collectionProcessComplete. Why is
>>>>>>> DUCC not calling it? I checked with UIMA-AS as well, and my cas
>>>>>>> consumer works fine there and also performs batch processing.
>>>>>>>
>>>>>>> -- 
>>>>>>> Thanks,
>>>>>>> Reshu Agarwal
>>>>>>>
>>>>>>>
>>>>> Hi Eddie,
>>>>>
>>>>> I am using a cas consumer similar to the Apache UIMA example:
>>>>> "apache-uima/examples/src/org/apache/uima/examples/cpe/
>>>>> PersonTitleDBWriterCasConsumer.java"
>>>>>
>>>>> -- 
>>>>> Thanks,
>>>>> Reshu Agarwal
>>>>>
>>>>>
>>> Hi Eddie,
>>>
>>> You are right, and I know this fact. PersonTitleDBWriterCasConsumer does
>>> a flush (a commit) in the process method after every 50 documents, and if
>>> fewer than 50 documents remain, it commits/flushes in the
>>> collectionProcessComplete method. So if that method is not called, those
>>> remaining documents are never committed. That is why I want DUCC to call
>>> this method.
>>>
>>> -- 
>>> Thanks,
>>> Reshu Agarwal
>>>
>>>
> Hi,
>
> The destroy method worked for me. It did what I wanted from the
> collectionProcessComplete method.
>
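Eddie's recommended alternative in the thread, routing the work item CAS back to the Cas Consumer as a flush signal after all documents of that work item complete, can be sketched the same way. Again, the types and the `isWorkItemCas` flag are hypothetical stand-ins: in a real DUCC job the consumer would inspect the CAS to distinguish the work item CAS from the document CASes produced by the Cas Multiplier.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "Flushing Cached Data" pattern described in the DuccBook:
// document CASes are buffered, and the returning work item CAS triggers
// the commit for that mini-collection. Names are illustrative only.
class WorkItemFlushConsumer {
    private final List<String> batch = new ArrayList<>();
    private final List<List<String>> commits = new ArrayList<>();

    // Stands in for process(CAS); a work item CAS arrives after all of
    // its documents have been processed.
    void process(String text, boolean isWorkItemCas) {
        if (isWorkItemCas) {
            flush(); // end of the mini-collection: commit buffered docs
        } else {
            batch.add(text);
        }
    }

    private void flush() {
        if (!batch.isEmpty()) {
            commits.add(new ArrayList<>(batch));
            batch.clear();
        }
    }

    List<List<String>> getCommits() { return commits; }
}
```

This avoids relying on collectionProcessComplete or destroy entirely, which is why it works under UIMA-AS scale-out, where a CPC request reaches only one instance of a scaled delegate.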

