uima-user mailing list archives

From Eddie Epstein <eaepst...@gmail.com>
Subject Re: Ducc Problems
Date Fri, 28 Mar 2014 11:58:45 GMT
Another alternative would be to do the final flush in the Cas consumer's
destroy method.
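
A minimal sketch of that alternative, written in plain Java rather than against the UIMA API so it runs on its own. The class name BufferedCommitConsumer and the commit sink passed to its constructor are hypothetical stand-ins. In a real CAS consumer, process() would be the consumer's process method and destroy() would be the component's destroy method. The batch size of 50 mirrors the PersonTitleDBWriterCasConsumer behavior discussed below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a CAS consumer: documents are plain strings and
// the commit target (database, Solr client, ...) is a sink supplied by the caller.
public class BufferedCommitConsumer {
    private static final int BATCH_SIZE = 50;            // commit every 50 documents
    private final List<String> buffer = new ArrayList<>();
    private final Consumer<List<String>> commit;          // hypothetical commit sink

    public BufferedCommitConsumer(Consumer<List<String>> commit) {
        this.commit = commit;
    }

    // Called once per document; commits whenever a full batch has accumulated.
    public void process(String document) {
        buffer.add(document);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    // Final flush at shutdown, so a partial batch (fewer than 50 documents)
    // is committed even if collectionProcessComplete() never reaches this instance.
    public void destroy() {
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;                                        // nothing pending
        }
        commit.accept(new ArrayList<>(buffer));            // hand over a copy
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BufferedCommitConsumer consumer =
                new BufferedCommitConsumer(batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 120; i++) {
            consumer.process("doc-" + i);
        }
        consumer.destroy();
        System.out.println(batchSizes);                    // prints [50, 50, 20]
    }
}
```

Without the destroy() flush, the last 20 documents in this run would never be committed, which is the situation described later in the thread.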

Another issue to be aware of: to balance resources between jobs, DUCC
preempts job processes that are scheduled in a "fair-share" class. This may
not be acceptable for jobs that do incremental commits. The solution is to
schedule such jobs in a non-preemptable class.


On Fri, Mar 28, 2014 at 1:22 AM, reshu.agarwal <reshu.agarwal@orkash.com> wrote:

> On 03/28/2014 01:28 AM, Eddie Epstein wrote:
>
>> Hi Reshu,
>>
>> The Job model in DUCC is for the Collection Reader to send "work item
>> CASes", where a work item represents a collection of work to be done by a
>> Job Process. For example, a work item could be a file or a subset of a
>> file that contains many documents, where each document would be
>> individually put into a CAS by the Cas Multiplier in the Job Process.
>>
>> DUCC is designed so that after processing the "mini-collection"
>> represented by the work item, the Cas Consumer should flush any data.
>> This is done by routing the "work item CAS" to the Cas Consumer after all
>> work item documents are completed, at which point the CC does the flush.
>>
>> The sample code described in
>> http://uima.apache.org/d/uima-ducc-1.0.0/duccbook.html#x1-1380009 uses the
>> work item CAS to flush data in exactly this way.
>>
>> Note that the PersonTitleDBWriterCasConsumer is doing a flush (a commit)
>> in the process method after every 50 documents.
>>
>> Regards
>> Eddie
>>
>>
>>
>> On Thu, Mar 27, 2014 at 1:35 AM, reshu.agarwal <reshu.agarwal@orkash.com>
>> wrote:
>>
>>> On 03/26/2014 11:34 PM, Eddie Epstein wrote:
>>>
>>>> Hi Reshu,
>>>>
>>>> The collectionProcessComplete() method in UIMA-AS has a limitation: a
>>>> Collection Processing Complete request sent to the UIMA-AS Analysis
>>>> Service is cascaded down to all delegates; however, if a particular
>>>> delegate is scaled out, only one of the instances of the delegate will
>>>> get this call.
>>>>
>>>> Since DUCC uses UIMA-AS to scale out the Job Processes, it has no way
>>>> to deliver a CPC to all instances.
>>>>
>>>> The applications we have been running on DUCC have used the Work Item
>>>> CAS as a signal to CAS consumers to do CPC-level processing. That is
>>>> discussed in the first reference above, in the paragraph "Flushing
>>>> Cached Data".
>>>>
>>>> Eddie
>>>>
>>>>
>>>>
>>>> On Wed, Mar 26, 2014 at 9:48 AM, reshu.agarwal <reshu.agarwal@orkash.com>
>>>> wrote:
>>>>
>>>>> On 03/26/2014 06:43 PM, Eddie Epstein wrote:
>>>>>
>>>>>> Are you using standard UIMA interface code to Solr? If so, which Cas
>>>>>> Consumer?
>>>>>>
>>>>>> Taking a quick look at the source code for SolrCASConsumer, the batch
>>>>>> and collection process complete methods appear to do nothing.
>>>>>>
>>>>>> Thanks,
>>>>>> Eddie
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 26, 2014 at 6:08 AM, reshu.agarwal <reshu.agarwal@orkash.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On 03/21/2014 11:42 AM, reshu.agarwal wrote:
>>>>>>>
>>>>>>>> Hence we cannot do batch processing in the CAS consumer, and that
>>>>>>>> increases our processing time. Is there any other option for that,
>>>>>>>> or is it a bug in DUCC?
>>>>>>>
>>>>>>> Please reply on this problem, as I am currently sending documents to
>>>>>>> Solr one by one from the CAS consumer and committing to Solr without
>>>>>>> batch processing. That is not an optimal way to do it. Why is DUCC not
>>>>>>> calling the collectionProcessComplete method of the CAS consumer? And
>>>>>>> if I want it to be called, what is the way to do this?
>>>>>>>
>>>>>>> I am not able to find anything about this in the DUCC book.
>>>>>>>
>>>>>>> Thanks in advance.
>>>>>>>
>>>>>>> --
>>>>>>> Thanks,
>>>>>>> Reshu Agarwal
>>>>>>>
>>>>>>>
>>>>> Hi Eddie,
>>>>>
>>>>> I am not using standard UIMA interface code to Solr; I created my own
>>>>> CAS consumer. I will take a look at that too. But the problem is not
>>>>> specific to Solr; I could use any store for my output. I want to do
>>>>> batch processing and use collectionProcessComplete. Why is DUCC not
>>>>> calling it? I checked with UIMA AS as well, and my CAS consumer works
>>>>> fine there and also performs batch processing.
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Reshu Agarwal
>>>>>
>>>>>
>>> Hi Eddie,
>>>
>>> I am using a CAS consumer similar to the Apache UIMA example:
>>>
>>> "apache-uima/examples/src/org/apache/uima/examples/cpe/PersonTitleDBWriterCasConsumer.java"
>>>
>>> --
>>> Thanks,
>>> Reshu Agarwal
>>>
>>>
> Hi Eddie,
>
> You are right; I know this. PersonTitleDBWriterCasConsumer does a flush
> (a commit) in the process method after every 50 documents, and if fewer
> than 50 documents remain, it does the commit or flush in the
> collectionProcessComplete method. So, if that method is not called, those
> documents cannot be committed. That is why I want DUCC to call this method.
>
> --
> Thanks,
> Reshu Agarwal
>
>
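
For comparison, the work item CAS approach described in the quoted thread can be modeled in plain Java as follows. This is a sketch under stated assumptions. WorkItemFlushConsumer and Item are hypothetical names. The isWorkItem flag stands in for however a real consumer tells the work item CAS apart from document CASes; the actual mechanism is covered in the "Flushing Cached Data" section of the DUCC book. The model relies only on the ordering guarantee stated in the thread: the work item CAS reaches the consumer after all of its documents are complete.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of "flush when the work item CAS arrives".
// Item and its isWorkItem flag are hypothetical stand-ins for real CASes.
public class WorkItemFlushConsumer {

    public static final class Item {
        final String text;
        final boolean isWorkItem;

        public Item(String text, boolean isWorkItem) {
            this.text = text;
            this.isWorkItem = isWorkItem;
        }
    }

    private final List<String> pending = new ArrayList<>();          // not yet committed
    private final List<List<String>> committed = new ArrayList<>();  // stand-in data store

    public void process(Item cas) {
        if (cas.isWorkItem) {
            // All documents of this work item are done, so flush the mini-collection.
            flush();
        } else {
            pending.add(cas.text);
        }
    }

    private void flush() {
        if (!pending.isEmpty()) {
            committed.add(new ArrayList<>(pending));
            pending.clear();
        }
    }

    public List<List<String>> committed() {
        return committed;
    }

    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        WorkItemFlushConsumer consumer = new WorkItemFlushConsumer();
        // One work item (e.g. one input file) split into three document CASes.
        consumer.process(new Item("doc-1", false));
        consumer.process(new Item("doc-2", false));
        consumer.process(new Item("doc-3", false));
        consumer.process(new Item("file-A", true));   // work item CAS arrives last
        System.out.println(consumer.committed());     // prints [[doc-1, doc-2, doc-3]]
    }
}
```

Because every consumer instance sees the work item CASes routed to it, each scaled-out instance flushes its own data. A single CPC call delivered to only one instance cannot guarantee that.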
