uima-user mailing list archives

From Jaroslaw Cwiklik <uim...@gmail.com>
Subject Re: Ducc Problems
Date Thu, 19 Feb 2015 19:10:39 GMT
One possible explanation for destroy() not getting called is that a process
(JP) may still be working on a CAS when DUCC deallocates the process. DUCC
first asks the process to quiesce and stop, and allows it one minute to
terminate on its own. If that does not happen, DUCC kills the process via
kill -9. In that case the process is clobbered and the destroy() methods in
UIMA-AS are not called.
There should be some evidence at the very end of the JP logs. Look for
something like this:

>>>>>>>>> Process Received a Message. Is Process target for message:true.
Target PID:27520
>>> configFactory.stop() - stopped
route:mina:tcp://localhost:49338?transferExchange=true&sync=false
01:56:22.735 - 94:
org.apache.uima.aae.controller.PrimitiveAnalysisEngineController_impl.quiesceAndStop:
INFO: Stopping Controller: ducc.jd.queue.226091
Quiescing UIMA-AS Service. Remaining Number of CASes to Process:0

Look at the timestamp of the ">>>>>>>>> Process Received a Message. Is
Process target for message:true." entry and compare it to the timestamp of
the last log message. Does it look like there is a long delay?
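Since kill -9 cannot be intercepted from inside the JVM, cleanup such as closing a database connection can only be made robust on the graceful quiesce path. A minimal sketch of that defensive pattern in plain Java (class and method names here are illustrative, not the DUCC or UIMA-AS API):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: idempotent cleanup that runs from the consumer's destroy() on the
// normal path, and from a JVM shutdown hook when the process is terminated
// gracefully (quiesce/SIGTERM). Nothing runs on kill -9, so state that must
// survive a hard kill still needs periodic commits during process().
public class DefensiveCleanup {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    public DefensiveCleanup() {
        // Fallback: fires on normal JVM exit and SIGTERM, never on SIGKILL.
        Runtime.getRuntime().addShutdownHook(new Thread(this::close));
    }

    /** Safe to call from both destroy() and the shutdown hook. */
    public void close() {
        if (closed.compareAndSet(false, true)) {
            // Stand-in for connection.close() / a final commit.
            System.out.println("closing database connection");
        }
    }

    public boolean isClosed() {
        return closed.get();
    }
}
```

Calling close() from the consumer's destroy() keeps the normal path explicit; the hook only matters when destroy() is skipped.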


Jerry

On Wed, Feb 18, 2015 at 2:03 AM, reshu.agarwal <reshu.agarwal@orkash.com>
wrote:

> Dear Eddie,
>
> This problem was resolved by using the destroy method in DUCC version
> 1.0.0, but after I upgraded from 1.0.0 to 1.1.0, DUCC no longer calls the
> destroy method.
>
> It also does not call the CollectionReader's stop method, the finalize
> method of any Java class, or the cas consumer's
> destroy/collectionProcessComplete methods.
>
> I want to close my database connection after the job completes, and I also
> want to use batch processing at the cas consumer level, as
> PersonTitleDBWriterCasConsumer does.
>
> Thanks in advance.
>
> Reshu.
>
>
>
>
> On 03/31/2014 04:14 PM, reshu.agarwal wrote:
>
>> On 03/28/2014 05:28 PM, Eddie Epstein wrote:
>>
>>> Another alternative would be to do the final flush in the Cas consumer's
>>> destroy method.
>>>
>>> Another issue to be aware of: in order to balance resources between jobs,
>>> DUCC uses preemption of job processes scheduled in a "fair-share" class.
>>> This may not be acceptable for jobs that are doing incremental commits.
>>> The solution is to schedule the job in a non-preemptable class.
>>>
>>>
>>> On Fri, Mar 28, 2014 at 1:22 AM, reshu.agarwal <reshu.agarwal@orkash.com
>>> >wrote:
>>>
>>>  On 03/28/2014 01:28 AM, Eddie Epstein wrote:
>>>>
>>>>  Hi Reshu,
>>>>>
>>>>> The Job model in DUCC is for the Collection Reader to send "work item
>>>>> CASes", where a work item represents a collection of work to be done by
>>>>> a Job Process. For example, a work item could be a file or a subset of a
>>>>> file that contains many documents, where each document would be
>>>>> individually put into a CAS by the Cas Multiplier in the Job Process.
>>>>>
>>>>> DUCC is designed so that after processing the "mini-collection"
>>>>> represented
>>>>> by the work item,  the Cas Consumer should flush any data. This is
>>>>> done by
>>>>> routing the "work item CAS" to the Cas Consumer, after all work item
>>>>> documents are completed, at which point the CC does the flush.
>>>>>
>>>>> The sample code described in
>>>>> http://uima.apache.org/d/uima-ducc-1.0.0/duccbook.html#x1-1380009 uses
>>>>> the
>>>>> work item CAS to flush data in exactly this way.
>>>>>
>>>>> Note that the PersonTitleDBWriterCasConsumer is doing a flush (a
>>>>> commit) in the process method after every 50 documents.
>>>>>
>>>>> Regards
>>>>> Eddie
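The work-item flushing pattern Eddie describes can be sketched in plain Java. The class, method names, and the boolean flag standing in for work-item CAS detection are all illustrative assumptions, not the actual DUCC or UIMA-AS API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: ordinary document CASes are buffered; the "work item CAS" that
// comes back after the mini-collection completes triggers the flush.
public class WorkItemFlushConsumer {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    public void process(String doc, boolean isWorkItemCas) {
        if (isWorkItemCas) {
            flush();          // end of the mini-collection: flush everything
        } else {
            pending.add(doc); // ordinary document CAS: buffer it
        }
    }

    private void flush() {
        committed.addAll(pending); // stand-in for a DB or Solr commit
        pending.clear();
    }

    public int committedCount() {
        return committed.size();
    }
}
```

A real consumer would distinguish the work item CAS by its type system (for example, a work-item feature structure) rather than a flag; the point is only that the flush keys off the work item, not off collectionProcessComplete.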
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Mar 27, 2014 at 1:35 AM, reshu.agarwal <
>>>>> reshu.agarwal@orkash.com>
>>>>> wrote:
>>>>>
>>>>>   On 03/26/2014 11:34 PM, Eddie Epstein wrote:
>>>>>
>>>>>>   Hi Reshu,
>>>>>>
>>>>>>> The collectionProcessingComplete() method in UIMA-AS has a
>>>>>>> limitation: a Collection Processing Complete request sent to the
>>>>>>> UIMA-AS Analysis Service is cascaded down to all delegates; however,
>>>>>>> if a particular delegate is scaled out, only one of the instances of
>>>>>>> the delegate will get this call.
>>>>>>>
>>>>>>> Since DUCC is using UIMA-AS to scale out the Job processes, it has no
>>>>>>> way to deliver a CPC to all instances.
>>>>>>>
>>>>>>> The applications we have been running on DUCC have used the Work Item
>>>>>>> CAS as a signal to CAS consumers to do CPC-level processing. That is
>>>>>>> discussed in the first reference above, in the paragraph "Flushing
>>>>>>> Cached Data".
>>>>>>>
>>>>>>> Eddie
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 26, 2014 at 9:48 AM, reshu.agarwal <
>>>>>>> reshu.agarwal@orkash.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On 03/26/2014 06:43 PM, Eddie Epstein wrote:
>>>>>>>>
>>>>>>>>> Are you using standard UIMA interface code to Solr? If so, which
>>>>>>>>> Cas Consumer?
>>>>>>>>>
>>>>>>>>> Taking a quick look at the source code for SolrCASConsumer, the
>>>>>>>>> batch and collection process complete methods appear to do nothing.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Eddie
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 26, 2014 at 6:08 AM, reshu.agarwal <
>>>>>>>>> reshu.agarwal@orkash.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On 03/21/2014 11:42 AM, reshu.agarwal wrote:
>>>>>>>>>>
>>>>>>>>>>> Hence we cannot attempt batch processing in the cas consumer, and
>>>>>>>>>>> it increases our processing time. Is there any other option for
>>>>>>>>>>> that, or is it a bug in DUCC?
>>>>>>>>>>
>>>>>>>>>> Please reply on this problem, as I am sending documents to Solr one
>>>>>>>>>> by one from the cas consumer, without batch processing, committing
>>>>>>>>>> Solr each time. That is not an optimal way to use it. Why is DUCC
>>>>>>>>>> not calling the collectionProcessComplete method of the Cas
>>>>>>>>>> Consumer? And if I want to do that, what is the way to do it?
>>>>>>>>>>
>>>>>>>>>> I am not able to find anything about this in the DUCC book.
>>>>>>>>>>
>>>>>>>>>> Thanks in advance.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Thanks,
>>>>>>>>>> Reshu Agarwal
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> Hi Eddie,
>>>>>>>>
>>>>>>>> I am not using the standard UIMA interface code to Solr; I created my
>>>>>>>> own Cas Consumer. I will take a look at that too. But the problem is
>>>>>>>> not specific to Solr; I can use any source to store my output. I want
>>>>>>>> to do batch processing and to use collectionProcessComplete. Why is
>>>>>>>> DUCC not calling it? I checked with UIMA-AS as well, and my cas
>>>>>>>> consumer works fine there, including batch processing.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>> Reshu Agarwal
>>>>>>>>
>>>>>>>>
>>>>>> Hi Eddie,
>>>>>>
>>>>>> I am using a cas consumer similar to the Apache UIMA example:
>>>>>> "apache-uima/examples/src/org/apache/uima/examples/cpe/PersonTitleDBWriterCasConsumer.java"
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> Reshu Agarwal
>>>>>>
>>>>>>
>>>> Hi Eddie,
>>>>
>>>> You are right; I know this fact. PersonTitleDBWriterCasConsumer does a
>>>> flush (a commit) in the process method after every 50 documents, and if
>>>> fewer than 50 documents remain they are committed/flushed by the
>>>> collectionProcessComplete method. So if that method is not called, those
>>>> documents are never committed. That is why I want DUCC to call this
>>>> method.
>>>>
>>>> --
>>>> Thanks,
>>>> Reshu Agarwal
>>>>
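The batching behavior at issue in this exchange can be sketched in plain Java. The names are illustrative stand-ins, not the actual PersonTitleDBWriterCasConsumer code: commit every 50 documents inside process(), and commit the remainder from destroy() or collectionProcessComplete(). If neither end-of-collection method is ever called, the final partial batch is lost.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: threshold batching with an end-of-run flush for the remainder.
public class BatchingConsumer {
    static final int BATCH_SIZE = 50;
    private final List<String> batch = new ArrayList<>();
    private int committed = 0;

    public void process(String doc) {
        batch.add(doc);
        if (batch.size() >= BATCH_SIZE) {
            commit(); // full batch: flush during normal processing
        }
    }

    // Flushes the final partial batch. If neither this nor
    // collectionProcessComplete() runs, those documents are lost.
    public void destroy() {
        commit();
    }

    private void commit() {
        committed += batch.size(); // stand-in for a JDBC commit
        batch.clear();
    }

    public int committedCount() {
        return committed;
    }
}
```

With 120 documents, process() alone commits only 100; the last 20 reach the store only via the destroy()/collectionProcessComplete() flush, which is exactly why a skipped destroy() drops data.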
>>>>
>> Hi,
>>
>> The destroy method worked for me. It did the same thing I wanted from
>> the collectionProcessComplete method.
>>
>
