From: Lou DeGenaro
Date: Fri, 10 Nov 2017 06:16:38 -0500
Subject: Re: DUCC's job goes into infinite loop
To: user@uima.apache.org

Are you running with a shared file system on your cluster? Is your user log directory located there?

Look at the DUCC daemon log files located in $DUCC_HOME/logs. They should provide some clues as to what is wrong. Feel free to post (non-confidential versions of) them here for a second opinion.

Lou.

On Fri, Nov 10, 2017 at 12:11 AM, priyank sharma wrote:

> There is nothing on the work item page or the performance page on the web
> server. There is only one log file for the main node, and no log files for
> the other two nodes. The DUCC job processes are not able to pick up the data
> from the data source, and no UIMA aggregator is working for those batches.
>
> Is the issue because of the Java heap space? We are giving 4 GB of RAM to
> the job process.
>
> Attaching the log file.
>
> Thanks and Regards
> Priyank Sharma
>
> On Thursday 09 November 2017 04:33 PM, Lou DeGenaro wrote:
>
>> The first place to look is in your job's logs. Visit the ducc-mon jobs
>> page ducchost:42133/jobs.jsp, then click on the id of your job. Examine
>> the logs by clicking on each log file name, looking for any revealing
>> information.
>>
>> Feel free to post non-confidential snippets here, or if you'd like to
>> chat in real time we can use HipChat.
>>
>> Lou.
>>
>> On Thu, Nov 9, 2017 at 5:19 AM, priyank sharma wrote:
>>
>>> All!
>>>
>>> I have a problem with our DUCC cluster in which a job process gets stuck
>>> and keeps on processing the same batch again and again; due to the
>>> maximum duration, the batch gets the Reason or Extraordinary Status
>>> "CanceledByUser" and then gets restarted with the same IDs.
>>> This usually happens after 15
>>> to 20 days and goes away after restarting the DUCC cluster. Going
>>> through the data store that the CAS consumer uses to ingest data, the
>>> data for this batch never gets ingested, so most probably this data is
>>> not being processed.
>>>
>>> How can I check whether this data is being processed or not?
>>>
>>> Are the resources the issue, and why does the data get processed after
>>> restarting the cluster?
>>>
>>> We have a three-node cluster with 32 GB, 40 GB, and 28 GB of RAM.
>>>
>>> --
>>> Thanks and Regards
>>> Priyank Sharma
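A quick way to follow Lou's advice and scan the daemon logs under $DUCC_HOME/logs is a recursive grep for error markers. This is a minimal sketch only: the scratch directory and the sample log line below are placeholders for illustration, not real DUCC output; on a real cluster point the grep at "$DUCC_HOME/logs" and at the job's user log directory.

```shell
# Hedged sketch: scan DUCC daemon logs for errors/exceptions.
# LOGDIR is a scratch stand-in; on a real cluster use "$DUCC_HOME/logs".
LOGDIR=/tmp/ducc-logs-demo
mkdir -p "$LOGDIR"

# Hypothetical log line, written here only so the grep has input.
echo '10 Nov 2017 11:16:40 ERROR Orchestrator - node heartbeat missed' > "$LOGDIR/or.log"

# Case-insensitive recursive search with line numbers.
grep -rniE 'error|exception|fatal' "$LOGDIR"
```

Narrowing the pattern (e.g. to the job's numeric id) helps isolate the stuck batch when the logs span 15–20 days of activity.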