The first place to look is your job's logs. Visit the ducc-mon jobs
page at ducchost:42133/jobs.jsp, then click on the id of your job. Open each
log file by clicking its name and look for anything revealing.
Feel free to post non-confidential snippets here, or, if you'd like to chat
in real time, we can use HipChat.
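
If clicking through many log files gets tedious, the same search can be
scripted on the node that holds the job's log directory. The following is only
a minimal sketch: the function name find_log_lines, the ".log" suffix, and the
"Exception" search term are assumptions for illustration, not part of DUCC.
Only the "CanceledByUser" string comes from the report below. Point it at
wherever your job's logs actually live.

```python
import os
import re
import sys


def find_log_lines(log_dir, patterns, suffix=".log"):
    """Return (path, line_number, text) for each line matching any pattern.

    The suffix filter is an assumption; adjust it to your log file names.
    """
    # Treat the patterns as literal strings, not regular expressions.
    regex = re.compile("|".join(re.escape(p) for p in patterns))
    hits = []
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            if not name.endswith(suffix):
                continue
            path = os.path.join(root, name)
            # errors="replace" keeps the scan going on odd bytes in a log.
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if regex.search(line):
                        hits.append((path, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python scan_logs.py <job-log-dir>
    for path, number, text in find_log_lines(
        sys.argv[1], ["CanceledByUser", "Exception"]
    ):
        print(f"{path}:{number}: {text}")
```

The matching lines, with a little surrounding context, are the kind of
snippet that would be useful to post.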
Lou.
On Thu, Nov 9, 2017 at 5:19 AM, priyank sharma <priyank.sharma@orkash.com>
wrote:
> All!
>
> I have a problem with a DUCC cluster in which a job process gets stuck
> and keeps processing the same batch again and again. Because of the maximum
> duration, the batch gets the reason or extraordinary status "CanceledByUser"
> and is then restarted with the same IDs. This usually happens after 15
> to 20 days and goes away after restarting the DUCC cluster. Looking at
> the data store that the CAS consumer ingests data into,
> the data for this batch never gets ingested. So most probably
> this data is not being processed.
>
> How can I check whether this data is being processed?
>
> Are resources the issue, and why is the batch processed after restarting
> the cluster?
>
> We have a three-node cluster with 32 GB, 40 GB, and 28 GB of RAM.
>
>
>
> --
> Thanks and Regards
> Priyank Sharma
>
>