From: Lou DeGenaro
Date: Fri, 10 Nov 2017 06:16:38 -0500
Subject: Re: DUCC's job goes into infinite loop
To: user@uima.apache.org

Are you running with a shared file system on your cluster? Is your user log directory located there?

Look at the DUCC daemon log files located in $DUCC_HOME/logs. They should provide some clues as to what is wrong. Feel free to post (non-confidential versions of) them here for a second opinion.

Lou.

On Fri, Nov 10, 2017 at 12:11 AM, priyank sharma wrote:

> There is nothing on the work item page or the performance page on the web
> server. There is only one log file for the main node, and no log files for
> the other two nodes. The DUCC job processes are not able to pick up the data
> from the data source, and no UIMA aggregator is working for those batches.
>
> Is the issue because of the Java heap space? We are giving 4 GB of RAM to
> the job process.
>
> Attaching the log file.
>
> Thanks and Regards
> Priyank Sharma
>
> On Thursday 09 November 2017 04:33 PM, Lou DeGenaro wrote:
>
>> The first place to look is in your job's logs. Visit the ducc-mon jobs
>> page ducchost:42133/jobs.jsp, then click on the id of your job. Examine
>> the logs by clicking on each log file name, looking for any revealing
>> information.
>>
>> Feel free to post non-confidential snippets here, or if you'd like to
>> chat in real time we can use HipChat.
>>
>> Lou.
>>
>> On Thu, Nov 9, 2017 at 5:19 AM, priyank sharma wrote:
>>
>>> All!
>>>
>>> I have a problem with our DUCC cluster in which a job process gets stuck
>>> and keeps on processing the same batch again and again; due to the
>>> maximum duration, the batch gets the Reason or Extraordinary Status
>>> "CanceledByUser" and then gets restarted with the same IDs.
>>> This usually happens after 15
>>> to 20 days and goes away after restarting the DUCC cluster. Going
>>> through the data store that the CAS consumer uses to ingest data, the
>>> data for this batch never gets ingested, so most probably this data is
>>> not being processed.
>>>
>>> How can I check whether this data is being processed or not?
>>>
>>> Are the resources the issue, and why does the data get processed after
>>> restarting the cluster?
>>>
>>> We have a three-node cluster with 32 GB, 40 GB, and 28 GB of RAM.
>>>
>>> --
>>> Thanks and Regards
>>> Priyank Sharma
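A quick way to follow Lou's advice and scan the daemon logs under $DUCC_HOME/logs is a recursive grep for error markers. This is a minimal sketch only: the scratch directory and the sample log line below are placeholders for illustration, not real DUCC output; on a real cluster point the grep at "$DUCC_HOME/logs" and at the job's user log directory.

```shell
# Hedged sketch: scan DUCC daemon logs for errors/exceptions.
# LOGDIR is a scratch stand-in; on a real cluster use "$DUCC_HOME/logs".
LOGDIR=/tmp/ducc-logs-demo
mkdir -p "$LOGDIR"

# Hypothetical log line, written here only so the grep has input.
echo '10 Nov 2017 11:16:40 ERROR Orchestrator - node heartbeat missed' > "$LOGDIR/or.log"

# Case-insensitive recursive search with line numbers.
grep -rniE 'error|exception|fatal' "$LOGDIR"
```

Narrowing the pattern (e.g. to the job's numeric id) helps isolate the stuck batch when the logs span 15–20 days of activity.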